Merge pull request #7 from LaurentRDC/frames
Prototype of dataframes based on higher-kinded types
LaurentRDC authored Jan 27, 2025
2 parents 1481862 + f75896a commit c130a47
Showing 11 changed files with 934 additions and 17 deletions.
3 changes: 2 additions & 1 deletion .github/workflows/haskell-ci.yml
@@ -57,4 +57,5 @@ jobs:
 run: |
   cabal install doctest
   cabal repl javelin --with-ghc=doctest
-  cabal repl javelin-io --with-ghc=doctest
+  cabal repl javelin-io --with-ghc=doctest
+  cabal repl javelin-frames --with-ghc=doctest
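The added CI line runs doctest over the javelin-frames package, so examples embedded in its Haddock comments are evaluated and compared with the output written beneath them. A minimal sketch of that comment format, using a hypothetical `mean` function that is not part of javelin-frames:

```haskell
-- | Arithmetic mean of a non-empty list.
-- (Hypothetical helper, used only to illustrate the doctest format.)
--
-- >>> mean [1, 2, 3]
-- 2.0
mean :: [Double] -> Double
mean xs = sum xs / fromIntegral (length xs)

main :: IO ()
main = print (mean [1, 2, 3])
```

doctest evaluates each `>>>` line and fails the CI step if the printed result differs from the comment lines that follow it.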
20 changes: 5 additions & 15 deletions README.md
@@ -1,34 +1,24 @@
-# Haskell implementation of labeled one-dimensional arrays
+# Haskell implementation of data structures for data science
 
-Packages in this repository implement series, or labeled one-dimensional arrays, and associated functions.
-
-Like [`Data.Map.Strict`](https://hackage.haskell.org/package/containers/docs/Data-Map-Strict.html), series support efficient:
-
-* random access by key ( $\mathcal{O}\left( \log n \right)$ ) ;
-* slice by key ( $\mathcal{O}\left( \log n \right)$ ).
-
-Like [`Data.Vector.Vector`](https://hackage.haskell.org/package/vector/docs/Data-Vector.html), series support efficient:
-
-* random access by integer index ( $\mathcal{O}\left( 1 \right)$ );
-* slice by integer index ( $\mathcal{O}\left( 1 \right)$ );
-* numerical operations.
+Packages in this repository implement series and dataframes, data structures which are ubiquitous in data science.
 
 ## Tutorial and documentation
 
-A tutorial and interface documentation for the most recent published version are [available here](https://hackage.haskell.org/package/javelin).
+A tutorial and interface documentation for the most recent published version are [available here for series](https://hackage.haskell.org/package/javelin). A tutorial and interface documentation for dataframes is coming.
 
 Locally, you can generate documentation for all packages using `haddock` like so:
 
 ```bash
 cabal haddock javelin
 cabal haddock javelin-io
+cabal haddock javelin-frames
 ```
 
 ## Get involved!
 
 Do not hesitate to make feature requests or report bugs via the [issue tracker](https://github.com/LaurentRDC/javelin/issues).
 
-## Preliminary benchmarks
+## Preliminary benchmarks for series
 
 Looking up random integers:
 
5 changes: 5 additions & 0 deletions javelin-frames/CHANGELOG.md
@@ -0,0 +1,5 @@
# Revision history for javelin-frames

## 0.1.0.0 -- YYYY-mm-dd

* First version. Released on an unsuspecting world.
20 changes: 20 additions & 0 deletions javelin-frames/LICENSE
@@ -0,0 +1,20 @@
Copyright (c) 2025 Laurent Rene de Cotret

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be included
in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
53 changes: 53 additions & 0 deletions javelin-frames/benchmarks/Main.hs
@@ -0,0 +1,53 @@
{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE TypeFamilies #-}

import Control.DeepSeq ( NFData, rnf )
import Control.Exception ( evaluate )
import Criterion.Main ( bench, bgroup, nf, defaultMain )

import Data.Frame ( Column, Frameable, Indexable, Row, Frame )
import qualified Data.Frame as Frame
import qualified Data.Vector as Vector

import GHC.Generics ( Generic )


data Bench t
    = MkBench { field1 :: Column t Int
              , field2 :: Column t Int
              , field3 :: Column t Int
              , field4 :: Column t Int
              , field5 :: Column t Int
              , field6 :: Column t Int
              }
    deriving (Generic, Frameable)

instance NFData (Row Bench)
instance NFData (Frame Bench)

instance Indexable Bench where
    type Key Bench = Int

    index = field1


main :: IO ()
main = do
    let rs = Vector.fromList [MkBench ix 0 0 0 0 0 | ix <- [0::Int .. 100_000]]
        fr = Frame.fromRows rs
    evaluate $ rnf rs
    evaluate $ rnf fr
    defaultMain
        [ bgroup "Row-wise operations"
            [ bench "fromRows" $ nf (Frame.fromRows) rs
            , bench "toRows" $ nf (Frame.toRows) fr
            , bench "toRows . fromRows" $ nf (Frame.fromRows . Frame.toRows) fr
            , bench "fromRows . toRows" $ nf (Frame.toRows . Frame.fromRows) rs
            ]
        , bgroup "Lookups"
            [ bench "lookup" $ nf (Frame.lookup 100) fr
            , bench "ilookup" $ nf (Frame.ilookup 99) fr
            , bench "at" $ nf (`Frame.at` (100, field5)) fr
            , bench "iat" $ nf (`Frame.iat` (99, field5)) fr
            ]
        ]
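The benchmark contrasts `Frame.lookup`, which finds a row by key (`Key Bench = Int`, taken from `field1` by the `Indexable` instance), with `Frame.ilookup`, which, following the `at`/`iat` naming, presumably finds a row by integer position. The sketch below shows one way the two can differ in cost; it is a hypothetical two-column frame (`MiniFrame`, `fromRowsSorted`, `lookupRow` and `ilookupRow` are illustrative names, not javelin-frames' API) and it assumes the key column is sorted in ascending order, as `field1` is in the benchmark. Positional access is an O(1) array read, while key access is an O(log n) binary search followed by a positional read.

```haskell
import Data.Array (Array, bounds, listArray, (!))

-- Hypothetical column-wise frame; NOT the javelin-frames representation.
data MiniFrame = MiniFrame
    { keyCol   :: Array Int Int     -- index column, assumed sorted ascending
    , valueCol :: Array Int Double  -- one data column of the same length
    }

-- Build a frame from (key, value) rows; keys must already be sorted.
fromRowsSorted :: [(Int, Double)] -> MiniFrame
fromRowsSorted rows = MiniFrame (mk (map fst rows)) (mk (map snd rows))
  where
    n = length rows
    mk :: [e] -> Array Int e
    mk = listArray (0, n - 1)

-- Positional lookup: O(1) array indexing, bounds-checked.
ilookupRow :: Int -> MiniFrame -> Maybe (Int, Double)
ilookupRow i fr
    | i < lo || i > hi = Nothing
    | otherwise        = Just (keyCol fr ! i, valueCol fr ! i)
  where
    (lo, hi) = bounds (keyCol fr)

-- Key lookup: O(log n) binary search on the sorted key column,
-- then a positional lookup at the matching position.
lookupRow :: Int -> MiniFrame -> Maybe (Int, Double)
lookupRow k fr = go lo hi
  where
    (lo, hi) = bounds (keyCol fr)
    go l h
        | l > h     = Nothing
        | otherwise =
            let m  = l + (h - l) `div` 2
                km = keyCol fr ! m
            in case compare k km of
                EQ -> ilookupRow m fr
                LT -> go l (m - 1)
                GT -> go (m + 1) h

main :: IO ()
main = do
    -- Keys 0, 10, .., 100, so key 30 sits at position 3.
    let fr = fromRowsSorted [(k, fromIntegral k * 0.5) | k <- [0, 10 .. 100]]
    print (lookupRow 30 fr)
    print (ilookupRow 3 fr)
    print (lookupRow 35 fr)
```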
76 changes: 76 additions & 0 deletions javelin-frames/javelin-frames.cabal
@@ -0,0 +1,76 @@
cabal-version: 3.0
name: javelin-frames
version: 0.1.0.0
synopsis: Type-safe data frames based on higher-kinded types.
-- description:
license: MIT
license-file: LICENSE
author: Laurent P. René de Cotret
maintainer: [email protected]
category: Data, Data Structures, Data Science
build-type: Simple
extra-doc-files: CHANGELOG.md
tested-with: GHC ==9.12.1
           || ==9.10.1
           || ==9.8.4
           || ==9.6.4
           || ==9.4.8

description:

    This package implements data frames, a data structure
    where record types defined by the user can be transformed
    into records of columns.

source-repository head
    type: git
    location: https://github.com/LaurentRDC/javelin

common common
    default-language: GHC2021
    ghc-options: -Wall
                 -Wcompat
                 -Widentities
                 -Wincomplete-uni-patterns
                 -Wincomplete-record-updates
                 -Wredundant-constraints
                 -fhide-source-paths
                 -Wpartial-fields

library
    import: common
    exposed-modules: Data.Frame
                     Data.Frame.Tutorial
    build-depends: base >=4.15.0.0 && <4.22,
                   containers >=0.6 && <0.8,
                   vector >=0.12.3.0 && <0.14,
    hs-source-dirs: src
    default-language: GHC2021

test-suite javelin-frames-test
    import: common
    default-language: GHC2021
    type: exitcode-stdio-1.0
    hs-source-dirs: test
    main-is: Main.hs
    other-modules: Test.Data.Frame
    build-depends: base >=4.15.0.0 && <4.22,
                   containers,
                   hedgehog,
                   javelin-frames,
                   tasty,
                   tasty-hedgehog,
                   tasty-hunit,
                   vector

benchmark bench-frames
    import: common
    type: exitcode-stdio-1.0
    ghc-options: -rtsopts
    hs-source-dirs: benchmarks
    main-is: Main.hs
    build-depends: base >=4.15.0.0 && <4.22,
                   criterion ^>=1.6,
                   deepseq,
                   javelin-frames,
                   vector
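The package description says user-defined record types can be turned into records of columns, and the benchmark's `Column t Int` fields together with `Row Bench` and `Frame Bench` hint at how. A minimal sketch of that higher-kinded-type pattern follows. The closed type family, the `Person` record, the `meanAge` helper and the list-based columns are assumptions for illustration only: javelin-frames depends on `vector`, and its actual definitions of `Column`, `Row`, `Frame`, `fromRows` and `toRows` may differ.

```haskell
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeFamilies #-}

import Data.Functor.Identity (Identity)
import Data.Kind (Type)

-- Hypothetical sketch, NOT javelin-frames' source.
-- For rows (f = Identity) a field holds a plain value; otherwise it holds `f a`,
-- here a list standing in for a column.
type family Column (f :: Type -> Type) a where
    Column Identity a = a
    Column f        a = f a

-- A user-defined record, written once and reused for rows and for frames.
data Person f = Person
    { name :: Column f String
    , age  :: Column f Int
    }

type Row   r = r Identity  -- one value per field
type Frame r = r []        -- one list (column) per field

-- Transpose rows into columns.
fromRows :: [Row Person] -> Frame Person
fromRows rs = Person { name = map name rs, age = map age rs }

-- Zip columns back into rows.
toRows :: Frame Person -> [Row Person]
toRows fr = zipWith Person (name fr) (age fr)

-- A column-wise operation reads only the column it needs.
meanAge :: Frame Person -> Double
meanAge fr = fromIntegral (sum (age fr)) / fromIntegral (length (age fr))

main :: IO ()
main = do
    let frame = fromRows [Person "Alice" 30, Person "Bob" 40]
    print (name frame)              -- ["Alice","Bob"]
    print (meanAge frame)           -- 35.0
    print (map age (toRows frame))  -- [30,40]
```

Because `Person` is parameterized by `f`, the same field names act as row accessors and as column accessors, which is what lets `meanAge` work on a single column without rebuilding rows.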