Releases: Eventual-Inc/Daft
v0.1.4
Daft 0.1.4 Release Notes
The Daft 0.1.4 release features our Image type columns!
New Features
Image Types
Our first Daft Image types have landed!
You can now construct an Image column with .image.decode() on a Binary column.
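Here is a minimal sketch of decoding (and resizing, per #967) in action, assuming a local image file and the from_pydict constructor; exact method availability may vary by version:

```python
import daft
from daft import col

# Assumes a local file "cat.png"; any encoded image bytes will do.
df = daft.from_pydict({"bytes": [open("cat.png", "rb").read()]})

# Decode the Binary column into an Image column, then resize it.
df = df.with_column("image", col("bytes").image.decode())
df = df.with_column("thumbnail", col("image").image.resize(32, 32))
```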
See PRs:
- [Images] [1/N] Logical type for variable-shaped and fixed-shaped images. #955
- [Images] [2/N] Add scaffolding for image decoding and other ops. #965
- [Images] [3/N] Add image decoding for uint8 images. #981
- Image Resize for ImageType #967
Documentation
- Fix modin typo; add partial scale numbers; highlight highlights #986
- Add Page on benchmarking #980
- Fix link for broken link checker #972
- Clean up DataType userguide/API docs #966
- Add more complex datatypes to docs #961
- Datatype docs #894
Bug Fixes
- [Scheduler] Fix join performance bug #985
- Table Slice and IntoPartitions Fix #962
- size_bytes fix: Guard against calculating variance of one item #957
Build Changes
- Bump hypothesis from 6.75.8 to 6.75.9 #979
- Bump orjson from 3.8.14 to 3.9.0 #978
- Bump hypothesis from 6.75.6 to 6.75.8 #976
- Add s3fs to dev requirements #975
- Bump python version to 3.9 for profiling and pin Dask version for python 3.8 #973
- Bump log from 0.4.17 to 0.4.18 #971
- Bump dask from 2023.5.0 to 2023.5.1 #970
- Bump pandas from 2.0.1 to 2.0.2 #969
- Bump hypothesis from 6.75.5 to 6.75.6 #968
- Bump orjson from 3.8.13 to 3.8.14 #964
- Bump hypothesis from 6.75.3 to 6.75.5 #963
- finer feature flags for arrow2 for faster compile #960
- Bump pytest-cov from 4.0.0 to 4.1.0 #959
- Bump orjson from 3.8.12 to 3.8.13 #958
v0.1.3
Daft 0.1.3 Release Notes
The Daft 0.1.3 release features fixes for a few performance regressions.
Enhancements
- Very basic s3 parquet microbenchmark #954
Bug Fixes
- [I/O] Change back to random access read for Parquet. #953
- [CI] Fix flaky Ray Datasets integration test. #952
- [Ray Runner] Unfixing batch size for task awaiting #951
- Testing object related performance fixes #949
Build Changes
- [ci] [daft publish] pin urllib to < 2 for conda #950
v0.1.2
Daft 0.1.2 Release Notes
The Daft 0.1.2 release features performance improvements, bugfixes and some of our first Daft logical types!
New Features
Extension Types for Ray Runner and Embedding Logical Type
Adds our first “Logical Type”: Embeddings!
An Embedding is a “Logical Type” backed by a Fixed Size List, and is common in Machine Learning and AI applications.
See: #929
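As a hedged sketch, an Embedding type can be declared through daft.DataType; the constructor signature shown here is an assumption, so check the DataType API docs for your version:

```python
from daft import DataType

# An Embedding is a logical type over a FixedSizeList, so it is parametrized
# by an inner dtype and a fixed length. Signature is assumed, not verified.
emb_type = DataType.embedding(DataType.float32(), 512)
```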
Enhancements
- Use PyArrow filesystem for tabular file reads #939
- [I/O] Port to pyarrow filesystems by default. #942
- Memoize ray.get for batch metadata lookup #937
- [I/O] Expose user-provided fsspec filesystem arg in read APIs. #931
- Introduce Logical Arrays and SeriesLike Trait #920
- [Extension Types] Add support for cross-lang extension types. #899
Bug Fixes
- fix concats for extension array for old versions of pyarrow #944
Build Changes
- [ci] enable pyrunner for 310 #946
- Add Pyarrow 6.0 in matrix for CI testing #945
- Update requirement of tabulate to >=0.9.0 #940
- unpin numpy for 3.7 to get dependabot to stop complaining #938
- Bump slackapi/slack-github-action from 1.23.0 to 1.24.0 #936
- Bump hypothesis from 6.75.2 to 6.75.3 #928
- Bump dask from 2023.4.1 to 2023.5.0 #927
- Bump serde from 1.0.162 to 1.0.163 #921
v0.1.1
Daft 0.1.1 Release Notes
The Daft 0.1.1 release provides bugfixes and stabilization fixes.
Enhancements
- Enable and test writing temporal types #897
- Fix universal expressions on temporals #895
- [Arrow Types] Add automatic Python object fallback for unsupported Arrow types. #886
Bug Fixes
- Fix fsspec multithreading clobbering issue #898
- Fix temporal unit tests for older versions of pyarrow #919
- Fix colon URL downloads and default to strict mode for .url.download() #896
- [CI] Fix flaky Datasets integration test. #917
- Import daft in local benchmarking unit tests #887
- Fix imports in microbenchmarks #885
Build Changes
- enable python 3.10 unit tests #915
- Update pyo3 to 0.18.3 #914
- Bump serde from 1.0.160 to 1.0.162 #912
- Bump arrow2 from 0.17.0 to 0.17.1 #910
- Bump actions/upload-artifact from 2 to 3 #909
- Bump actions/download-artifact from 2 to 3 #907
- Bump actions/setup-python from 3 to 4 #906
- Enable dependabot for pip, cargo and github-actions #904
- pinned deps for requirements-dev.txt #903
v0.1.0
Daft 0.1.0 Release Notes
Welcome to the first “minor” version release of Daft!
We hope everyone has had a great time using our 0.0.* releases, but buckle up and grab a drink while you read these release notes, because we built so much over the past month and this new release is BIG!
A big shoutout to the contributors who made this all possible - with 21,716 added and 16,090 deleted lines of code!
@xcharleslin @clarkzinzow @jeevb @samster25 @jaychia @FelixKleineBoesing
Main Highlights
- We rebuilt all of our core execution code in Rust - giving us up to a 2x speedup on many of our benchmarks!
- Our type system just levelled up! We have a much more sophisticated type system written in Rust that can handle nested types, parametrized types and type promotion semantics.
- The UDF API is much cleaner now with the introduction of the Daft Series object!
- Python object columns are now much more featureful with support for casting and magic methods
The full list of changes is much too long to present in these release notes, but here it is anyway.
Enhancements
Rust Execution Backend
Our execution code was refactored into Rust!
Previously, Daft relied on a mix of NumPy, Polars, Pandas and PyArrow to execute logic. This was problematic for a few reasons:
- Difficult to perform performance optimizations
- Difficult to understand and manage memory allocation
- Dependency hell!
- Flaky null-handling and broadcasting semantics depending on the library we used
As of 0.1.0, Daft is statically linked against the wonderful Arrow2 Rust library, which we use to execute all of our kernels.
This has several implications:
- Daft now has a Rust execution layer with Python bindings, mainly comprising the Table, Series and Expression abstractions.
- Daft is much faster (up to 2x in many cases, especially for our default multithreaded Python runner!) as GIL contention is no longer a bottleneck.
- Daft’s Python dependencies have been greatly reduced, making it much more lightweight!
On the user-facing API, most of these changes are completely transparent - i.e. you just got a massive speedup for free!
New Features
Enhanced Type System
Our type system just levelled up!
- Nested types were added #802
- Types are now much more granular (e.g. int64 vs uint32 vs int8…)
- Types are automatically promoted when necessary during certain operations (e.g. adding a Null array and an Int64 array results in an Int64 array!)
As a result, we have much cleaner support for Null array handling since the Null type can be correctly type-promoted with our new supertype semantics.
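A minimal sketch of the promotion behavior (the from_pydict constructor and column names here are illustrative):

```python
import daft
from daft import col

df = daft.from_pydict({"a": [1, 2, 3], "b": [None, None, None]})

# "a" is Int64 and "b" is Null; their supertype is Int64, so the sum
# column is typed Int64 (with nulls propagated as expected).
df = df.with_column("c", col("a") + col("b"))
```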
Deprecations
Since this is our first minor release, several APIs have changed substantially, and you should be aware of them. Moving forward, Daft APIs will maintain much stricter backward compatibility semantics.
UDFs
UDFs are much cleaner in 0.1.0!
UDFs no longer require up-front declaration of which arguments are Expressions, or of the input format they are passed in as (list, numpy, arrow, etc.). Instead:
- Inputs are always passed in as daft.series.Series objects, and users can easily convert them to the format they care about using Series.to_pylist(), Series.to_numpy(), etc.
- Which inputs arrive as daft.series.Series vs. plain Python objects is inferred at runtime, by checking which of the arguments a user passes in are Expressions.
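A hedged sketch of the new flow (the @udf decorator and its return_dtype parameter are assumed from the UDF User Guide; consult it for exact details):

```python
import daft
from daft import col, udf, DataType

@udf(return_dtype=DataType.int64())
def add_and_scale(a, b, scale):
    # a and b were passed as Expressions, so they arrive as daft.series.Series;
    # scale was passed as a plain int, so it arrives untouched.
    return [(x + y) * scale for x, y in zip(a.to_pylist(), b.to_pylist())]

df = daft.from_pydict({"x": [1, 2, 3], "y": [10, 20, 30]})
df = df.with_column("z", add_and_scale(col("x"), col("y"), 10))
```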
For more information, consult: UDF User Guide
Typing
Our old typing APIs have changed - the definitive typing API is now found at daft.DataType.
If you are declaring types (for instance as return types for UDFs), you should now use the DataType.* constructor methods!
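For example, a few of the constructor methods (a non-exhaustive sketch):

```python
from daft import DataType

DataType.int64()    # 64-bit signed integer
DataType.float32()  # 32-bit floating point
DataType.string()   # UTF-8 string
DataType.bool()     # boolean
```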
Input/Output APIs
Creation of DataFrames has been promoted to module-level functions!
Before:

```python
from daft import DataFrame

df = DataFrame.read_csv(...)
```

After:

```python
import daft

df = daft.read_csv(...)
```
This is a big improvement in usability (moving forward, Daft will try to make itself as easy as possible to use by just importing the top-level daft module).
For more information, please see: API Documentation for Input/Output.