v0.4.0
github-actions released this 19 Dec 08:34
What's Changed 🚀
💥 Breaking Changes
- feat: Default native runner @colin-ho (#3608)
- chore!: upgrade Ray pins and pyarrow pins @jaychia (#3612)
- chore!: drop support for Python 3.8 @kevinzwang (#3592)
- chore!: remove pyarrow-based file reader @kevinzwang (#3587)
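As a rough illustration of the Python 3.8 drop listed above, a project that depends on this release could guard its entry point with an interpreter check. This is only a sketch: the helper name `meets_minimum_python` is hypothetical, and the 3.9 floor is an assumption inferred from 3.8 being dropped, not something these notes state.

```python
import sys

# Assumption: with Python 3.8 dropped, 3.9 is the lowest supported version.
MIN_PYTHON = (3, 9)


def meets_minimum_python(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when the (major, minor) interpreter version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)
```

Calling `meets_minimum_python()` with no arguments checks the running interpreter; passing a tuple such as `(3, 8, 10)` checks an arbitrary version.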
✨ Features
- feat: Default native runner @colin-ho (#3608)
- feat(swordfish): Progress Bar @colin-ho (#3571)
- feat(connect): df.show @universalmind303 (#3560)
- feat(connect): support `DdlParse` @andrewgazelka (#3580)
- feat(swordfish): Optimize grouped aggregations @colin-ho (#3534)
- feat(swordfish): Enable left/right joins to build probe table on either side @colin-ho (#3548)
- feat: Add DataType inference from Python types @jaychia (#3555)
- feat(shuffles): Locality aware pre shuffle merge @colin-ho (#3505)
- feat: Implement count-distinct for sql @raunakab (#3553)
- feat(connect): add drop support @andrewgazelka (#3345)
- feat: support for basic subquery execution @kevinzwang (#3536)
- feat(connect): add `df.filter` @andrewgazelka (#3346)
- feat: Make serialization code not unwrap and panic on failures @raunakab (#3546)
- feat: Unity Catalog writes using `daft.DataFrame.write_deltalake()` @anilmenon14 (#3522)
- feat(connect): add parquet support @andrewgazelka (#3360)
- feat: Add iterators to more types @raunakab (#3539)
- feat(optimizer): Add scaffolding to create join graphs from logical plans @desmondcheongzx (#3501)
- feat(tpcds-benchmarking): Add basic tpcds benchmarking for local testing @raunakab (#3509)
- feat(list): add fixed-size list support for value_counts @andrewgazelka (#3521)
- feat(parquet): Limit parallel tasks in remote parquet reader @colin-ho (#3490)
- feat(parquet): Target parquet writes by size bytes instead of rows @colin-ho (#3457)
- feat: cross join @kevinzwang (#3437)
- [FEAT] connect: remove excessive warnings from spark connect @universalmind303 (#3499)
- [CHORE] connect, test: `df.withColumn` @andrewgazelka (#3359)
- [FEAT]: expr simplifier @universalmind303 (#3393)
- [FEAT] shuffle testing @raunakab (#3492)
- [FEAT]: add `coalesce` to dataframe and SQL @universalmind303 (#3482)
- [FEAT] add register-table helper to sql-catalog @chuanlei-coding (#2837)
- [FEAT] Respect resource request for projections in swordfish @colin-ho (#3460)
- [FEAT] Enable Actor Pool UDFs by default @kevinzwang (#3488)
- [FEAT] connect: add modulus operator and withColumns support @andrewgazelka (#3351)
- [FEAT] connect: createDataFrame @andrewgazelka (#3363)
- [FEAT] Support parquet RLE decoding for booleans @desmondcheongzx (#3477)
- [FEAT] Cap parallelism on local parquet reader @colin-ho (#3310)
- [FEAT] connect: add binary operators @andrewgazelka (#3350)
- [FEAT] connect: support basic column operations @andrewgazelka (#3362)
- [FEAT] extend `build-commit` workflow to support different compile-archs @raunakab (#3459)
- [FEAT] Add `count-distinct` aggregation @raunakab (#3455)
🐛 Bug Fixes
- fix(udf): udf call with empty table and batch size @kevinzwang (#3604)
- fix: use arrow's schema instead of spark's for local rel @universalmind303 (#3602)
- fix: guard concurrent extension datatype setting with a lock @jaychia (#3589)
- fix(parquet): Fix parquet reads of required fields nested within optional fields @desmondcheongzx (#3598)
- fix: boolean and/or expressions with null @kevinzwang (#3544)
- fix(run-cluster-workflow): Add null check when parsing metadata @raunakab (#3507)
- fix(tpcds): fix bugs in tpcds datagen script @universalmind303 (#3495)
- [BUG] Fix build commit workflow @raunakab (#3487)
- [BUG]: dont panic on count(distinct) @universalmind303 (#3481)
- [BUG] Block on parquet schema future in estimate_size_bytes @colin-ho (#3484)
🚀 Performance
- perf: filter null join key optimization rule @kevinzwang (#3583)
- perf: lazily import pyiceberg and unity catalog if available @jaychia (#3565)
♻️ Refactor
- refactor: allow InMemory to take in non python based entries @universalmind303 (#3554)
- refactor: create a rust based `PartitionSet` @universalmind303 (#3515)
- refactor(swordfish): Generic broadcast state bridge @colin-ho (#3508)
📖 Documentation
- docs: update tpch benchmark link @ccmao1130 (#3542)
- docs: Enable Linting of docstrings @samster25 (#3506)
- [FEAT] Enable Actor Pool UDFs by default @kevinzwang (#3488)
✅ Tests
- test(connect): add more tests for `createDataFrame` @andrewgazelka (#3607)
- test: Add more size estimation tests from our s3 bucket @jaychia (#3514)
👷 CI
- ci: Always download logs @jaychia (#3588)
- ci: Add ability to array-ify args and run multiple jobs @raunakab (#3584)
- ci: Add "build" label type to accepted PR titles @raunakab (#3541)
- ci: add a tool to launch workloads on cluster @jaychia (#3516)
- ci(release-drafter): use conventional commit labels @andrewgazelka (#3503)
🔧 Maintenance
- chore!: upgrade Ray pins and pyarrow pins @jaychia (#3612)
- chore: add warning for native runner @jaychia (#3613)
- chore!: drop support for Python 3.8 @kevinzwang (#3592)
- chore!: remove pyarrow-based file reader @kevinzwang (#3587)
- chore: Fix ordering in sql tests + pin docker images in read_sql tests @colin-ho (#3596)
- chore: move symbolic and boolean algebra code into new crate @kevinzwang (#3570)
- [CHORE] use conventional commits @andrewgazelka (#3493)
- [CHORE] connect, test: `df.withColumn` @andrewgazelka (#3359)
- [CHORE] Add tests for parquet size estimations @jaychia (#3405)
- [CHORE] Move all python wrapping logic to separate module @raunakab (#3458)
Full Changelog: v0.3.15...v0.3.16