Releases: Eventual-Inc/Daft
v0.2.33
Changes
✨ New Features
- [FEAT]: sql case/when @universalmind303 (#2591)
- [FEAT] Add comparison of timestamps with same timezone @Vince7778 (#2604)
- [FEAT] Add support for pyiceberg v0.7 @kevinzwang (#2594)
- [FEAT] Make the `end` argument for `.list.slice()` optional @desmondcheongzx (#2593)
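The optional `end` argument for `.list.slice()` can be pictured with a plain-Python sketch. The helper `list_slice` below is hypothetical and is not Daft's API; it assumes that omitting `end` means "slice through the last element", mirroring Python's own slicing.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def list_slice(values: Sequence[T], start: int, end: Optional[int] = None) -> List[T]:
    """Hypothetical stand-in for a per-row list slice; not Daft's implementation."""
    # Assumption: a missing `end` means "to the end of the list".
    if end is None:
        return list(values[start:])
    return list(values[start:end])


rows = [[1, 2, 3, 4], [5, 6], []]
with_end = [list_slice(row, 1, 3) for row in rows]
without_end = [list_slice(row, 1) for row in rows]
```

Rows shorter than the requested range simply yield shorter (or empty) slices in this sketch; the release note does not say how Daft treats that case.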
🚀 Performance Improvements
- [PERF] Add physical plan optimizer and optimization @Vince7778 (#2557)
👾 Bug Fixes
- [BUG]: remove simsimd dependency @universalmind303 (#2605)
- [BUG] Fix parquet reads when a top-level column's final row spans more than one data page @desmondcheongzx (#2586)
- [BUG]: accept "iterable[pa.Table]" for from_arrow @universalmind303 (#2583)
📖 Documentation
- [CHORE] Fix imports on jupyter notebook examples @kevinzwang (#2600)
🧰 Maintenance
- [CHORE] Fix imports on jupyter notebook examples @kevinzwang (#2600)
- [CHORE]: ignore ".zed" directory @universalmind303 (#2595)
v0.2.32
Changes
✨ New Features
- [FEAT] Fix resource accounting in PyRunner @jaychia (#2567)
- [FEAT] Add `.str.count_matches()` @Vince7778 (#2580)
- [FEAT]: streaming json @universalmind303 (#2582)
- [FEAT]: `embedding.cosine_distance` function @universalmind303 (#2526)
- [FEAT] Enable buffered iteration on plans @jaychia (#2566)
- [FEAT] Streaming CSV Reads @colin-ho (#2565)
- [FEAT] Support Reading Iceberg Merge-on-Read Position Deletes @kevinzwang (#2563)
- [FEAT]: daft sql @universalmind303 (#2558)
- [FEAT] Initializes daft-sql and defines the daft.sql(..) function. @rchowell (#2559)
- [FEAT] Add image mode casting @Vince7778 (#2562)
- [FEAT] Add tracing to local execution engine @samster25 (#2556)
- [FEAT] Create file when writing dataframe with no rows @kevinzwang (#2540)
- [FEAT] Add string tokenize expression @Vince7778 (#2503)
- [FEAT]: support optional rowgroups to `read_parquet` @universalmind303 (#2534)
- [FEAT] Add hashjoin, sort, and hashagg ops to new executor @colin-ho (#2530)
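For readers unfamiliar with the metric behind `embedding.cosine_distance`, here is a minimal plain-Python sketch of the standard definition (one minus cosine similarity). It is illustrative only: the function name and its zero-vector handling are assumptions, and Daft's own implementation may differ in details such as dtype handling.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; illustrative only, not Daft's implementation."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Assumption: treat zero vectors as an error, since the angle is undefined.
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


same_direction = cosine_distance([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])
opposite = cosine_distance([1.0, 0.0], [-1.0, 0.0])
```

Under this definition the distance ranges from 0 (same direction) through 1 (orthogonal) to 2 (opposite direction).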
👾 Bug Fixes
- [BUG] Fix `.str.length()` on Unicode strings @Vince7778 (#2579)
- [BUG] Fix table filters with scalar mask @colin-ho (#2542)
- [BUG] Enable sorting bool columns @colin-ho (#2529)
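As background for the `.str.length()` fix on Unicode strings, the plain-Python snippet below shows why "length" is ambiguous for non-ASCII text: the number of code points and the number of UTF-8 bytes differ. The release note does not state which count was returned before the fix, so this only illustrates the distinction.

```python
# "h\u00e9llo" is "héllo" with a precomposed é (U+00E9).
text = "h\u00e9llo"

code_points = len(text)                 # counts Unicode code points
utf8_bytes = len(text.encode("utf-8"))  # counts encoded bytes; é takes two
```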
📖 Documentation
- [FEAT] Add `.str.count_matches()` @Vince7778 (#2580)
- [DOCS]: small update to "scaling-up" @universalmind303 (#2577)
- [DOCS] quickstart-revision @avriiil (#2124)
- [DOCS] Update Ray Dataset link @kevinzwang (#2564)
- [FEAT] Add image mode casting @Vince7778 (#2562)
- [DOCS] Fix notebook CI @jaychia (#2544)
- [FEAT] Add string tokenize expression @Vince7778 (#2503)
- [DOCS] Do not generate PDFs @jaychia (#2539)
- [DOCS] Build PDF and htmlzip formats of docs for offline consumption @jaychia (#2538)
🧰 Maintenance
- [CHORE] Remove daft-execution @colin-ho (#2553)
- [CHORE] Enable all features for rust-analyzer @Vince7778 (#2560)
- [CHORE] Add TPC-H questions 11-22 to benchmarks (currently skipped) @kevinzwang (#2299)
- [CHORE]: move minhash to daft-functions @universalmind303 (#2518)
⬆️ Dependencies
- Bump async-compression from 0.4.10 to 0.4.12 @dependabot (#2548)
v0.2.31
Changes
👾 Bug Fixes
- [BUG] Fix bug with map_groups UDFs that return more than 1 output row for empty partitions @jaychia (#2532)
- [BUG] Use shared thread pool for multiple running instances of df on pyrunner @jaychia (#2502)
- [BUG] Fix bug with multi-partition `any_value` @Vince7778 (#2531)
- [BUG] Allow for Parquet reading from files with differing schemas @jaychia (#2514)
- [BUG] With_new_children not implemented for sample @colin-ho (#2528)
🧰 Maintenance
- [CHORE] add pytest benchmarking for local testing of execution engine @samster25 (#2523)
v0.2.30
Changes
✨ New Features
- [FEAT] Decouple pipeline building and running from new executor @colin-ho (#2522)
- [FEAT] Add concat to new execution model + buffered intermediate ops @colin-ho (#2519)
- [FEAT] Math Ops for FixedSizeList / FixedShapeTensor / Embedding Type @samster25 (#2507)
- [FEAT] adding transform functionality @otacilio-psf (#2498)
- [FEAT] List chunk expression @desmondcheongzx (#2491)
- [FEAT] Refactors and agg improvements for new local execution model @colin-ho (#2497)
- [FEAT]: dyn function registry @universalmind303 (#2466)
- [FEAT] New Local Execution Model @colin-ho (#2437)
- [FEAT] Implement hashing and groupby on lists @Vince7778 (#2464)
- [FEAT] Add upload functionality to binary columns @jaychia (#2461)
- [FEAT] List slice expression @desmondcheongzx (#2479)
- [FEAT] add bit shift functions @murex971 (#2453)
- [FEAT] Implements trigonometry expressions: arctanh arccosh arcsinh @fedemagnani (#2476)
- [FEAT] Microsoft Fabric support in AzureConfig @kevinzwang (#2465)
🚀 Performance Improvements
- [PERF]: dont read parquet metadata multiple times @universalmind303 (#2358)
- [PERF] Local Execution Plan @samster25 (#2489)
- [PERF] Optimize string normalization @Vince7778 (#2474)
👾 Bug Fixes
- [BUG] merge conflict for python catalog scan task @samster25 (#2517)
- [BUG] Add retries for write timeout errors @mauriceweber (#2508)
- [BUG] Use Daft s3 credentials chain for deltalake reads @jaychia (#2486)
- [BUG] Support delta-rs version >0.17.4 in deltalake writes @jaychia (#2488)
- [BUG] Fix anti-join on different column names @Vince7778 (#2477)
📖 Documentation
- [FEAT] adding transform functionality @otacilio-psf (#2498)
- [DOCS] Tutorial: FOTW data access @avriiil (#2384)
- [FEAT] List chunk expression @desmondcheongzx (#2491)
- [DOCS] Enhance Dataframe / Expressions examples @sunaysanghani (#2360)
- [FEAT] add bit shift functions @murex971 (#2453)
- [FEAT] Implements trigonometry expressions: arctanh arccosh arcsinh @fedemagnani (#2476)
- [DOCS] Adds RunLLM widget @vsreekanti (#2462)
- [DOCS] Fix broken example URL @kevinzwang (#2467)
- [FEAT] Microsoft Fabric support in AzureConfig @kevinzwang (#2465)
- [DOCS] Remove unsupported map get doc example @kevinzwang (#2452)
🧰 Maintenance
- [CHORE] remove tokio-stream as dep @samster25 (#2521)
- [CHORE] drop unused deps and add machete CI check @samster25 (#2520)
- [CHORE] fix incremental builds with vscode rust analyzer @samster25 (#2515)
- [CHORE] Disable Python as default feature and have maturin enable it by default @samster25 (#2516)
- [CHORE]: remove daft-core from daft-io @universalmind303 (#2513)
- [CHORE]: move uri functions to new "daft-functions" crate @universalmind303 (#2501)
- [CHORE] Add better typing for class UDFs @jaychia (#2388)
- [CHORE] Add a helpful message in build-artifact-s3 workflow @jaychia (#2480)
v0.2.29
Changes
✨ New Features
- [FEAT] String normalize expression @Vince7778 (#2450)
- [FEAT] Add a IOConfig.http with initial option for user_agent @jaychia (#2449)
- [FEAT] Add Schema.to_pyarrow_schema() @jaychia (#2447)
- [FEAT] thread through transport errors @samster25 (#2446)
- [FEAT] Add MinHash expression @Vince7778 (#2431)
- [FEAT]: write lance @universalmind303 (#2421)
- [FEAT] Add struct get syntactic sugar @kevinzwang (#2367)
- [FEAT] Bitwise 'AND' 'OR' 'XOR' Operations @mrutunjay-kinagi (#2365)
- [FEAT] Add bearer token authentication for Azure @kevinzwang (#2436)
- [FEAT] Add ability to specify snapshot_id for iceberg read @jaychia (#2426)
- [FEAT]: hash expr @universalmind303 (#2398)
- [FEAT] Fixed Size Binary Type v2 @Vince7778 (#2403)
- [FEAT] Automatically use Ray Runner if Ray is initialized @jaychia (#2282)
- [FEAT] print more info when hitting todo in stage planner @samster25 (#2416)
- [FEAT] Add casting from lists to embeddings @Vince7778 (#2396)
👾 Bug Fixes
- [BUG] Use estimated in-memory size for scan task merging and resource requests @kevinzwang (#2448)
- [BUG] Add retries to pyarrow write_dataset call @kevinzwang (#2445)
- [BUG] Fix time type inference @colin-ho (#2441)
- [BUG] Enable display of time64 with seconds unit @jaychia (#2439)
- [BUG] Raise error when Ray Data tensor cannot be pickled and disable compliant nested types @kevinzwang (#2428)
- [BUG] Remove debug print @Vince7778 (#2419)
- [BUG] [New Executor] Drain channel in limit sink @colin-ho (#2401)
- [BUG] Slice array if necessary when casting from list to fixed size list @colin-ho (#2415)
📖 Documentation
- [FEAT] String normalize expression @Vince7778 (#2450)
- [FEAT] Add Schema.to_pyarrow_schema() @jaychia (#2447)
- [FEAT] Add MinHash expression @Vince7778 (#2431)
- [DOCS] Improve groupby and agg docs @kevinzwang (#2438)
- [DOCS] Update unity-catalog docs to include installation @jaychia (#2440)
- [FEAT] Automatically use Ray Runner if Ray is initialized @jaychia (#2282)
v0.2.28
Changes
✨ New Features
- [FEAT] Add manual auth for GCS and Iceberg GCS auth support @kevinzwang (#2393)
- [FEAT] Replace schema_hints with schema and infer_schema for read_json @GuyPozner (#2357)
- [FEAT] [New Executor] [3/N] Full execution model prototype. @clarkzinzow (#2347)
- [FEAT] date and timestamp parsers @murex971 (#2353)
- [FEAT] Implement Anti and Semi Join @samster25 (#2379)
- [FEAT] Implement arctan2 expression @Vince7778 (#2389)
- [FEAT] Add Unity Catalog support @jaychia (#2377)
- [FEAT] fill_nan and not_nan expressions @colin-ho (#2313)
- [FEAT] Add more context when UDFs fail @jaychia (#2325)
- [FEAT] [New Executor] [2/N] daft-execution crate + proof-of-concept compute ops and partition reference + metadata model for new executor. @clarkzinzow (#2340)
👾 Bug Fixes
- [BUG] `with_column` with existing column name should not reorder columns @colin-ho (#2381)
- [BUG] Fix IO integration tests @jaychia (#2390)
- [BUG]: ide completions for expr namespaces @universalmind303 (#2374)
- [BUG] enable empty stats for deltalake @samster25 (#2376)
- [BUG] Allow variable columns in CSV @colin-ho (#2326)
📖 Documentation
- [FEAT] date and timestamp parsers @murex971 (#2353)
- [FEAT] Implement arctan2 expression @Vince7778 (#2389)
- [FEAT] Add Unity Catalog support @jaychia (#2377)
- [FEAT] fill_nan and not_nan expressions @colin-ho (#2313)
- [EXPRESSIONS] Add missing doc gen for Expression.float.is_nan @tlm365 (#2378)
- [DOCS] add Delta writer functionality @avriiil (#2372)
- [EXPRESSIONS] Implement Expression.float.is_inf @tlm365 (#2371)
- [DOCS] Add tutorial for Data + AI Summit 2024 @jaychia (#2368)
- [DOCS] improve SQL docs @avriiil (#2361)
- [DOCS] the project currently uses ray which is not compatible with python 3.12 @prabodh1194 (#2354)
- [CHORE]: make parquet metadata (de)serializable @universalmind303 (#2346)
- [CHORE]: Arrow2 migrate @universalmind303 (#2341)
🧰 Maintenance
- [CHORE] Run doctests in CI @colin-ho (#2362)
- [CHORE] Bump chrono to 0.4.38 @colin-ho (#2352)
- [CHORE]: make parquet metadata (de)serializable @universalmind303 (#2346)
- [CHORE]: Arrow2 migrate @universalmind303 (#2341)
⬆️ Dependencies
- Bump num-traits from 0.2.18 to 0.2.19 @dependabot (#2285)
v0.2.27
Changes
✨ New Features
- [FEAT] Introduce terminal wrap around when explaining plans @samster25 (#2342)
- [FEAT] [New Executor] [1/N] Move physical plan scheduler to new crate, misc. refactorings + drive-bys for new executor @clarkzinzow (#2339)
👾 Bug Fixes
- [BUG] Azure and Iceberg read and write fixes @kevinzwang (#2349)
- [BUG] Allow nulls in partition column @colin-ho (#2344)
- [BUG] Allow any file extension in Azure directory listing test @kevinzwang (#2345)
- [BUG] Add Retry on Streaming Errors when collecting stream into bytes @samster25 (#2338)
v0.2.26
Changes
✨ New Features
- [FEAT] Public Delta Lake writer @kevinzwang (#2329)
- [FEAT] Implement str.substr Expression @danila-b (#2269)
- [FEAT] Add additional Azure authentication methods @kevinzwang (#2333)
- [FEAT] Custom S3 Credentials Provider @kevinzwang (#2233)
- [FEAT] Expression between @GuyPozner (#2301)
- [FEAT] Allow for printing of plans to a file @jaychia (#2320)
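The "Expression between" feature listed above is a range check. A minimal plain-Python sketch follows, assuming SQL-style semantics where both bounds are inclusive; the helper `between` is hypothetical and does not use Daft's API.

```python
def between(value, lower, upper):
    """Inclusive range check; hypothetical helper, assumes SQL-style BETWEEN."""
    return lower <= value <= upper


values = [1, 2, 3, 4]
mask = [between(v, 1, 2) for v in values]
```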
👾 Bug Fixes
- [BUG] Use os.path.join in read_hudi only for local fs @colin-ho (#2336)
- [BUG] Translate mssql to tsql in read_sql scan @colin-ho (#2330)
- [BUG] Fix missing columns after join @siddharth-gulia (#2321)
📖 Documentation
- [FEAT] Public Delta Lake writer @kevinzwang (#2329)
- [FEAT] Implement str.substr Expression @danila-b (#2269)
- [FEAT] Custom S3 Credentials Provider @kevinzwang (#2233)
- [FEAT] Expression between @GuyPozner (#2301)
- [EXPRESSIONS] Add log with custom base @jhasm (#2324)
⬆️ Dependencies
- Bump rayon from 1.8.0 to 1.10.0 @dependabot (#2307)
v0.2.25
Changes
✨ New Features
- [FEAT] Add function to refresh logger state for rust @samster25 (#2323)
- [FEAT] implement timeout for parquet reader @samster25 (#2322)
- [FEAT] Use detached named actor for RayRunner @jaychia (#2296)
- [FEAT] like and ilike functions @murex971 (#2283)
- [FEAT] Delta Lake Writer (non-public API) @kevinzwang (#2304)
- [FEAT] extract minute second and time component @murex971 (#2234)
- [FEAT]: tilde expansion @universalmind303 (#2277)
- [FEAT] Map Getter @colin-ho (#2255)
- [FEAT] Test for partition evolution @Fokko (#2084)
- [FEAT] Handle Hudi empty timeline @xushiyan (#2268)
🚀 Performance Improvements
- [PERF]: local json reader @universalmind303 (#2264)
👾 Bug Fixes
- [BUG] Fix multi-output tasks in RayRunner @jaychia (#2291)
- [BUG] Fix red line in Jupyter notebook @kevinzwang (#2267)
📖 Documentation
- [FEAT] like and ilike functions @murex971 (#2283)
- [DOCS] Add iceberg summit tutorial notebook @jaychia (#2281)
- [FEAT] extract minute second and time component @murex971 (#2234)
- [FEAT] Map Getter @colin-ho (#2255)
- [DOCS] Improve docs for external types @kevinzwang (#2274)
- [DOCS] Add quick docs for struct datatype @jaychia (#2256)
🧰 Maintenance
- [CHORE]: Don't use record fields @Fokko (#2306)
- [CHORE] Pin requests to fix docker-py CI issue @jaychia (#2289)
- [CHORE] Clean up string kernel code @kevinzwang (#2276)
- [CHORE] Empty PR to retrigger coverage @samster25 (#2262)
- [CHORE] turn on cov again @samster25 (#2259)
- [CHORE] Enable CI checks for AQE in CI and refactor CI job organization @samster25 (#2258)
⬆️ Dependencies
- Bump async-compression from 0.4.7 to 0.4.10 @dependabot (#2270)
- Bump tokio-util from 0.7.9 to 0.7.11 @dependabot (#2271)
v0.2.24
Changes
✨ New Features
- [FEAT] Allow returning of pyarrow arrays from UDFs @jaychia (#2252)
- [FEAT] Add left, right, and outer joins @kevinzwang (#2166)
- [FEAT] Add rpad and lpad expressions @murex971 (#2157)
- [FEAT] AWS Profile override in S3Config @samster25 (#2243)
- [FEAT] Add unpivot @kevinzwang (#2204)
- [FEAT] Add string repeat functionality @murex971 (#2198)
- [FEAT] Approximate quantile aggregation (pulled into main) @jaychia (#2179)
- [FEAT] pivot @colin-ho (#2183)
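The unpivot feature in this release reshapes wide data into long form. The sketch below shows that reshaping on plain Python dictionaries; the function signature and the default output column names (`variable`, `value`) are assumptions made for illustration and are not taken from Daft's API.

```python
from typing import Any, Dict, List, Sequence


def unpivot(
    rows: Sequence[Dict[str, Any]],
    ids: Sequence[str],
    values: Sequence[str],
    variable_name: str = "variable",  # hypothetical default name
    value_name: str = "value",        # hypothetical default name
) -> List[Dict[str, Any]]:
    """Wide-to-long reshape: one output row per (input row, value column)."""
    long_rows = []
    for row in rows:
        for column in values:
            out = {key: row[key] for key in ids}  # carry identifier columns
            out[variable_name] = column           # which column the value came from
            out[value_name] = row[column]         # the value itself
            long_rows.append(out)
    return long_rows


wide = [{"year": 2020, "jan": 10, "feb": 20}]
long_form = unpivot(wide, ids=["year"], values=["jan", "feb"])
```

Pivot is the inverse direction: distinct values of one column become new column names, typically combined with an aggregation.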
🚀 Performance Improvements
- [PERF] Adaptive Query Execution @samster25 (#2176)
- [PERF]: swap out json_deserializer for simd_json @universalmind303 (#2228)
- [PERF] Evaluate only true/false side of if_else if predicate is boolean @colin-ho (#2222)
- [PERF] enable metadata preservation across materialization points @samster25 (#2216)
👾 Bug Fixes
- [BUG] Fix tab completion on expression namespaced accessors @jaychia (#2251)
- [BUG] route abfss to AzureBlob @samster25 (#2244)
📖 Documentation
- [CHORE] Skip demo notebook @jaychia (#2248)
- [FEAT] Add rpad and lpad expressions @murex971 (#2157)
- [DOCS] Add user guide for read_sql @colin-ho (#2226)
- [FEAT] Add unpivot @kevinzwang (#2204)
- [DOCS] Add `read_hudi` in the api docs @xushiyan (#2225)
- [FEAT] Add string repeat functionality @murex971 (#2198)
- [DOCS] LinkedIn Big Data meetup tutorial @jaychia (#2223)
- [FEAT] Approximate quantile aggregation (pulled into main) @jaychia (#2179)
- [DOCS] Add read_lance docs @jaychia (#2218)
- [FEAT] pivot @colin-ho (#2183)
🧰 Maintenance
- [CHORE] Drop Python 3.7 @samster25 (#2250)
- [CHORE] Improve timestamp repr @colin-ho (#2245)
- [CHORE] Allow multiple group_bys for pivot @colin-ho (#2242)
- [CHORE] Skip demo notebook @jaychia (#2248)
- [CHORE] Return &str for expression name @colin-ho (#2224)
- [CHORE] Mount provision.py for iceberg integration tests @jaychia (#2232)
- [CHORE]: remove trait aliases @universalmind303 (#2229)
⬆️ Dependencies
- Bump serde from 1.0.198 to 1.0.200 @dependabot (#2239)
- Bump csv-async from 1.2.6 to 1.3.0 @dependabot (#2238)