Releases: Eventual-Inc/Daft
v0.1.14
Changes
✨ New Features
- [FEAT] add flag to use multithreaded io for parquet_read_table @samster25 (#1298)
- [FEAT] Add Retry Mode, connection timeout, and read timeout to S3Config @samster25 (#1293)
- [FEAT] [New Query Planner] Add optimization framework and `PushDownFilter` rule. @clarkzinzow (#1284)
👾 Bug Fixes
- [BUG] Fix semantic merge conflict @xcharleslin (#1286)
🧰 Maintenance
- [CHORE] Move schema construction under LogicalPlan construction @xcharleslin (#1290)
- [CHORE] Implement growables for array types @jaychia (#1287)
- [CHORE] Unify indexmap versions and bump to 2.0.0 @xcharleslin (#1291)
- [CHORE] Refactor Series downcast and LogicalArrayImpl @jaychia (#1289)
- [CHORE] Pass in file size and num rows to Rust query planner @xcharleslin (#1282)
v0.1.13
Changes
✨ New Features
- [FEAT] Add Flag to_arrow to convert large string arrays @samster25 (#1283)
👾 Bug Fixes
- [BUG] try release profile rather than dev-bench for daft profiling @samster25 (#1280)
🧰 Maintenance
- [CHORE] reduce severity of region reroute logs to debug @samster25 (#1279)
v0.1.12
Changes
✨ New Features
- [FEAT] [New Query Planner] All functional tests pass + add to CI. @clarkzinzow (#1274)
- [FEAT] [New Query Planner] Add support for `df.count_rows()`. @clarkzinzow (#1273)
- [FEAT] native google cloud reader @samster25 (#1271)
- [FEAT] [New Query Planner] Groupby support, aggregation fixes, support for remaining aggregation ops @clarkzinzow (#1272)
- [FEAT] [New Query Planner] Support for Ray runner in new query planner. @clarkzinzow (#1265)
- [FEAT] Add Schema.from_pyarrow @jaychia (#1262)
- [FEAT] [New Query Planner] Add support for joins. @clarkzinzow (#1260)
- [FEAT] [New Query Planner] Add support for Explode. @clarkzinzow (#1258)
👾 Bug Fixes
- [BUG] Use manylinux_2_24 for aarch64 linux to be able to publish manylinux2014 @samster25 (#1275)
📖 Documentation
- [FEAT] [New Query Planner] Support for Ray runner in new query planner. @clarkzinzow (#1265)
🧰 Maintenance
- [CHORE] Refactor arrays to share a FromArrow constructor trait @jaychia (#1276)
- [CHORE] Bump rust nightly channel date @jaychia (#1255)
⬆️ Dependencies
- Bump opencv-python from 4.8.0.74 to 4.8.0.76 @dependabot (#1267)
- Bump orjson from 3.9.2 to 3.9.4 @dependabot (#1268)
- Bump image from 0.24.6 to 0.24.7 @dependabot (#1269)
- Bump isbang/compose-action from 1.5.0 to 1.5.1 @dependabot (#1270)
v0.1.11
Changes
✨ New Features
- [FEAT] [New Query Plan] Add support for Projection and Coalesce, enable many tests @clarkzinzow (#1256)
- [FEAT] [New Query Planner] Add support for Concat. @clarkzinzow (#1254)
- [FEAT] [New Query Planner] Add support for tabular writes. @clarkzinzow (#1252)
- [FEAT] Multi-partition aggregate; Coalesce @xcharleslin (#1249)
- [FEAT] [New Query Planner] Add support for Sort, Repartition, and Distinct in new query planner. @clarkzinzow (#1248)
- [FEAT] Add Azure Support for Native Downloader @samster25 (#1250)
- [FEAT] Locally unique semantic IDs for Expressions @xcharleslin (#1243)
- [FEAT] Read parquet tables with int96 coercion option @jaychia (#1231)
- [FEAT] [New Query Plan] Add support for CSV scans, JSON scans, in-memory scans and caching materialized results. @clarkzinzow (#1246)
- [FEAT] Native Downloader add Retry Config parameters @samster25 (#1244)
- [FEAT] (Single partition only) DataFrame.sum() via Rust planner @xcharleslin (#1230)
- [FEAT] [New Query Planner] Logical --> physical translation, physical plan execution. @clarkzinzow (#1232)
- [FEAT] native parquet correctness checks @samster25 (#1225)
- [FEAT] add session token as input to io config @samster25 (#1224)
🚀 Performance Improvements
- [PERF] Native Parquet Bulk Reader @samster25 (#1233)
👾 Bug Fixes
- [BUG] drop native-tls (openssl) for azure which was a default feature @samster25 (#1251)
- [BUG] Fix decimal byte arrays @jaychia (#1247)
- [BUG] correct type when printing incorrect row count @samster25 (#1226)
- [BUG] try manylinux 2 28 @samster25 (#1214)
- [BUG] downgrade ray to 2.6 @samster25 (#1212)
- [BUG] add explicit target for aarch64 linux @samster25 (#1209)
- [BUG] Fix incorrect sign bug for small decimals @xcharleslin (#1204)
- [BUG] Set SSL paths on linux @samster25 (#1203)
📖 Documentation
- [DOCS] Fix daft.read_parquet link @jaychia (#1228)
- [DOCS][CHORE] Add docs for IOConfig and S3Config @jaychia (#1227)
🧰 Maintenance
- [CHORE] Update test to only use store_schema kwarg for pa>=11 @jaychia (#1253)
- [FEAT] (Single partition only) DataFrame.sum() via Rust planner @xcharleslin (#1230)
- [CHORE] [New Query Planner] Introduce `LogicalPlanBuilder` and `QueryPlanner` interfaces to hide query planner implementations. @clarkzinzow (#1245)
- [CHORE] LogicalPlan: Add display improvements, and Filter @xcharleslin (#1221)
- [CHORE] Add unit tests for int96 timestamps @jaychia (#1229)
- [DOCS][CHORE] Add docs for IOConfig and S3Config @jaychia (#1227)
- [CHORE] disable mac test for lack of docker @samster25 (#1223)
- [CHORE] Begin integrating Rust Logical Plan with Dataframe API @xcharleslin (#1207)
- [CHORE] integration tests for nightly platform wheels @samster25 (#1219)
- [CHORE] Remove existing LogicalPlan from all execution concepts @xcharleslin (#1208)
- [CHORE] Add endpoints to simulate rate-limiting on AWS S3 buckets @jaychia (#1220)
- [CHORE] Add pytest marker for integration @jaychia (#1211)
- [CHORE] Add s3 fixtures for retrying logic @jaychia (#1206)
- [CHORE] Add developer flag to use Rust query planner @xcharleslin (#1205)
- [CHORE] Rust Logical plan skeleton @xcharleslin (#1192)
⬆️ Dependencies
- Bump tempfile from 3.7.0 to 3.7.1 @dependabot (#1238)
- Bump ray[data,default] from 2.5.1 to 2.6.1 @dependabot (#1200)
- Bump numpy from 1.25.1 to 1.25.2 @dependabot (#1199)
- Bump tempfile from 3.6.0 to 3.7.0 @dependabot (#1198)
- Bump serde_json from 1.0.103 to 1.0.104 @dependabot (#1197)
- Bump num-traits from 0.2.15 to 0.2.16 @dependabot (#1196)
- Bump serde from 1.0.171 to 1.0.179 @dependabot (#1195)
v0.1.10
Changes
✨ New Features
- [FEAT] Enable feature-flagged native downloader in daft.read_parquet @jaychia (#1190)
- [FEAT] parquet reader refactor, add parquet_stats_reader and parquet_schema_reader (1/2) @samster25 (#1191)
🚀 Performance Improvements
- [PERF] native streaming parquet @samster25 (#1193)
🧰 Maintenance
⬆️ Dependencies
- Bump isbang/compose-action from 1.4.1 to 1.5.0 @dependabot (#1178)
- Bump serde_json from 1.0.100 to 1.0.103 @dependabot (#1168)
- Bump pyo3-log from 0.8.2 to 0.8.3 @dependabot (#1167)
- Bump dyn-clone from 1.0.11 to 1.0.12 @dependabot (#1166)
- Bump numpy from 1.25.0 to 1.25.1 @dependabot (#1164)
- Bump lxml from 4.9.2 to 4.9.3 @dependabot (#1163)
v0.1.9
Changes
🏆 Highlights
- [FEAT] [Tensor] Add support for `Tensor` and `FixedShapeTensor` types. @clarkzinzow (#1073)
✨ New Features
- [FEAT] Consolidate to list namespace @jaychia (#1180)
- [FEAT] Add .image.crop Expression @jaychia (#1175)
- [FEAT] [Tensor] Add support for `Tensor` and `FixedShapeTensor` types. @clarkzinzow (#1073)
- [FEAT] Basic support for Arrow 128-bit Decimal. @xcharleslin (#1129)
- [FEAT] Native Parquet Downloader @samster25 (#1107)
🚀 Performance Improvements
- [PERF] Simple Read Planner and RangeReader for Native Parquet Reader @samster25 (#1172)
👾 Bug Fixes
- [BUG] Fix ownership model of IOClient @samster25 (#1128)
- [BUG] Ownership of Runtime and Clients @samster25 (#1125)
📖 Documentation
- [DOCS] Fix broken link to Ray Datasets docs @jaychia (#1186)
- [FEAT] Consolidate to list namespace @jaychia (#1180)
- [DOCS] Add docs for tensor dtype @jaychia (#1170)
- [DOCS] Add Flyte example @jaychia (#1150)
- [CHORE] Update README.rst typo @jaychia (#1141)
🧰 Maintenance
- [CHORE] Bump cargo version to 0.1.9 @jaychia (#1187)
- [CHORE] Exclude JSON pre-commit fixer for ipynb files @jaychia (#1184)
- [CHORE] New daft-plan crate; trait TreeDisplay @xcharleslin (#1176)
- [CHORE] More Parquet benchmarking @jaychia (#1160)
- [CHORE] Enable Parquet Integration tests for decimal types @samster25 (#1161)
- [CHORE] cache all crates @samster25 (#1158)
- [CHORE] move parquet unit tests under io @samster25 (#1157)
- [CHORE] [CI] use smarter github rust cache action @samster25 (#1156)
- [CHORE] bump profiling timeout @samster25 (#1155)
- [CHORE] Native Parquet Integration Tests @samster25 (#1154)
- [CHORE] Remove use of `dirs_exist_ok` which was only added in Py3.8 @jaychia (#1153)
- [CHORE] Add parquet benchmarking @jaychia (#1151)
- [CHORE] Cleans up IO integration test fixtures for re-use @jaychia (#1152)
- [CHORE] Update README.rst typo @jaychia (#1141)
- [CHORE] No-op test for various parquet files @jaychia (#1130)
- [CHORE] Tidy typing for remaining binary ops: logical, comp @xcharleslin (#1124)
- [CHORE] Use workspace for cargo check @samster25 (#1127)
⬆️ Dependencies
- Bump orjson from 3.9.1 to 3.9.2 @dependabot (#1143)
- Bump pandas from 2.0.2 to 2.0.3 @dependabot (#1142)
- Bump snafu from 0.7.4 to 0.7.5 @dependabot (#1146)
- Bump serde_json from 1.0.99 to 1.0.100 @dependabot (#1147)
- Bump opencv-python from 4.7.0.72 to 4.8.0.74 @dependabot (#1117)
- Bump ray[data,default] from 2.4.0 to 2.5.1 @dependabot (#1074)
- Bump chrono-tz from 0.8.2 to 0.8.3 @dependabot (#1119)
- Bump pyo3 from 0.19.0 to 0.19.1 @dependabot (#1122)
- Bump async-trait from 0.1.68 to 0.1.71 @dependabot (#1126)
- Bump tokio from 1.28.2 to 1.29.1 @dependabot (#1120)
v0.1.8
Changes
✨ New Features
- [FEAT] Ranged Get Native Downloader @samster25 (#1113)
- [FEAT] Native S3 Downloader Anonymous Mode @samster25 (#1105)
- [FEAT] Enable reading a list of URLs in read_* APIs @jaychia (#1102)
- [FEAT] Arithmetic with timestamps and durations. @xcharleslin (#1103)
- [FEAT] Automatic Region Retrying for S3 Native Downloader @samster25 (#1098)
- [FEAT] Better styling of large dataframe cells in HTML @jaychia (#1097)
👾 Bug Fixes
- [BUG] S3 Downloader set default region when region not detected @samster25 (#1100)
📖 Documentation
- [CHORE] Update README.rst for image downloading @jaychia (#1109)
- [DOCS] Update image tutorials with `.image` namespaced expressions @jaychia (#1110)
🧰 Maintenance
- [CHORE] Tidy up typing of binary ops [1/2] @xcharleslin (#1114)
- [CHORE] Pin Pydantic to < 2 @jaychia (#1115)
- [CHORE] Remove rogue print statement @jaychia (#1112)
- [CHORE] Install wheel together with requirements in release build @jaychia (#1111)
- [CHORE] Update README.rst for image downloading @jaychia (#1109)
- [CHORE] Adding more test fixtures for different I/O sources @jaychia (#1083)
- [CHORE] Cache build artifacts in target folder @jaychia (#1104)
- [CHORE] Fix CI caching to cache integration test builds separately @jaychia (#1101)
- [CHORE] Use maturin directly instead of multiplatform build step @jaychia (#1099)
⬆️ Dependencies
- Bump serde_json from 1.0.97 to 1.0.99 @dependabot (#1095)
- Bump pytest from 7.3.2 to 7.4.0 @dependabot (#1089)
v0.1.7
Changes
🏆 Highlights
- [FEAT] Add `DataFrame.to_torch_map_dataset` and `.to_torch_iter_dataset`. @xcharleslin (#1086)
- [PERF] Rust based url downloading with error handling @samster25 (#1061)
✨ New Features
- [FEAT] Enable Native Downloader IO Config @samster25 (#1090)
- [FEAT] Add `DataFrame.to_torch_map_dataset` and `.to_torch_iter_dataset`. @xcharleslin (#1086)
- [FEAT] DataFrame.__iter__() and .iter_partitions() @xcharleslin (#1062)
- [FEAT] New DataType: Duration (without arithmetic) @xcharleslin (#1051)
- [FEAT] [Images] [9/N] Infer `Image` type for PIL images on ingress. @clarkzinzow (#1067)
- [FEAT] Automatically cast logical types to Python objects on `Series.to_pylist()`. @clarkzinzow (#1063)
- [FEAT] [Images] [8/N] Add encoding and resizing support for fixed-shape images. @clarkzinzow (#1052)
- Dataframe Iter 1/n: Physical plan streams results into Runner. @xcharleslin (#1060)
🚀 Performance Improvements
- [PERF] Rust based url downloading with error handling @samster25 (#1061)
👾 Bug Fixes
- [BUG] Fix remote mode typo @xcharleslin (#1092)
- [BUG] Reenable HTML viz hooks for np.ndarray and PIL Images @jaychia (#1078)
- [BUG] Fix string index bug in table repr @xcharleslin (#1079)
- [BUG] pin the version of python used in publishing @samster25 (#1068)
- [BUG] [CI] Fix merge conflict due to out-of-date base. @clarkzinzow (#1066)
📖 Documentation
- [FEAT] Add `DataFrame.to_torch_map_dataset` and `.to_torch_iter_dataset`. @xcharleslin (#1086)
- [CHORE] Fix filepath for autogeneration of .list.join docs @jaychia (#1084)
- In CI, limit tutorial to 500 rows @xcharleslin (#1076)
- [DOCS] Embeddings tutorial: Temporarily remove full dataset @xcharleslin (#1039)
- [DOCS] Remove release notes from documentation, link to Github instead @jaychia (#1049)
🧰 Maintenance
- [CHORE] set dependabot schedule to weekly @samster25 (#1085)
- [CHORE] Refactor integration test to use wheel built for release @jaychia (#1087)
- [CHORE] unpin numpy version for py<3.8 @jaychia (#1088)
- [CHORE] Fix filepath for autogeneration of .list.join docs @jaychia (#1084)
- [CHORE] Crate Smash v1 @samster25 (#1080)
- [CHORE] Scheduler cleanup: merge logical_op_runners.py into execution_step @xcharleslin (#1020)
- [CHORE] Inline the label enforcer into the release drafter wf @jaychia (#1057)
- [CHORE] Fix naming of "Release Drafter" workflow in trigger @jaychia (#1055)
- [CHORE] Add new trigger to run PR label enforcement after Release Drafter @jaychia (#1054)
- [CHORE][CI] Use pyarrow Table sort API that's compatible with older pyarrow versions @clarkzinzow (#1053)
⬆️ Dependencies
- Bump hypothesis from 6.79.1 to 6.79.2 @dependabot (#1082)
- [CHORE] Crate Smash v1 @samster25 (#1080)
- Bump hypothesis from 6.78.2 to 6.79.1 @dependabot (#1065)
- Bump numpy from 1.24.3 to 1.25.0 @dependabot (#1064)
v0.1.6
Changes
🏆 Highlights
- [FEAT] Support for Timestamp datatype. @xcharleslin (#1032)
✨ New Features
- [FEAT] Support for Timestamp datatype. @xcharleslin (#1032)
- [FEAT] Thread user-provided schema through to DataFrame reads @jaychia (#1024)
- [FEAT] Daft Image viz support. Remove Tabulate dependency. @xcharleslin (#1027)
- [FEAT] Dataframe Concats @jaychia (#1023)
- [FEAT] Add kernels for .list.join on a list[utf8] column @jaychia (#989)
- [FEAT][Table-Read-Schema 2/3] Add table casting logic @jaychia (#1012)
- [FEAT][Table-Read-Schema 1/3] Split reading tabular file formats into 2 method calls @jaychia (#1010)
- [FEAT][Images] [7/N] Add image encoding support. @clarkzinzow (#1013)
- [FEAT] Visualization cleanup 2/n: Add repr_html to Series, Table, and PyO3 @xcharleslin (#1018)
- [FEAT] Visualization cleanup (1/n): Use Table for repr @xcharleslin (#1011)
📖 Documentation
- [DOCS] Fix links to ray.io latest docs @jaychia (#1038)
- [DOCS] Add initial docs pass, adding lots of cross-reference links. @clarkzinzow (#1009)
- [DOCS][Images] [6/N] Fix image dtype docs. @clarkzinzow (#1008)
- 0.1.5 release notes @samster25 (#1007)
🧰 Maintenance
- [CHORE] Update cargo version to v0.1.6 @jaychia (#1047)
- [CHORE] Add a GitHub action to enforce labels are added to the PR before merging @jaychia (#1045)
- [CHORE] Fix CI TPCH data generation for old deprecated kwarg @jaychia (#1044)
- [CHORE] Fix footer of release-drafter @jaychia (#1043)
- [CHORE] Add release-drafter files @jaychia (#1042)
- [CHORE][CI] Fix flakiness in Datasets integration tests. @clarkzinzow (#1017)
⬆️ Dependencies
- Bump pytest from 7.3.1 to 7.3.2 @dependabot (#1034)
- Bump log from 0.4.18 to 0.4.19 @dependabot (#1036)
- Bump hypothesis from 6.76.0 to 6.78.2 @dependabot (#1040)
- Bump s3fs from 2023.5.0 to 2023.6.0 @dependabot (#1029)
- Bump orjson from 3.9.0 to 3.9.1 @dependabot (#1031)
- Bump dask from 2023.5.0 to 2023.6.0 @dependabot (#1030)
- Bump serde from 1.0.163 to 1.0.164 @dependabot (#1025)
v0.1.5
The Daft 0.1.5 release features better series exporting, bugfixes and improved documentation.
Enhancements
- Enable Cast from Image to Python via Numpy #990
Bug Fixes
- Fix Image Resize/Decode Expressions #1001
Build Changes
- Python script for subprefixing s3 tpch files #997
- Update pyo3-log from 0.8.1 to 0.8.2 #996
- Update hypothesis from 6.75.9 to 6.76.0 #995