Releases: Eventual-Inc/Daft

v0.2.2

14 Nov 19:47
6abc006

Changes

  • [CHORE] Edit 'make-hooks' command to install pre-commit script @colin-ho (#1602)
  • [CHORE] Improve error messages when calling aggregation methods on dataframe without input columns @colin-ho (#1587)

✨ New Features

  • [FEAT] Add translation of IOConfig to PyArrow filesystem arguments @jaychia (#1592)
  • [FEAT] [Scan Operator] Refactor planning and execution code to use shared Pushdowns struct. @clarkzinzow (#1595)
  • [FEAT] [Scan Operator] Add ChunkSpec for specifying format-specific per-file row subset selection for ScanTasks. @clarkzinzow (#1590)
  • [FEAT] [Scan Operator] Integrate size_bytes with ScanOperators @clarkzinzow (#1586)
  • [FEAT] [Scan Operator] Add Python I/O support (+ JSON) to MicroPartition reads @clarkzinzow (#1578)
  • [FEAT][ScanOperator 1/3] Add MVP e2e ScanOperator integration. @clarkzinzow (#1559)

🚀 Performance Improvements

  • [PERF][REVERT] Reverts: use pyarrow table for pickling rather than ChunkedArray (#1488) @jaychia (#1605)
  • [PERF] Speed Up MicroPartition Ops when we know the result is empty @samster25 (#1604)

👾 Bug Fixes

  • [BUG] clean up ray scheduler threads after computing partial results @samster25 (#1597)
  • [BUG] Update requirements for typing_extensions @jaychia (#1593)
  • [BUG] Fix Deadlock with ScanOperators in to_physical_plan_scheduler and show iostats for glob and from_scan_task @samster25 (#1581)
  • [BUG] add allow threads for io pool operations @samster25 (#1580)

🧰 Maintenance

  • [CHORE] delete unused wheel tools @samster25 (#1603)
  • [CHORE] add IOStats to all micropartition ops @samster25 (#1584)
  • [CHORE] Use DAFT_MICROPARTITIONS as shared feature flag for data catalog support @jaychia (#1579)
  • [CHORE] Convert GlobScanOperator to perform streaming into result and take a list of glob paths @jaychia (#1577)

⬆️ Dependencies

v0.2.1

01 Nov 00:16
c8fe883

Changes

  • [FEAT] Support disabling using doubled quotes to escape in CSV @ravern (#1544)
  • [DOCS]: fix typo in doc @amir-f (#1534)

✨ New Features

  • [FEAT] GlobScanOperator @jaychia (#1550)
  • [FEAT] [New Query Planner] [2/N] Push partition spec into physical plan, remove Coalesce logical op. @clarkzinzow (#1540)

👾 Bug Fixes

📖 Documentation

🧰 Maintenance

  • [CHORE] Fix bad merge conflict in GlobScanOperator wrt CSV schema inference @jaychia (#1556)
  • [CHORE] Revert "Bump pandas from 2.0.3 to 2.1.2" @jaychia (#1554)
  • [CHORE] [New Query Planner] [1/N] Remove Python query planner. @clarkzinzow (#1538)
  • [CHORE] changes to partition field and field creation @samster25 (#1537)
  • [CHORE] Move code from daft-csv to daft-decoding @jaychia (#1533)

⬆️ Dependencies

6 changes

v0.2.0

26 Oct 20:09
f49275b

Changes

✨ New Features

🚀 Performance Improvements

  • [PERF] Add "eager mode" to limits and use in .show() @jaychia (#1498)
  • [PERF] Micropartition, lazy loading and Column Stats @samster25 (#1470)
  • [PERF] Use pyarrow table for pickling rather than ChunkedArray @samster25 (#1488)
  • [PERF] Use region from system and leverage cached credentials when making new clients @samster25 (#1490)
  • [PERF] Update default max_connections 64->8 because it is now per-io-thread @jaychia (#1485)
  • [PERF] Pass-through multithreaded_io flag in read_parquet @jaychia (#1484)

👾 Bug Fixes

📖 Documentation

🧰 Maintenance

v0.1.20

10 Oct 01:01
439f2bd

Changes

✨ New Features

🚀 Performance Improvements

  • [PERF] Update number of cores on every iteration @jaychia (#1480)
  • [Hotfix] Change to streaming reader for CSV schema inference. @clarkzinzow (#1471)

👾 Bug Fixes

  • [BUG] Properly dispatch limited reads in new query planner @xcharleslin (#1476)
  • [BUG] Fixes globbing on windows by consolidating on posix-style paths @jaychia (#1472)
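For readers unfamiliar with the idea behind the Windows globbing fix, here is a minimal sketch of consolidating on POSIX-style paths using only the Python standard library. It is not Daft's implementation; the helper name `to_posix_glob` is hypothetical and exists only for illustration.

```python
from pathlib import PureWindowsPath


def to_posix_glob(pattern: str) -> str:
    """Rewrite a glob pattern so it uses forward slashes only.

    Hypothetical helper, not part of Daft. PureWindowsPath accepts both
    backslash and forward-slash separators, so patterns that are already
    POSIX-style pass through unchanged.
    """
    return PureWindowsPath(pattern).as_posix()


# Backslash-separated Windows patterns become forward-slash patterns.
windows_style = to_posix_glob(r"C:\data\*.parquet")
relative_style = to_posix_glob(r"data\sub\*.csv")
already_posix = to_posix_glob("data/sub/*.csv")
```

Normalizing once at the boundary means the matching code only ever has to reason about one separator.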

🧰 Maintenance

v0.1.19

06 Oct 22:42
bb74530

Changes

✨ New Features

🚀 Performance Improvements

👾 Bug Fixes

  • [BUG] Fix circular import when PYTHONPATH is set @samster25 (#1474)
  • [BUG] Don't remove all handles and Only use handler for files in src/ @samster25 (#1473)

🧰 Maintenance

v0.1.18

26 Sep 01:17
3403c0c

Changes

✨ New Features

👾 Bug Fixes

📖 Documentation

  • [BUG] [Docs] Allow source code discovery to fail silently for pyo3-defined classes when generating docs. @clarkzinzow (#1430)
  • [FEAT] Implement .dt.year/month/day for timestamp types @jaychia (#1385)

🧰 Maintenance

v0.1.17

12 Sep 06:39
601260b

Changes

✨ New Features

🚀 Performance Improvements

👾 Bug Fixes

  • [BUG] Respect multithreaded_io flag when reading parquet @samster25 (#1359)
  • [BUG] Schema Display should use dtype Display instead of Debug @jaychia (#1355)
  • [BUG] propagate parquet io error instead of panicking @samster25 (#1352)

🧰 Maintenance

  • [CHORE] [New Query Planner] Add simple df.explain() option; change to fixed-point policy for rule batch @clarkzinzow (#1354)
  • [CHORE] Add status code to IO integration tests @jaychia (#1356)
  • [CHORE] Fix List/FixedSizeList DataType to hold a dtype instead of Field @jaychia (#1351)
  • [CHORE] Add Series::full_null/empty/from_arrow to reduce code duplication @jaychia (#1331)
  • [CHORE] Add a Growable factory method @jaychia (#1330)
  • [CHORE] Add new ListArray @jaychia (#1329)

⬆️ Dependencies

5 changes

v0.1.16

06 Sep 02:07
bdc4ba4

Changes

✨ New Features

👾 Bug Fixes

  • [BUG] Fix Table.read_parquet behavior when it encounters arrow_schema @jaychia (#1336)
  • [BUG] [New Query Planner] Revert file info partition column names. @clarkzinzow (#1333)
  • [BUG] Fix fixed size list array FullNull implementation @jaychia (#1320)

🧰 Maintenance

  • [CHORE] install perl before maturin @samster25 (#1345)
  • [CHORE] Switch to openssl @samster25 (#1344)
  • [CHORE] [New Query Planner] pyo3-agnostic LogicalPlanBuilder, op constructor arg orderings @clarkzinzow (#1332)
  • [CHORE] factor io config into common code @samster25 (#1335)
  • [CHORE] [New Query Planner] Remove ExpressionsProjection from builder, move validation into Op::try_new() @clarkzinzow (#1327)
  • [CHORE] StructArray refactors @jaychia (#1326)
  • [CHORE] drop flag for non native compile for daft profiling @samster25 (#1323)
  • [CHORE] pin pyarrow to 12 for ray compat tests @samster25 (#1322)
  • [CHORE] Move FixedSizeListArray to array/fixed_size_list_array.rs @jaychia (#1319)
  • [CHORE] Add fix for list schema inference tests using PyArrow 13.0.0 @jaychia (#1318)
  • [CHORE] Implementations of FixedSizeListArray @jaychia (#1281)

⬆️ Dependencies

v0.1.15

28 Aug 06:43
59ed92a

Changes

✨ New Features

  • [FEAT] add row group support to daft parquet reader @samster25 (#1308)
  • [FEAT] [New Query Planner] Add logical plan hashing, rule batches, fixed-point policies, early optimizer termination, and optimization cycle detection. @clarkzinzow (#1292)
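The optimizer terms in the entry above (rule batches, fixed-point policies, early termination, cycle detection) can be illustrated with a small sketch. This is not Daft's optimizer, which is written in Rust. It assumes a plan can be modeled as a hashable tuple of operator names, and every name in it (`run_rule_batch`, `merge_adjacent_filters`, `max_passes`) is hypothetical.

```python
from typing import Callable, Sequence, Tuple

# Assumption: a "plan" is just a hashable tuple of operator names.
Plan = Tuple[str, ...]
Rule = Callable[[Plan], Plan]


def run_rule_batch(plan: Plan, rules: Sequence[Rule], max_passes: int = 5) -> Plan:
    """Apply a batch of rules repeatedly until the plan stops changing."""
    seen = {plan}  # set membership relies on hashing the plan
    for _ in range(max_passes):
        new_plan = plan
        for rule in rules:
            new_plan = rule(new_plan)
        if new_plan == plan:
            # Fixed point reached: terminate early.
            return plan
        if new_plan in seen:
            # The rules led back to an earlier plan: they would loop forever.
            raise RuntimeError("optimization cycle detected")
        seen.add(new_plan)
        plan = new_plan
    # Pass budget exhausted: return the best plan found so far.
    return plan


def merge_adjacent_filters(plan: Plan) -> Plan:
    """Toy rule: collapse consecutive 'filter' operators into one."""
    out = []
    for op in plan:
        if op == "filter" and out and out[-1] == "filter":
            continue
        out.append(op)
    return tuple(out)


optimized = run_rule_batch(("scan", "filter", "filter", "project"), [merge_adjacent_filters])
```

Here the first pass merges the two filters, and the second pass leaves the plan unchanged, so the loop exits before using its full pass budget.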

👾 Bug Fixes

🧰 Maintenance

  • [CHORE] Refactor Growable traits and downcast for lifetimes @jaychia (#1305)
  • [CHORE] Refactor broadcast to use growables @jaychia (#1304)
  • [CHORE] Code reduction in growable macros + logical if/else refactor @jaychia (#1301)
  • [CHORE] Refactor growables to return a Series instead of concrete arrays @jaychia (#1297)
  • [CHORE] Minor cleanup for logical_plan::Project @xcharleslin (#1299)

v0.1.14

24 Aug 23:35
7fa9e64

Changes

✨ New Features

  • [FEAT] add flag to use multithreaded io for parquet_read_table @samster25 (#1298)
  • [FEAT] Add Retry Mode, connection timeout, and read timeout to S3Config @samster25 (#1293)
  • [FEAT] [New Query Planner] Add optimization framework and PushDownFilter rule. @clarkzinzow (#1284)

👾 Bug Fixes

🧰 Maintenance