Commit: Merge branch 'main' into table-feature-enum

hntd187 authored Feb 27, 2025
2 parents 5e5fd69 + 8b5e06d commit 8188cc4
Showing 60 changed files with 1,448 additions and 826 deletions.
90 changes: 90 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,95 @@
# Changelog

## [v0.7.0](https://github.com/delta-io/delta-kernel-rs/tree/v0.7.0/) (2025-02-24)

[Full Changelog](https://github.com/delta-io/delta-kernel-rs/compare/v0.6.1...v0.7.0)

### 🏗️ Breaking changes
1. Read transforms are now communicated via expressions ([#607], [#612], [#613], [#614]) This includes:
- `ScanData` now includes a third tuple field: a row-indexed vector of transforms to apply to the `EngineData`.
- Adds a new `scan::state::transform_to_logical` function that encapsulates the boilerplate of applying the transform expression
- Removes `scan_action_iter` API and `logical_to_physical` API
- Removes `column_mapping_mode` from `GlobalScanState`
- ffi: exposes methods to get an expression evaluator and evaluate an expression from c
- read-table example: Removes `add_partition_columns` in arrow.c
- read-table example: adds an `apply_transform` function in arrow.c
2. ffi: support field nullability in schema visitor ([#656])
3. ffi: expose metadata in SchemaEngineVisitor ffi api ([#659])
4. ffi: new `visit_schema` FFI now operates on a `Schema` instead of a `Snapshot` ([#683], [#709])
5. Introduced feature flags (`arrow_54` and `arrow_53`) to select major arrow versions ([#654], [#708], [#717])
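
For consumers upgrading to v0.7.0, an arrow major version is now selected through these feature flags in `Cargo.toml`. A minimal sketch (assuming you also want the `default-engine` feature; adjust the feature set to your build):

```toml
[dependencies]
# Pin kernel to arrow major version 54; use "arrow_53" instead to stay on 53.
delta_kernel = { version = "0.7.0", features = ["default-engine", "arrow_54"] }
```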

### 🚀 Features / new APIs

1. Read `partition_values` in `RemoveVisitor` and remove `break` in `RowVisitor` for `RemoveVisitor` ([#633])
2. Add the in-commit timestamp field to CommitInfo ([#581])
3. Support NOT and column expressions in eval_sql_where ([#653])
4. Add check for schema read compatibility ([#554])
5. Introduce `TableConfiguration` to jointly manage metadata, protocol, and table properties ([#644])
6. Add visitor `SidecarVisitor` and `Sidecar` action struct ([#673])
7. Add in-commit timestamps table properties ([#558])
8. Support writing to writer version 1 ([#693])
9. ffi: new `logical_schema` FFI to get the logical schema of a snapshot ([#709])

### 🐛 Bug Fixes

1. Incomplete multi-part checkpoint handling when no hint is provided ([#641])
2. Consistent PartialEq for Scalar ([#677])
3. Cargo fmt does not handle mods defined in macros ([#676])
4. Ensure properly nested null masks for parquet reads ([#692])
5. Handle predicates on non-nullable columns without stats ([#700])

### 📚 Documentation

1. Update readme to reflect tracing feature is needed for read-table ([#619])
2. Clarify `JsonHandler` semantics on EngineData ordering ([#635])

### 🚜 Refactor

1. Make [non] nullable struct fields easier to create ([#646])
2. Make eval_sql_where available to DefaultPredicateEvaluator ([#627])

### 🧪 Testing

1. Port cdf tests from delta-spark to kernel ([#611])

### ⚙️ Chores/CI

1. Fix some typos ([#643])
2. Release script publishing fixes ([#638])

[#638]: https://github.com/delta-io/delta-kernel-rs/pull/638
[#643]: https://github.com/delta-io/delta-kernel-rs/pull/643
[#619]: https://github.com/delta-io/delta-kernel-rs/pull/619
[#635]: https://github.com/delta-io/delta-kernel-rs/pull/635
[#633]: https://github.com/delta-io/delta-kernel-rs/pull/633
[#611]: https://github.com/delta-io/delta-kernel-rs/pull/611
[#581]: https://github.com/delta-io/delta-kernel-rs/pull/581
[#646]: https://github.com/delta-io/delta-kernel-rs/pull/646
[#627]: https://github.com/delta-io/delta-kernel-rs/pull/627
[#641]: https://github.com/delta-io/delta-kernel-rs/pull/641
[#653]: https://github.com/delta-io/delta-kernel-rs/pull/653
[#607]: https://github.com/delta-io/delta-kernel-rs/pull/607
[#656]: https://github.com/delta-io/delta-kernel-rs/pull/656
[#554]: https://github.com/delta-io/delta-kernel-rs/pull/554
[#644]: https://github.com/delta-io/delta-kernel-rs/pull/644
[#659]: https://github.com/delta-io/delta-kernel-rs/pull/659
[#612]: https://github.com/delta-io/delta-kernel-rs/pull/612
[#677]: https://github.com/delta-io/delta-kernel-rs/pull/677
[#676]: https://github.com/delta-io/delta-kernel-rs/pull/676
[#673]: https://github.com/delta-io/delta-kernel-rs/pull/673
[#613]: https://github.com/delta-io/delta-kernel-rs/pull/613
[#558]: https://github.com/delta-io/delta-kernel-rs/pull/558
[#692]: https://github.com/delta-io/delta-kernel-rs/pull/692
[#700]: https://github.com/delta-io/delta-kernel-rs/pull/700
[#683]: https://github.com/delta-io/delta-kernel-rs/pull/683
[#654]: https://github.com/delta-io/delta-kernel-rs/pull/654
[#693]: https://github.com/delta-io/delta-kernel-rs/pull/693
[#614]: https://github.com/delta-io/delta-kernel-rs/pull/614
[#709]: https://github.com/delta-io/delta-kernel-rs/pull/709
[#708]: https://github.com/delta-io/delta-kernel-rs/pull/708
[#717]: https://github.com/delta-io/delta-kernel-rs/pull/717


## [v0.6.1](https://github.com/delta-io/delta-kernel-rs/tree/v0.6.1/) (2025-01-10)

[Full Changelog](https://github.com/delta-io/delta-kernel-rs/compare/v0.6.0...v0.6.1)
17 changes: 1 addition & 16 deletions Cargo.toml
@@ -20,24 +20,9 @@ license = "Apache-2.0"
repository = "https://github.com/delta-io/delta-kernel-rs"
readme = "README.md"
rust-version = "1.80"
version = "0.6.1"
version = "0.7.0"

[workspace.dependencies]
# When changing the arrow version range, also modify ffi/Cargo.toml which has
# its own arrow version ranges with modified features. Failure to do so will
# result in compilation errors as two different sets of arrow dependencies may
# be sourced
arrow = { version = ">=53, <55" }
arrow-arith = { version = ">=53, <55" }
arrow-array = { version = ">=53, <55" }
arrow-buffer = { version = ">=53, <55" }
arrow-cast = { version = ">=53, <55" }
arrow-data = { version = ">=53, <55" }
arrow-ord = { version = ">=53, <55" }
arrow-json = { version = ">=53, <55" }
arrow-select = { version = ">=53, <55" }
arrow-schema = { version = ">=53, <55" }
parquet = { version = ">=53, <55", features = ["object_store"] }
object_store = { version = ">=0.11, <0.12" }
hdfs-native-object-store = "0.12.0"
hdfs-native = "0.10.0"
39 changes: 13 additions & 26 deletions README.md
@@ -43,10 +43,10 @@ consumer's own `Engine` trait, the kernel has a feature flag to enable a default
```toml
# fewer dependencies, requires consumer to implement Engine trait.
# allows consumers to implement their own in-memory format
delta_kernel = "0.6.1"
delta_kernel = "0.7.0"

# or turn on the default engine, based on arrow
delta_kernel = { version = "0.6.1", features = ["default-engine"] }
delta_kernel = { version = "0.7.0", features = ["default-engine"] }
```

### Feature flags
@@ -74,32 +74,19 @@ quickly. To enable engines that already integrate arrow to also integrate kernel
to track a specific version of arrow that kernel depends on, we take as broad dependency on arrow
versions as we can.

This means you can force kernel to rely on the specific arrow version that your engine already uses,
as long as it falls in that range. You can see the range in the `Cargo.toml` in the same folder as
this `README.md`.
We allow selecting the version of arrow to use via feature flags. Currently we support the following
flags:

For example, although arrow 53.1.0 has been released, you can force kernel to compile on 53.0 by
putting the following in your project's `Cargo.toml`:
- `arrow_53`: Use arrow version 53
- `arrow_54`: Use arrow version 54

```toml
[patch.crates-io]
arrow = "53.0"
arrow-arith = "53.0"
arrow-array = "53.0"
arrow-buffer = "53.0"
arrow-cast = "53.0"
arrow-data = "53.0"
arrow-ord = "53.0"
arrow-json = "53.0"
arrow-select = "53.0"
arrow-schema = "53.0"
parquet = "53.0"
```
Note that if more than one `arrow_x` feature is enabled, kernel will default to the _lowest_
specified flag. This also means that if you use `--all-features` you will get the lowest version of
arrow that kernel supports.

Note that unfortunately patching in `cargo` requires that _exactly one_ version matches your
specification. If only arrow "53.0.0" had been released the above would work, but if "53.0.1" were
to be released, the specification would break and you would need to provide a more restrictive
specification like `"=53.0.0"`.
If no arrow feature is enabled, but at least one of `default-engine`, `sync-engine`,
`arrow-conversion`, or `arrow-expression` is enabled, the lowest supported arrow version will be
enabled.
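
To illustrate the lowest-flag rule described above, here is a hypothetical consumer `Cargo.toml` (a sketch, not taken from the README itself):

```toml
[dependencies]
# Both arrow flags enabled: kernel resolves to the lowest specified major, arrow 53.
# Building with `--all-features` behaves the same way.
delta_kernel = { version = "0.7.0", features = ["default-engine", "arrow_53", "arrow_54"] }
```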

### Object Store
You may also need to patch the `object_store` version used if the version of `parquet` you depend on
@@ -186,4 +173,4 @@ Some design principles which should be considered:
[cargo-llvm-cov]: https://github.com/taiki-e/cargo-llvm-cov
[FFI]: ffi/
[Arrow]: https://arrow.apache.org/rust/arrow/index.html
[Tokio]: https://tokio.rs/
[Tokio]: https://tokio.rs/
7 changes: 1 addition & 6 deletions acceptance/Cargo.toml
@@ -14,19 +14,14 @@ rust-version.workspace = true
release = false

[dependencies]
arrow-array = { workspace = true }
arrow-cast = { workspace = true }
arrow-ord = { workspace = true }
arrow-select = { workspace = true }
arrow-schema = { workspace = true }
delta_kernel = { path = "../kernel", features = [
"default-engine",
"arrow_53",
"developer-visibility",
] }
futures = "0.3"
itertools = "0.13"
object_store = { workspace = true }
parquet = { workspace = true }
serde = { version = "1", features = ["derive"] }
serde_json = "1"
thiserror = "1"
Expand Down
17 changes: 10 additions & 7 deletions acceptance/src/data.rs
@@ -1,15 +1,18 @@
use std::{path::Path, sync::Arc};

use arrow_array::{Array, RecordBatch};
use arrow_ord::sort::{lexsort_to_indices, SortColumn};
use arrow_schema::{DataType, Schema};
use arrow_select::{concat::concat_batches, filter::filter_record_batch, take::take};
use delta_kernel::arrow::array::{Array, RecordBatch};
use delta_kernel::arrow::compute::{
concat_batches, filter_record_batch, lexsort_to_indices, take, SortColumn,
};
use delta_kernel::arrow::datatypes::{DataType, Schema};

use delta_kernel::parquet::arrow::async_reader::{
ParquetObjectReader, ParquetRecordBatchStreamBuilder,
};
use delta_kernel::{engine::arrow_data::ArrowEngineData, DeltaResult, Engine, Error, Table};
use futures::{stream::TryStreamExt, StreamExt};
use itertools::Itertools;
use object_store::{local::LocalFileSystem, ObjectStore};
use parquet::arrow::async_reader::{ParquetObjectReader, ParquetRecordBatchStreamBuilder};

use crate::{TestCaseInfo, TestResult};

@@ -83,8 +86,8 @@ fn assert_schema_fields_match(schema: &Schema, golden: &Schema) {
fn normalize_col(col: Arc<dyn Array>) -> Arc<dyn Array> {
if let DataType::Timestamp(unit, Some(zone)) = col.data_type() {
if **zone == *"+00:00" {
arrow_cast::cast::cast(&col, &DataType::Timestamp(*unit, Some("UTC".into())))
.expect("Could not cast to UTC")
let data_type = DataType::Timestamp(*unit, Some("UTC".into()));
delta_kernel::arrow::compute::cast(&col, &data_type).expect("Could not cast to UTC")
} else {
col
}
2 changes: 1 addition & 1 deletion feature-tests/Cargo.toml
@@ -12,7 +12,7 @@ version.workspace = true
release = false

[dependencies]
delta_kernel = { path = "../kernel" }
delta_kernel = { path = "../kernel", features = ["arrow_53"] }

[features]
default-engine = [ "delta_kernel/default-engine" ]
17 changes: 3 additions & 14 deletions ffi/Cargo.toml
@@ -22,21 +22,13 @@ tracing-core = { version = "0.1", optional = true }
tracing-subscriber = { version = "0.3", optional = true, features = [ "json" ] }
url = "2"
delta_kernel = { path = "../kernel", default-features = false, features = [
"arrow",
"developer-visibility",
] }
delta_kernel_ffi_macros = { path = "../ffi-proc-macros", version = "0.6.1" }

# used if we use the default engine to be able to move arrow data into the c-ffi format
arrow-schema = { version = ">=53, <55", default-features = false, features = [
"ffi",
], optional = true }
arrow-data = { version = ">=53, <55", default-features = false, features = [
"ffi",
], optional = true }
arrow-array = { version = ">=53, <55", default-features = false, optional = true }
delta_kernel_ffi_macros = { path = "../ffi-proc-macros", version = "0.7.0" }

[build-dependencies]
cbindgen = "0.27.0"
cbindgen = "0.28"
libc = "0.2.158"

[dev-dependencies]
@@ -52,9 +44,6 @@ default = ["default-engine"]
cloud = ["delta_kernel/cloud"]
default-engine = [
"delta_kernel/default-engine",
"arrow-array",
"arrow-data",
"arrow-schema",
]
tracing = [ "tracing-core", "tracing-subscriber" ]
sync-engine = ["delta_kernel/sync-engine"]
2 changes: 1 addition & 1 deletion ffi/cbindgen.toml
@@ -25,4 +25,4 @@ parse_deps = true
# only crates found in this list will ever be parsed.
#
# default: there is no allow-list (NOTE: this is the opposite of [])
include = ["delta_kernel", "arrow-data", "arrow-schema"]
include = ["arrow", "arrow-data", "arrow-schema", "delta_kernel"]