This is a living issue that we'll update with new issues / comments as we get more clarity on the concrete implementations.
Description
This is a tracking issue to align on and coordinate the integration of delta-kernel-rs into delta-rs.
Motivation
While the community has made tremendous strides in supporting more and more Delta features in delta-rs, we are still lagging behind, with more features on the way that users will want to leverage. This is exactly the use case the kernel libraries aim to address - a correct and complete implementation of the Delta protocol.
Kernel explicitly does not take an opinion on any of the I/O and execution-related aspects that are needed to actually consume / work with Delta tables. This is what delta-rs provides, leaving the current (high-level) user-facing APIs conceptually as they are.
Execution
In simplified terms, adopting kernel means carving out the functionality that currently resides in:
core/src/kernel (named so in preparation for being replaced by kernel)
core/src/protocol (mainly our snapshot code, that I wanted to update for quite a while now)
core/src/schema (only partition pruning remained in this module after previous updates)
At the heart of the migration is creating a new snapshot implementation (RFC in #3137) which provides all required machinery (the engine) to kernel and exposes methods tailored to the needs of delta-rs.
One potential avenue forward is to get the RFC merge-ready and merge it without being "hooked-up" to the rest of the crate. This PR also exposes a Snapshot trait (we already have something similar, but not quite fitting - I think) that we can hopefully leverage to refactor all the operations that require access to the snapshot - i.e. implement that trait for current snapshots ... This should hopefully surface any missing APIs in kernel that we may yet require for full adoption.
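To make the trait-based refactoring idea more concrete, here is a minimal sketch in Rust. Every name in it (`Snapshot`, `FileEntry`, `LegacySnapshot`, `PlaceholderKernelSnapshot`, `total_size`) is hypothetical and chosen for illustration only; none of it reflects the actual trait in the RFC or any delta-kernel-rs API. It only shows the shape of the approach: operations depend on a trait, and both the current snapshot and a future kernel-backed one implement it.

```rust
// Hypothetical sketch. None of these names are taken from delta-rs or
// delta-kernel-rs; they only illustrate the "operations depend on a trait" idea.

/// Minimal stand-in for a file tracked by a table snapshot.
#[derive(Debug, Clone, PartialEq)]
pub struct FileEntry {
    pub path: String,
    pub size: u64,
}

/// Hypothetical abstraction that operations would be written against.
pub trait Snapshot {
    /// Table version this snapshot represents.
    fn version(&self) -> u64;
    /// Files that are part of the table at this version.
    fn files(&self) -> Vec<FileEntry>;
}

/// Stand-in for the current snapshot implementation, holding its state eagerly.
pub struct LegacySnapshot {
    pub version: u64,
    pub files: Vec<FileEntry>,
}

impl Snapshot for LegacySnapshot {
    fn version(&self) -> u64 {
        self.version
    }

    fn files(&self) -> Vec<FileEntry> {
        self.files.clone()
    }
}

/// Placeholder for a future kernel-backed snapshot. A real implementation
/// would delegate to kernel; this one just stores data so the sketch compiles.
pub struct PlaceholderKernelSnapshot {
    pub version: u64,
    pub files: Vec<FileEntry>,
}

impl Snapshot for PlaceholderKernelSnapshot {
    fn version(&self) -> u64 {
        self.version
    }

    fn files(&self) -> Vec<FileEntry> {
        self.files.clone()
    }
}

/// An example "operation" that only knows about the trait, so it works with
/// either implementation without changes.
pub fn total_size<S: Snapshot>(snapshot: &S) -> u64 {
    snapshot.files().iter().map(|f| f.size).sum()
}
```

With this shape, an operation such as `total_size` can first be refactored against the trait using the current snapshot and later be pointed at a kernel-backed implementation. Any method the operations need that the kernel-backed implementation cannot provide would show up as a gap at that point.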
Challenges
kernel currently has only very limited write-path support, so we'll have to keep maintaining our own write paths for now. However, we can motivate the API designs based on our needs.
In terms of feature support, there is no full overlap as of now. For example, kernel supports deletion vectors, which delta-rs does not, while delta-rs supports generated columns, which are not yet part of kernel and still require some design work (i.e. how to handle arbitrary SQL that an engine will need to parse).
While kernel offers great opportunities for performance enhancements, there are several areas that might take an initial hit until we can implement performance optimizations that work well with kernel. These mainly relate to less frequently requested actions such as Txn, CommitInfo ...
Any feedback / concerns around proceeding with this are highly appreciated.
@roeap Based on the challenges you wrote, I think we should actually move forward with the current API as 1.0 for us.
One thing I would not like to see is us hitting many regressions after switching to kernel, which would set us back many months, if not longer, on the way to a 1.0 release. I would rather see us do this as 1.X work, and then release 2.0 once that has stabilized.
@ion-elgreco - I do see the point, but are we under some time pressure? I think we still have a few correctness problems anyway that I personally would expect to be handled in a 1.0.
But yes, this is certainly not going to be done in a week or so.
Personally, I also still struggle a bit with how wide our APIs on the delta table are, i.e. tailored to both log inspection and table scans. The SQL APIs are also still considered experimental IIRC; do we have a design for that?
Finally getting to 1.0 is something I am tremendously looking forward to, but is there any motivation to rush it now that we are nearing the end? Knowing we might break things?
W.r.t. the challenges, I think we can cover most of this via a hybrid state where we leverage what kernel can do and layer our stuff on top of that.
Related Work
v2Checkpoint reader/writer feature delta-kernel-rs#685
Snapshot::new_from() API delta-kernel-rs#549