[tracking] Kernelize! #3298

Open
roeap opened this issue Mar 4, 2025 · 2 comments
@roeap
Collaborator

roeap commented Mar 4, 2025

Important

This is a living issue that we'll update with new issues / comments as we get more clarity on the concrete implementations.

Description

This is a tracking issue to align on and coordinate the integration of delta-kernel-rs into delta-rs.

Motivation

While tremendous strides have been made by the community to support more and more delta features in delta-rs, we are still lagging behind, and more features are on the way that users will want to leverage. This is exactly the use case the kernel libraries aim to address: a correct and complete implementation of the Delta protocol.

Kernel explicitly does not take an opinion on the IO / execution related aspects that are needed to actually consume / work with delta tables. This is what delta-rs provides, leaving the current (high level) user facing APIs conceptually as they are.

Execution

In simplified terms, adopting kernel means carving out the functionality that currently resides in

  • core/src/kernel (named so in preparation for being replaced by kernel)
  • core/src/protocol (mainly our snapshot code, that I wanted to update for quite a while now)
  • core/src/schema (only partition pruning remained in this module after previous updates)

At the heart of the migration is creating a new snapshot implementation (RFC in #3137) which provides all required machinery (the engine) to kernel and exposes methods tailored to the needs of delta-rs.

One potential avenue forward is to get the RFC merge-ready and merge it without it being "hooked up" to the rest of the crate. This PR also exposes a Snapshot trait (we already have something similar, but not quite fitting, I think) that we can hopefully leverage to refactor all the operations that require access to the snapshot, i.e. implement that trait for the current snapshots ... This should hopefully surface any missing APIs in kernel that we may yet require for full adoption.
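To illustrate the idea of refactoring operations against a trait, here is a rough sketch. It is not the trait from the RFC: `Snapshot`, `FileEntry`, `LegacySnapshot`, `KernelBackedSnapshot` and `total_size` are all hypothetical names made up for this example, and the "kernel-backed" variant just wraps a closure instead of calling into delta-kernel-rs. The point is only that an operation written against the trait does not care which implementation sits behind it.

```rust
// Hypothetical sketch: none of these names are actual delta-rs or
// delta-kernel-rs APIs.

/// Minimal stand-in for a file tracked by the table.
#[derive(Debug, Clone, PartialEq)]
pub struct FileEntry {
    pub path: String,
    pub size_bytes: u64,
}

/// The narrow, read-only view that operations need from a snapshot.
pub trait Snapshot {
    fn version(&self) -> u64;
    fn files(&self) -> Box<dyn Iterator<Item = FileEntry> + '_>;
}

/// Stand-in for the current snapshot: all files are held eagerly.
pub struct LegacySnapshot {
    version: u64,
    files: Vec<FileEntry>,
}

impl Snapshot for LegacySnapshot {
    fn version(&self) -> u64 {
        self.version
    }
    fn files(&self) -> Box<dyn Iterator<Item = FileEntry> + '_> {
        Box::new(self.files.iter().cloned())
    }
}

/// Stand-in for a kernel-backed snapshot. The closure models log replay
/// being delegated to another component; it is an assumption of this sketch.
pub struct KernelBackedSnapshot<F: Fn() -> Vec<FileEntry>> {
    version: u64,
    replay: F,
}

impl<F: Fn() -> Vec<FileEntry>> Snapshot for KernelBackedSnapshot<F> {
    fn version(&self) -> u64 {
        self.version
    }
    fn files(&self) -> Box<dyn Iterator<Item = FileEntry> + '_> {
        Box::new((self.replay)().into_iter())
    }
}

/// An "operation" written only against the trait.
pub fn total_size(snapshot: &dyn Snapshot) -> u64 {
    snapshot.files().map(|f| f.size_bytes).sum()
}

fn main() {
    let files = vec![
        FileEntry { path: "part-0.parquet".to_string(), size_bytes: 10 },
        FileEntry { path: "part-1.parquet".to_string(), size_bytes: 20 },
    ];
    let legacy = LegacySnapshot { version: 3, files: files.clone() };
    let kernel = KernelBackedSnapshot { version: 3, replay: move || files.clone() };

    // Both implementations answer the same operation identically.
    assert_eq!(total_size(&legacy), total_size(&kernel));
    println!("v{}: {} bytes", kernel.version(), total_size(&kernel));
}
```

Any method an operation needs that cannot be expressed on the kernel-backed side of such a trait would be a candidate for a missing kernel API.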

Challenges

  • kernel currently has only very limited write path support, so we'll have to keep maintaining our own write paths for now. However, we can motivate the API designs based on our needs.
  • In terms of feature support there is no full overlap as of now. E.g. kernel supports deletion vectors, which delta-rs does not, but delta-rs supports generated columns, which are not yet part of kernel and still require some design work (i.e. how to handle arbitrary SQL that an engine will need to parse).
  • While kernel offers great opportunities for performance enhancements, there are several areas that might take an initial hit until we can implement performance optimizations that work well with kernel. These mainly relate to less frequently requested actions such as Txn, CommitInfo ...

Any feedback / concerns around proceeding with this are highly appreciated.

Related Work

PRs cannot be tracked as sub-issues

@roeap roeap added the enhancement New feature or request label Mar 4, 2025
@roeap roeap pinned this issue Mar 4, 2025
@roeap roeap self-assigned this Mar 4, 2025
@ion-elgreco
Collaborator

@roeap Based on the challenges you wrote, I think we should actually move forward with the current API as 1.0 for us.

One thing I would not like to see is that we hit many regressions after switching to kernel, which would set us back many months, if not longer, on getting to a 1.0 release. I would rather see us do this as 1.X work, and then we can release 2.0 after that has stabilized.

@roeap
Collaborator Author

roeap commented Mar 4, 2025

@ion-elgreco - I do see the point, but are we under some time pressure? I think we still have a few correctness problems anyway that I personally would expect to be handled in a 1.0.

But yes, this is certainly not going to be done in a week or so.

Personally I also still struggle a bit with how wide our APIs on the delta table are, i.e. tailored to both log inspection and table scans. The SQL APIs are also still considered experimental IIRC, do we have a design for that?

Finally getting to 1.0 is something I am tremendously looking forward to, but is there any motivation to rush it now that we are nearing the end? Knowing we might break things?

W.r.t. the challenges, I think we can cover most of this via a hybrid state where we leverage what kernel can do and layer our stuff on top of that.
