[SPARK][PROTOCOL RFC] Checkpoint Protection Up To Version #4153

Open
wants to merge 6 commits into
base: master
15 changes: 8 additions & 7 deletions protocol_rfcs/README.md
@@ -16,13 +16,14 @@ Here is the history of all the RFCs propose/accepted/rejected since Feb 6, 2024,

### Proposed RFCs

| Date proposed | RFC file | Github issue | RFC title |
|:--------------|:---------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------|:---------------------------------------|
| 2023-02-09 | [type-widening.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/type-widening.md) | https://github.com/delta-io/delta/issues/2623 | Type Widening |
| 2023-02-14 | [managed-commits.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/managed-commits.md) | https://github.com/delta-io/delta/issues/2598 | Managed Commits |
| 2023-02-26 | [column-mapping-usage.tracking.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/column-mapping-usage-tracking.md) | https://github.com/delta-io/delta/issues/2682 | Column Mapping Usage Tracking |
| 2023-04-24 | [variant-type.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/variant-type.md) | https://github.com/delta-io/delta/issues/2864 | Variant Data Type |
| 2024-04-30 | [collated-string-type.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/collated-string-type.md) | https://github.com/delta-io/delta/issues/2894 | Collated String Type |
| Date proposed | RFC file | Github issue | RFC title |
|:--------------|:---------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------|:------------------------------------|
| 2023-02-09 | [type-widening.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/type-widening.md) | https://github.com/delta-io/delta/issues/2623 | Type Widening |
| 2023-02-14 | [managed-commits.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/managed-commits.md) | https://github.com/delta-io/delta/issues/2598 | Managed Commits |
| 2023-02-26 | [column-mapping-usage.tracking.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/column-mapping-usage-tracking.md) | https://github.com/delta-io/delta/issues/2682 | Column Mapping Usage Tracking |
| 2023-04-24 | [variant-type.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/variant-type.md) | https://github.com/delta-io/delta/issues/2864 | Variant Data Type |
| 2024-04-30 | [collated-string-type.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/collated-string-type.md) | https://github.com/delta-io/delta/issues/2894 | Collated String Type |
| 2025-02-12 | [checkpoint-protection.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/checkpoint-protection.md) | https://github.com/delta-io/delta/issues/4152 | Checkpoint Protection Up To Version |

### Accepted RFCs

43 changes: 43 additions & 0 deletions protocol_rfcs/checkpoint-protection.md
@@ -0,0 +1,43 @@
# Checkpoint Protection

This RFC introduces a new Writer feature named `checkpointProtection`. When the feature is present in the protocol, metadata cleanup must not remove or create checkpoints before the version given by the `requireCheckpointProtectionBeforeVersion` table property, unless everything up to that version is cleaned up in one go.

The motivation is to improve the drop feature functionality. Today, dropping a feature requires running the DROP FEATURE command twice, with a 24-hour waiting period in between. It also truncates the history of the Delta table to the last 24 hours.

We can improve this process by introducing `checkpointProtection`, which allows us to set up the table's history (including checkpoints) in such a way that older readers will be able to handle it correctly until we atomically delete it.

A key component of this solution is a special set of protected checkpoints at the DROP FEATURE boundary. These checkpoints act as barriers that hide unsupported commit records behind them. With `checkpointProtection`, we can guarantee that these checkpoints persist until all history up to them is truncated in one go.

Furthermore, with the new drop feature method, validating against the latest protocol is no longer sufficient. Creating checkpoints at historical versions can therefore lead to corruption if the writer does not support the target protocol. `checkpointProtection` also guards against these cases by disallowing checkpoint creation before `requireCheckpointProtectionBeforeVersion`.

With these changes, we can drop table features without needing to truncate history. More importantly, the drop feature user journey becomes simpler: a single execution of the DROP FEATURE command is enough.

**For further discussion of this protocol change, please refer to the GitHub issue: https://github.com/delta-io/delta/issues/4152**

--------

> ***Add a new section under the [Table Features](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#table-features) section***
# Checkpoint Protection

`checkpointProtection` is a Writer feature that allows writers to clean up metadata if and only if the metadata can be cleaned up to the version given by the `requireCheckpointProtectionBeforeVersion` table property in one go.

Enablement:
- The table must be at least on Writer Version 7 and Reader Version 1.
- The feature `checkpointProtection` must exist in the table `protocol`'s `writerFeatures`.
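
The enablement conditions above can be sketched as a small check. This is an illustrative sketch only, not part of the proposed protocol text: it assumes the `protocol` action has already been parsed into a dictionary with the `minReaderVersion`, `minWriterVersion`, and `writerFeatures` fields, and the function name `supports_checkpoint_protection` is hypothetical.

```python
def supports_checkpoint_protection(protocol: dict) -> bool:
    """Return True if a parsed `protocol` action meets the enablement conditions.

    Hypothetical helper: `protocol` is assumed to be the already-parsed
    protocol action of the table, represented as a plain dictionary.
    """
    # `writerFeatures` may be absent or null on tables below Writer Version 7.
    writer_features = protocol.get("writerFeatures") or []
    return (
        protocol.get("minWriterVersion", 0) >= 7
        and protocol.get("minReaderVersion", 0) >= 1
        and "checkpointProtection" in writer_features
    )


# Example protocol action that enables the feature.
example_protocol = {
    "minReaderVersion": 1,
    "minWriterVersion": 7,
    "writerFeatures": ["checkpointProtection"],
}
```

A writer would typically run such a check before deciding whether the cleanup restrictions in the next section apply.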

## Writer Requirements for Checkpoint Protection

For tables with `checkpointProtection` supported in the protocol, writers need to check `requireCheckpointProtectionBeforeVersion` before cleaning up metadata. Metadata cleanup can proceed if and only if metadata can be cleaned up to the version given by the `requireCheckpointProtectionBeforeVersion` table property in one go. In other words, a single cleanup operation should truncate up to `requireCheckpointProtectionBeforeVersion`, as opposed to several cleanup operations truncating in chunks. Furthermore, all commits associated with a checkpoint need to be removed before the checkpoint itself is removed. This operation should have the same atomicity guarantees (if any) as the regular metadata cleanup operation.
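
As a rough illustration of the rule above, the sketch below plans a single cleanup pass. It rests on several assumptions that the text does not state: log files are modeled by a hypothetical `LogFile` tuple, the cleanup removes every file with a version below `target_cleanup_version`, and reaching or passing `requireCheckpointProtectionBeforeVersion` counts as cleaning up "in one go". The exceptions for earlier truncation are not modeled here.

```python
from typing import List, NamedTuple


class LogFile(NamedTuple):
    """Hypothetical model of a log entry: a commit or a checkpoint at a version."""
    version: int
    is_checkpoint: bool


def plan_cleanup(
    files: List[LogFile],
    target_cleanup_version: int,
    protected_before_version: int,
) -> List[LogFile]:
    """Return the files to delete, in deletion order, or [] if cleanup must be skipped.

    `protected_before_version` stands for the value of the
    `requireCheckpointProtectionBeforeVersion` table property.
    """
    # Truncating in chunks below the protected version is not allowed.
    if target_cleanup_version < protected_before_version:
        return []
    # Assumption: everything strictly before the target version is removed.
    doomed = [f for f in files if f.version < target_cleanup_version]
    # Commits sort before checkpoints (False < True), so commits are removed first.
    return sorted(doomed, key=lambda f: (f.is_checkpoint, f.version))
```

Returning the deletion plan as an ordered list keeps the "commits before checkpoints" requirement visible to the caller, which can then apply whatever atomicity guarantees its regular cleanup already provides.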

History truncation at an earlier commit is allowed as long as checkpoints are removed together with their associated commits and either of the following two exceptions holds:

a) The writer does not create any checkpoints during history cleanup and does not erase any checkpoints after the truncation version.

b) The writer verifies it supports all protocols in the closed range `[start, min(requireCheckpointProtectionBeforeVersion, targetCleanupVersion)]` (assuming a single checkpoint is created at `targetCleanupVersion`).
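
The two exceptions can be read as a decision function along the following lines. This is a sketch under stated assumptions rather than a normative algorithm: the function and parameter names are hypothetical, `start` is supplied by the caller because the text does not define it further, and `supports_protocol_at` is an assumed callback that reports whether the writer supports the protocol in effect at a given version.

```python
from typing import Callable


def may_truncate_at(
    target_cleanup_version: int,
    protected_before_version: int,
    start: int,
    creates_checkpoints: bool,
    erases_later_checkpoints: bool,
    supports_protocol_at: Callable[[int], bool],
) -> bool:
    """Decide whether history may be truncated at `target_cleanup_version`.

    The precondition that checkpoints are removed together with their
    associated commits is assumed to hold and is not checked here.
    """
    # Base rule: truncating up to the protected version in one go is allowed.
    if target_cleanup_version >= protected_before_version:
        return True
    # Exception (a): no checkpoints are created during cleanup, and none
    # after the truncation version are erased.
    if not creates_checkpoints and not erases_later_checkpoints:
        return True
    # Exception (b): the writer supports every protocol in the closed range
    # [start, min(requireCheckpointProtectionBeforeVersion, targetCleanupVersion)].
    upper = min(protected_before_version, target_cleanup_version)
    return all(supports_protocol_at(v) for v in range(start, upper + 1))
```

For example, a writer that creates a checkpoint at the target version but cannot read the protocol used by some commit inside the range would get `False` and should skip the cleanup.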

The `checkpointProtection` feature can only be removed if history is truncated up to at least the `requireCheckpointProtectionBeforeVersion`.

## Recommendations for Readers of Tables with the Checkpoint Protection Feature

For tables with `checkpointProtection` supported in the protocol, readers do not need to understand or change anything new; they just need to acknowledge the feature exists.