From 65a1b3ff8580087be3fd2ae0e2a0d423c10d1149 Mon Sep 17 00:00:00 2001 From: andreaschat-db Date: Thu, 13 Feb 2025 13:05:57 +0100 Subject: [PATCH 1/6] v1 --- protocol_rfcs/README.md | 15 ++++----- protocol_rfcs/checkpoint-protection.md | 43 ++++++++++++++++++++++++++ 2 files changed, 51 insertions(+), 7 deletions(-) create mode 100644 protocol_rfcs/checkpoint-protection.md diff --git a/protocol_rfcs/README.md b/protocol_rfcs/README.md index d5f1fadc2d0..71118bdd99a 100644 --- a/protocol_rfcs/README.md +++ b/protocol_rfcs/README.md @@ -16,13 +16,14 @@ Here is the history of all the RFCs propose/accepted/rejected since Feb 6, 2024, ### Proposed RFCs -| Date proposed | RFC file | Github issue | RFC title | -|:--------------|:---------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------|:---------------------------------------| -| 2023-02-09 | [type-widening.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/type-widening.md) | https://github.com/delta-io/delta/issues/2623 | Type Widening | -| 2023-02-14 | [managed-commits.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/managed-commits.md) | https://github.com/delta-io/delta/issues/2598 | Managed Commits | -| 2023-02-26 | [column-mapping-usage.tracking.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/column-mapping-usage-tracking.md) | https://github.com/delta-io/delta/issues/2682 | Column Mapping Usage Tracking | -| 2023-04-24 | [variant-type.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/variant-type.md) | https://github.com/delta-io/delta/issues/2864 | Variant Data Type | -| 2024-04-30 | [collated-string-type.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/collated-string-type.md) | https://github.com/delta-io/delta/issues/2894 | Collated String Type | +| Date proposed | RFC file | Github issue | RFC title | 
+|:--------------|:---------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------|:------------------------------------| +| 2023-02-09 | [type-widening.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/type-widening.md) | https://github.com/delta-io/delta/issues/2623 | Type Widening | +| 2023-02-14 | [managed-commits.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/managed-commits.md) | https://github.com/delta-io/delta/issues/2598 | Managed Commits | +| 2023-02-26 | [column-mapping-usage.tracking.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/column-mapping-usage-tracking.md) | https://github.com/delta-io/delta/issues/2682 | Column Mapping Usage Tracking | +| 2023-04-24 | [variant-type.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/variant-type.md) | https://github.com/delta-io/delta/issues/2864 | Variant Data Type | +| 2024-04-30 | [collated-string-type.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/collated-string-type.md) | https://github.com/delta-io/delta/issues/2894 | Collated String Type | +| 2025-02-12 | [checkpoint-protection.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/checkpoint-protection.md) | https://github.com/delta-io/delta/issues/4152 | Checkpoint Protection Up To Version | ### Accepted RFCs diff --git a/protocol_rfcs/checkpoint-protection.md b/protocol_rfcs/checkpoint-protection.md new file mode 100644 index 00000000000..03ab522b8c0 --- /dev/null +++ b/protocol_rfcs/checkpoint-protection.md @@ -0,0 +1,43 @@ +# Checkpoint Protection + +This RFC introduces a new Writer feature named `checkpointProtection`. When the feature is present in the protocol, no checkpoint removal/creation before that version is allowed during metadata cleanup unless everything is cleaned up in one go. 
+ +The motivation is to improve the drop feature functionality. Today, dropping a feature requires the execution of the DROP FEATURE command twice with a 24 hour waiting time in between. In addition, it also results in the truncation of the history of the Delta table to the last 24 hours. + +We can improve this process by introducing `CheckpointProtection`, which allows us to set up the table's history (including checkpoints) in such a way that older readers will be able to handle it correctly until we atomically delete it. + +A key component of this solution is a special set of protected checkpoints at the DROP FEATURE boundary that are guaranteed to persist until all history is truncated up to the checkpoints in one go. These checkpoints act as barriers that hide unsupported log records behind them. With the `CheckpointProtection`, we can guarantee these checkpoints will persist until history is truncated. + +Furthermore, with the new drop feature method, validating against the latest protocol is no longer sufficient. Therefore, creating checkpoints to historical versions can lead to corruption if the writer does not support the target protocol. The `CheckpointProtection` also protects against these cases by disallowing checkpoint creation before `requireCheckpointProtectionBeforeVersion`. + +With these changes, we can drop table features in a single command without needing to truncate history. More importantly, they simplify the drop feature user journey by requiring a single execution of the DROP FEATURE command. + +**For further discussions about this protocol change, please refer to the Github issue - https://github.com/delta-io/delta/issues/4152** + +-------- + + +> ***New Section*** +# Checkpoint Protection + +The `CheckpointProtection` is a Writer feature that allows writers to clean up metadata iff metadata can be cleaned up to the `requireCheckpointProtectionBeforeVersion` table property in one go. 
+ +Enablement: +- The table must be on Writer Version 7 and Reader Version 1. +- The feature `checkpointProtection` must exist in the table `protocol`'s `writerFeatures`. + +## Writer Requirements for Checkpoint Protection + +For tables with `CheckpointProtection` supported in the protocol, writers need to check `requireCheckpointProtectionBeforeVersion` before cleaning up metadata. Metadata clean up can proceed iff metadata can be cleaned up to the `requireCheckpointProtectionBeforeVersion` table property in one go. This means that a single cleanup operation should truncate up to `requireCheckpointProtectionBeforeVersion` as opposed to several cleanup operations truncating in chunks. + +There are two exceptions to this rule. If any of the two holds, the rule above can be ignored: + +a) The writer does not create any checkpoints during history cleanup and does not erase any checkpoints after the truncation version. + +b) The writer verifies it supports all protocols between `[cleanup start version, min(checkpoint creation version, requireCheckpointProtectionBeforeVersion)]`. + +The `CheckpointProtection` feature can only be removed if history is truncated up to at least the `requireCheckpointProtectionBeforeVersion`. + +## Recommendations for Readers of Tables with Checkpoint Protection feature + +For tables with `CheckpointProtection` supported in the protocol, readers do not need to understand or change anything new; they just need to acknowledge the feature exists. 
\ No newline at end of file From 21b37977f466bf57f04c848ff6295c6fbed834c9 Mon Sep 17 00:00:00 2001 From: andreaschat-db Date: Thu, 13 Feb 2025 13:14:40 +0100 Subject: [PATCH 2/6] v1 --- protocol_rfcs/checkpoint-protection.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/protocol_rfcs/checkpoint-protection.md b/protocol_rfcs/checkpoint-protection.md index 03ab522b8c0..c66aeb33267 100644 --- a/protocol_rfcs/checkpoint-protection.md +++ b/protocol_rfcs/checkpoint-protection.md @@ -4,11 +4,11 @@ This RFC introduces a new Writer feature named `checkpointProtection`. When the The motivation is to improve the drop feature functionality. Today, dropping a feature requires the execution of the DROP FEATURE command twice with a 24 hour waiting time in between. In addition, it also results in the truncation of the history of the Delta table to the last 24 hours. -We can improve this process by introducing `CheckpointProtection`, which allows us to set up the table's history (including checkpoints) in such a way that older readers will be able to handle it correctly until we atomically delete it. +We can improve this process by introducing `checkpointProtection`, which allows us to set up the table's history (including checkpoints) in such a way that older readers will be able to handle it correctly until we atomically delete it. -A key component of this solution is a special set of protected checkpoints at the DROP FEATURE boundary that are guaranteed to persist until all history is truncated up to the checkpoints in one go. These checkpoints act as barriers that hide unsupported log records behind them. With the `CheckpointProtection`, we can guarantee these checkpoints will persist until history is truncated. +A key component of this solution is a special set of protected checkpoints at the DROP FEATURE boundary that are guaranteed to persist until all history is truncated up to the checkpoints in one go. 
These checkpoints act as barriers that hide unsupported log records behind them. With the `checkpointProtection`, we can guarantee these checkpoints will persist until history is truncated. -Furthermore, with the new drop feature method, validating against the latest protocol is no longer sufficient. Therefore, creating checkpoints to historical versions can lead to corruption if the writer does not support the target protocol. The `CheckpointProtection` also protects against these cases by disallowing checkpoint creation before `requireCheckpointProtectionBeforeVersion`. +Furthermore, with the new drop feature method, validating against the latest protocol is no longer sufficient. Therefore, creating checkpoints to historical versions can lead to corruption if the writer does not support the target protocol. The `checkpointProtection` also protects against these cases by disallowing checkpoint creation before `requireCheckpointProtectionBeforeVersion`. With these changes, we can drop table features in a single command without needing to truncate history. More importantly, they simplify the drop feature user journey by requiring a single execution of the DROP FEATURE command. @@ -20,7 +20,7 @@ With these changes, we can drop table features in a single command without needi > ***New Section*** # Checkpoint Protection -The `CheckpointProtection` is a Writer feature that allows writers to clean up metadata iff metadata can be cleaned up to the `requireCheckpointProtectionBeforeVersion` table property in one go. +The `checkpointProtection` is a Writer feature that allows writers to clean up metadata iff metadata can be cleaned up to the `requireCheckpointProtectionBeforeVersion` table property in one go. Enablement: - The table must be on Writer Version 7 and Reader Version 1. 
@@ -28,7 +28,7 @@ Enablement: ## Writer Requirements for Checkpoint Protection -For tables with `CheckpointProtection` supported in the protocol, writers need to check `requireCheckpointProtectionBeforeVersion` before cleaning up metadata. Metadata clean up can proceed iff metadata can be cleaned up to the `requireCheckpointProtectionBeforeVersion` table property in one go. This means that a single cleanup operation should truncate up to `requireCheckpointProtectionBeforeVersion` as opposed to several cleanup operations truncating in chunks. +For tables with `checkpointProtection` supported in the protocol, writers need to check `requireCheckpointProtectionBeforeVersion` before cleaning up metadata. Metadata clean up can proceed iff metadata can be cleaned up to the `requireCheckpointProtectionBeforeVersion` table property in one go. This means that a single cleanup operation should truncate up to `requireCheckpointProtectionBeforeVersion` as opposed to several cleanup operations truncating in chunks. There are two exceptions to this rule. If any of the two holds, the rule above can be ignored: @@ -36,8 +36,8 @@ a) The writer does not create any checkpoints during history cleanup and does no b) The writer verifies it supports all protocols between `[cleanup start version, min(checkpoint creation version, requireCheckpointProtectionBeforeVersion)]`. -The `CheckpointProtection` feature can only be removed if history is truncated up to at least the `requireCheckpointProtectionBeforeVersion`. +The `checkpointProtection` feature can only be removed if history is truncated up to at least the `requireCheckpointProtectionBeforeVersion`. ## Recommendations for Readers of Tables with Checkpoint Protection feature -For tables with `CheckpointProtection` supported in the protocol, readers do not need to understand or change anything new; they just need to acknowledge the feature exists. 
\ No newline at end of file +For tables with `checkpointProtection` supported in the protocol, readers do not need to understand or change anything new; they just need to acknowledge the feature exists. \ No newline at end of file From 48a62b551685838fe113ad363e69db83b4f708ad Mon Sep 17 00:00:00 2001 From: andreaschat-db Date: Thu, 13 Feb 2025 16:34:25 +0100 Subject: [PATCH 3/6] fix --- protocol_rfcs/checkpoint-protection.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/protocol_rfcs/checkpoint-protection.md b/protocol_rfcs/checkpoint-protection.md index c66aeb33267..228ddf90661 100644 --- a/protocol_rfcs/checkpoint-protection.md +++ b/protocol_rfcs/checkpoint-protection.md @@ -34,7 +34,7 @@ There are two exceptions to this rule. If any of the two holds, the rule above c a) The writer does not create any checkpoints during history cleanup and does not erase any checkpoints after the truncation version. -b) The writer verifies it supports all protocols between `[cleanup start version, min(checkpoint creation version, requireCheckpointProtectionBeforeVersion)]`. +b) The writer verifies it supports all protocols between `[start, min(requireCheckpointProtectionBeforeVersion, targetCleanupVersion)]`. The `checkpointProtection` feature can only be removed if history is truncated up to at least the `requireCheckpointProtectionBeforeVersion`. 
From cea5110d71222d994bdd3df6d51fc74f173ea6fe Mon Sep 17 00:00:00 2001 From: andreaschat-db Date: Fri, 14 Feb 2025 13:01:10 +0100 Subject: [PATCH 4/6] Address comments --- protocol_rfcs/checkpoint-protection.md | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/protocol_rfcs/checkpoint-protection.md b/protocol_rfcs/checkpoint-protection.md index 228ddf90661..d8d2b388137 100644 --- a/protocol_rfcs/checkpoint-protection.md +++ b/protocol_rfcs/checkpoint-protection.md @@ -16,25 +16,24 @@ With these changes, we can drop table features in a single command without needi -------- - -> ***New Section*** +> ***Add a new section at the [Table Features](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#table-features) section*** # Checkpoint Protection -The `checkpointProtection` is a Writer feature that allows writers to clean up metadata iff metadata can be cleaned up to the `requireCheckpointProtectionBeforeVersion` table property in one go. +The `checkpointProtection` is a Writer feature that allows writers to clean up metadata if and only if metadata can be cleaned up to the `requireCheckpointProtectionBeforeVersion` table property in one go. Enablement: -- The table must be on Writer Version 7 and Reader Version 1. +- The table must be at least on Writer Version 7 and Reader Version 1. - The feature `checkpointProtection` must exist in the table `protocol`'s `writerFeatures`. ## Writer Requirements for Checkpoint Protection -For tables with `checkpointProtection` supported in the protocol, writers need to check `requireCheckpointProtectionBeforeVersion` before cleaning up metadata. Metadata clean up can proceed iff metadata can be cleaned up to the `requireCheckpointProtectionBeforeVersion` table property in one go. This means that a single cleanup operation should truncate up to `requireCheckpointProtectionBeforeVersion` as opposed to several cleanup operations truncating in chunks. 
+For tables with `checkpointProtection` supported in the protocol, writers need to check `requireCheckpointProtectionBeforeVersion` before cleaning up metadata. Metadata clean up can proceed if and only if metadata can be cleaned up to the `requireCheckpointProtectionBeforeVersion` table property in one go. This means that a single cleanup operation should truncate up to `requireCheckpointProtectionBeforeVersion` as opposed to several cleanup operations truncating in chunks. There are two exceptions to this rule. If any of the two holds, the rule above can be ignored: a) The writer does not create any checkpoints during history cleanup and does not erase any checkpoints after the truncation version. -b) The writer verifies it supports all protocols between `[start, min(requireCheckpointProtectionBeforeVersion, targetCleanupVersion)]`. +b) The writer verifies it supports all protocols in the closed range `[start, min(requireCheckpointProtectionBeforeVersion, targetCleanupVersion)]`. The `checkpointProtection` feature can only be removed if history is truncated up to at least the `requireCheckpointProtectionBeforeVersion`. From 928aeb03ae1999b5170b261dfbbc834957a81908 Mon Sep 17 00:00:00 2001 From: andreaschat-db Date: Fri, 14 Feb 2025 17:54:43 +0100 Subject: [PATCH 5/6] Address comments --- protocol_rfcs/checkpoint-protection.md | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/protocol_rfcs/checkpoint-protection.md b/protocol_rfcs/checkpoint-protection.md index d8d2b388137..aa8b9d4cb99 100644 --- a/protocol_rfcs/checkpoint-protection.md +++ b/protocol_rfcs/checkpoint-protection.md @@ -6,7 +6,8 @@ The motivation is to improve the drop feature functionality. Today, dropping a f We can improve this process by introducing `checkpointProtection`, which allows us to set up the table's history (including checkpoints) in such a way that older readers will be able to handle it correctly until we atomically delete it. 
-A key component of this solution is a special set of protected checkpoints at the DROP FEATURE boundary that are guaranteed to persist until all history is truncated up to the checkpoints in one go. These checkpoints act as barriers that hide unsupported log records behind them. With the `checkpointProtection`, we can guarantee these checkpoints will persist until history is truncated. +A key component of this solution is a special set of protected checkpoints at the DROP FEATURE boundary that are guaranteed to persist until all history is truncated up to the checkpoints in one go. These checkpoints act as barriers that hide unsupported commit +records behind them. With the `checkpointProtection`, we can guarantee these checkpoints will persist until history is truncated. Furthermore, with the new drop feature method, validating against the latest protocol is no longer sufficient. Therefore, creating checkpoints to historical versions can lead to corruption if the writer does not support the target protocol. The `checkpointProtection` also protects against these cases by disallowing checkpoint creation before `requireCheckpointProtectionBeforeVersion`. @@ -27,13 +28,13 @@ Enablement: ## Writer Requirements for Checkpoint Protection -For tables with `checkpointProtection` supported in the protocol, writers need to check `requireCheckpointProtectionBeforeVersion` before cleaning up metadata. Metadata clean up can proceed if and only if metadata can be cleaned up to the `requireCheckpointProtectionBeforeVersion` table property in one go. This means that a single cleanup operation should truncate up to `requireCheckpointProtectionBeforeVersion` as opposed to several cleanup operations truncating in chunks. +For tables with `checkpointProtection` supported in the protocol, writers need to check `requireCheckpointProtectionBeforeVersion` before cleaning up metadata. 
Metadata clean up can proceed if and only if metadata can be cleaned up to the `requireCheckpointProtectionBeforeVersion` table property in one go. This means that a single cleanup operation should truncate up to `requireCheckpointProtectionBeforeVersion` as opposed to several cleanup operations truncating in chunks. Furthermore, before removing checkpoints, all associated commits need to be removed first. This operation should have the same atomicity guarantees (if any) as with the regular metadata cleanup operation. -There are two exceptions to this rule. If any of the two holds, the rule above can be ignored: +We can allow history truncation at an earlier commit, as long as checkpoints are removed together with the associated commits, if any of the two following exceptions hold: a) The writer does not create any checkpoints during history cleanup and does not erase any checkpoints after the truncation version. -b) The writer verifies it supports all protocols in the closed range `[start, min(requireCheckpointProtectionBeforeVersion, targetCleanupVersion)]`. +b) The writer verifies it supports all protocols in the closed range `[start, min(requireCheckpointProtectionBeforeVersion, targetCleanupVersion)]` (assuming a single checkpoint is created at `targetCleanupVersion`). The `checkpointProtection` feature can only be removed if history is truncated up to at least the `requireCheckpointProtectionBeforeVersion`. 
From 758e586970ad5437d5514bfb0c21bb365d49b630 Mon Sep 17 00:00:00 2001 From: andreaschat-db Date: Fri, 14 Feb 2025 18:01:11 +0100 Subject: [PATCH 6/6] nit --- protocol_rfcs/checkpoint-protection.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/protocol_rfcs/checkpoint-protection.md b/protocol_rfcs/checkpoint-protection.md index aa8b9d4cb99..3448ff4fe15 100644 --- a/protocol_rfcs/checkpoint-protection.md +++ b/protocol_rfcs/checkpoint-protection.md @@ -30,7 +30,7 @@ Enablement: For tables with `checkpointProtection` supported in the protocol, writers need to check `requireCheckpointProtectionBeforeVersion` before cleaning up metadata. Metadata clean up can proceed if and only if metadata can be cleaned up to the `requireCheckpointProtectionBeforeVersion` table property in one go. This means that a single cleanup operation should truncate up to `requireCheckpointProtectionBeforeVersion` as opposed to several cleanup operations truncating in chunks. Furthermore, before removing checkpoints, all associated commits need to be removed first. This operation should have the same atomicity guarantees (if any) as with the regular metadata cleanup operation. -We can allow history truncation at an earlier commit, as long as checkpoints are removed together with the associated commits, if any of the two following exceptions hold: +We can allow history truncation at an earlier commit, as long as checkpoints are removed together with the associated commits, and if any of the two following exceptions hold: a) The writer does not create any checkpoints during history cleanup and does not erase any checkpoints after the truncation version.
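Reviewer note: the writer rule in its final form (main rule plus exceptions a and b) can be read as a single decision function. The sketch below is only an illustration under stated assumptions, not the Delta implementation. `CleanupPlan`, `may_clean_up`, and `supports_protocol_at` are hypothetical names, and treating "truncated up to `requireCheckpointProtectionBeforeVersion`" as `target_cleanup_version >= protection_version` is an assumption about how the boundary is counted. It also assumes checkpoints are always removed together with their associated commits, which the RFC requires separately.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CleanupPlan:
    """Hypothetical description of one metadata cleanup operation."""
    start_version: int                     # earliest version the cleanup touches
    target_cleanup_version: int            # version history would be truncated up to
    creates_checkpoints: bool              # does the cleanup create any checkpoint?
    erases_checkpoints_after_target: bool  # does it erase checkpoints past the target?


def may_clean_up(
    plan: CleanupPlan,
    protection_version: int,
    supports_protocol_at: Callable[[int], bool],
) -> bool:
    """Return True if the cleanup may proceed on a checkpointProtection table.

    protection_version stands for requireCheckpointProtectionBeforeVersion.
    supports_protocol_at(v) is an assumed callback reporting whether this
    writer supports the protocol in effect at version v.
    """
    # Main rule: one operation truncates all the way up to the protected version.
    if plan.target_cleanup_version >= protection_version:
        return True

    # Exception (a): no checkpoint is created, and none is erased after
    # the truncation version.
    if not plan.creates_checkpoints and not plan.erases_checkpoints_after_target:
        return True

    # Exception (b): the writer supports every protocol in the closed range
    # [start, min(requireCheckpointProtectionBeforeVersion, targetCleanupVersion)].
    upper = min(protection_version, plan.target_cleanup_version)
    return all(
        supports_protocol_at(v) for v in range(plan.start_version, upper + 1)
    )
```

With this reading, a cleanup that stops short of the protected version and creates a checkpoint is rejected as soon as a single version in the closed range uses a protocol the writer does not support.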