
HDDS-11658. Skip known tombstones when scanning rocksdb deletedtable in KeyDeletingService #7436

Status: Draft · wants to merge 1 commit into master

Conversation

@guohao-rosicky (Contributor) commented on Nov 15, 2024

What changes were proposed in this pull request?

Each run of KeyDeletingService iterates over a batch of records from the beginning of the deletedTable, sends deletion requests, and then deletes those records.

Because every iteration restarts from the beginning of the table, RocksDB has to step over the tombstones left by recently deleted entries on each scan.

This patch changes the iteration strategy: record a breakpoint (the last scanned key) on each iteration and resume from it on the next run, so known tombstones are skipped; the breakpoint is reset if an error occurs during iteration or the end of the table is reached.
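For illustration, the resume-from-breakpoint idea looks roughly like the sketch below. It uses an in-memory NavigableMap in place of the RocksDB-backed deletedTable, and the class and method names are hypothetical, not the actual KeyDeletingService code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;

// Hypothetical sketch of the breakpoint strategy, not the real KeyDeletingService.
// The breakpoint is the last key seen in the previous run; resuming past it means
// the scan does not re-walk tombstones left by entries that were already deleted.
public class DeletedTableScanSketch {
  private String breakpoint;  // null means start from the beginning of the table

  public List<String> nextBatch(NavigableMap<String, String> deletedTable, int limit) {
    List<String> batch = new ArrayList<>();
    try {
      // Resume strictly after the breakpoint instead of from the first key.
      Map<String, String> view = (breakpoint == null)
          ? deletedTable
          : deletedTable.tailMap(breakpoint, /* inclusive= */ false);
      for (Map.Entry<String, String> entry : view.entrySet()) {
        batch.add(entry.getKey());
        breakpoint = entry.getKey();
        if (batch.size() >= limit) {
          return batch;
        }
      }
      // Reached the end of the table: reset so the next run starts from scratch.
      breakpoint = null;
    } catch (RuntimeException e) {
      // On any iteration error, also reset the breakpoint so no keys are silently skipped.
      breakpoint = null;
      throw e;
    }
    return batch;
  }
}
```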

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-11658

How was this patch tested?

Ran TestKeyDeletingService with OZONE_KEY_DELETING_LIMIT_PER_TASK set to 10.
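Assuming the limit is exposed through the OMConfigKeys constant named above (the exact class holding the constant may differ), the test setup would look roughly like this:

```java
import org.apache.hadoop.hdds.conf.OzoneConfiguration;
import org.apache.hadoop.ozone.om.OMConfigKeys;

public class SmallBatchTestConfig {
  // Sketch: force very small batches (10 keys per task) so each KeyDeletingService
  // run leaves tombstones behind and the resume-from-breakpoint path is exercised.
  static OzoneConfiguration newConf() {
    OzoneConfiguration conf = new OzoneConfiguration();
    conf.setInt(OMConfigKeys.OZONE_KEY_DELETING_LIMIT_PER_TASK, 10);
    return conf;
  }
}
```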

@adoroszlai marked this pull request as draft on November 15, 2024 at 07:36
@adoroszlai (Contributor) commented:

Thanks @guohao-rosicky for the patch.

Please wait for clean CI run in fork before opening PR.

There are some failures that seem related:

Tests run: 6, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 31.757 s <<< FAILURE! - in org.apache.hadoop.ozone.om.service.TestKeyDeletingService$Normal

https://github.com/guohao-rosicky/ozone/actions/runs/11851572260/job/33028663632#step:6:2342

Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 37.414 s <<< FAILURE! - in org.apache.hadoop.ozone.om.snapshot.TestSnapshotDeletingServiceIntegrationTest

https://github.com/guohao-rosicky/ozone/actions/runs/11851572260/job/33028664194#step:6:2895

@sadanand48 (Contributor) commented:

> Because every iteration restarts from the beginning of the table, RocksDB has to step over the tombstones left by recently deleted entries on each scan.

Are we sure about this? Doesn't rocksdb abstract out the tombstone info? I think it does. The Java API for iterating the table should not get the tombstones.

@sadanand48 (Contributor) commented:

Also, recording the last scan key will cause it to skip newer entries that appear earlier in sorted order than the last scan key, causing many keys to remain undeleted.

@errose28 (Contributor) commented:

> Also, recording the last scan key will cause it to skip newer entries that appear earlier in sorted order than the last scan key, causing many keys to remain undeleted.

+1. This also adds coupling to internal RocksDB behavior that may be optimized away or become irrelevant in later releases. Since the change is not without drawbacks, it would be good to run a micro-benchmark to quantify the speedup. A simple JUnit test that generates a large number of delete entries and times how long the new vs. old implementation takes to clear them out would probably suffice.
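For illustration only, the suggested benchmark could take roughly this shape. The two helpers are placeholders, not real Ozone test APIs, and the comparison would be run once on master and once with this patch applied:

```java
import org.junit.jupiter.api.Test;

// Rough shape of the micro-benchmark suggested above, not an actual Ozone test.
public class KeyDeletionBenchmarkSketch {

  @Test
  public void timeClearingDeletedKeys() throws Exception {
    int numKeys = 100_000;          // large enough that tombstone cost is visible
    createDeletedKeys(numKeys);     // placeholder: put numKeys entries into the deletedTable

    long start = System.nanoTime();
    runDeletionUntilEmpty();        // placeholder: run KeyDeletingService until the table is empty
    long elapsedMs = (System.nanoTime() - start) / 1_000_000;

    System.out.println("Cleared " + numKeys + " deleted keys in " + elapsedMs + " ms");
  }

  private void createDeletedKeys(int count) {
    // Placeholder: in the real test this would create and delete `count` keys
    // so they land in the OM deletedTable.
  }

  private void runDeletionUntilEmpty() {
    // Placeholder: in the real test this would trigger KeyDeletingService runs
    // until the deletedTable has no remaining entries.
  }
}
```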
