
HDDS-11658. Skip known tombstones when scanning rocksdb deletedtable in KeyDeletingService #7436

Status: Draft · wants to merge 1 commit into master

Conversation

@guohao-rosicky (Contributor) commented on Nov 15, 2024

What changes were proposed in this pull request?

Each run of KeyDeletingService iterates over a batch of records from the beginning of the deletedTable, sends deletion requests, and then deletes those records.

Because every iteration restarts from the beginning of the table, RocksDB has to step over the tombstones left by recently deleted entries on each scan.

This patch changes the iteration strategy: record a breakpoint (the last scanned key) on each iteration and resume from it on the next run, so known tombstones are skipped; the breakpoint is reset if an error occurs during iteration or the end of the table is reached.
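For illustration, the resume-from-breakpoint idea looks roughly like the sketch below. It uses an in-memory NavigableMap in place of the RocksDB-backed deletedTable, and the class and method names are hypothetical, not the actual KeyDeletingService code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;

// Hypothetical sketch of the breakpoint strategy, not the real KeyDeletingService.
// The breakpoint is the last key seen in the previous run; resuming past it means
// the scan does not re-walk tombstones left by entries that were already deleted.
public class DeletedTableScanSketch {
  private String breakpoint;  // null means start from the beginning of the table

  public List<String> nextBatch(NavigableMap<String, String> deletedTable, int limit) {
    List<String> batch = new ArrayList<>();
    try {
      // Resume strictly after the breakpoint instead of from the first key.
      Map<String, String> view = (breakpoint == null)
          ? deletedTable
          : deletedTable.tailMap(breakpoint, /* inclusive= */ false);
      for (Map.Entry<String, String> entry : view.entrySet()) {
        batch.add(entry.getKey());
        breakpoint = entry.getKey();
        if (batch.size() >= limit) {
          return batch;
        }
      }
      // Reached the end of the table: reset so the next run starts from scratch.
      breakpoint = null;
    } catch (RuntimeException e) {
      // On any iteration error, also reset the breakpoint so no keys are silently skipped.
      breakpoint = null;
      throw e;
    }
    return batch;
  }
}
```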

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-11658

How was this patch tested?

Ran TestKeyDeletingService with OZONE_KEY_DELETING_LIMIT_PER_TASK set to 10.
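Assuming the limit is exposed through the OMConfigKeys constant named above (the exact class holding the constant may differ), the test setup would look roughly like this:

```java
import org.apache.hadoop.hdds.conf.OzoneConfiguration;
import org.apache.hadoop.ozone.om.OMConfigKeys;

public class SmallBatchTestConfig {
  // Sketch: force very small batches (10 keys per task) so each KeyDeletingService
  // run leaves tombstones behind and the resume-from-breakpoint path is exercised.
  static OzoneConfiguration newConf() {
    OzoneConfiguration conf = new OzoneConfiguration();
    conf.setInt(OMConfigKeys.OZONE_KEY_DELETING_LIMIT_PER_TASK, 10);
    return conf;
  }
}
```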

@adoroszlai marked this pull request as draft on November 15, 2024 at 07:36
@adoroszlai (Contributor) commented:

Thanks @guohao-rosicky for the patch.

Please wait for clean CI run in fork before opening PR.

There are some failures that seem related:

Tests run: 6, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 31.757 s <<< FAILURE! - in org.apache.hadoop.ozone.om.service.TestKeyDeletingService$Normal

https://github.com/guohao-rosicky/ozone/actions/runs/11851572260/job/33028663632#step:6:2342

Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 37.414 s <<< FAILURE! - in org.apache.hadoop.ozone.om.snapshot.TestSnapshotDeletingServiceIntegrationTest

https://github.com/guohao-rosicky/ozone/actions/runs/11851572260/job/33028664194#step:6:2895

@sadanand48 (Contributor) commented:

> Because every iteration restarts from the beginning of the table, RocksDB has to step over the tombstones left by recently deleted entries on each scan.

Are we sure about this? Doesn't rocksdb abstract out the tombstone info? I think it does. The Java API for iterating the table should not get the tombstones.

@sadanand48 (Contributor) commented:

Also, recording the last scan key will cause it to skip newer entries that appear earlier in sorted order than the last scan key, causing many keys to remain undeleted.

@errose28 (Contributor) commented:

> Also, recording the last scan key will cause it to skip newer entries that appear earlier in sorted order than the last scan key, causing many keys to remain undeleted.

+1. This also adds coupling to internal RocksDB behavior that may be optimized away or become irrelevant in later releases. Since the change is not without drawbacks, it would be good to run a micro-benchmark to quantify the speedup. A simple JUnit test that generates a large number of delete entries and times how long the new vs. old implementation takes to clear them out would probably suffice.
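For illustration only, the suggested benchmark could take roughly this shape. The two helpers are placeholders, not real Ozone test APIs, and the comparison would be run once on master and once with this patch applied:

```java
import org.junit.jupiter.api.Test;

// Rough shape of the micro-benchmark suggested above, not an actual Ozone test.
public class KeyDeletionBenchmarkSketch {

  @Test
  public void timeClearingDeletedKeys() throws Exception {
    int numKeys = 100_000;          // large enough that tombstone cost is visible
    createDeletedKeys(numKeys);     // placeholder: put numKeys entries into the deletedTable

    long start = System.nanoTime();
    runDeletionUntilEmpty();        // placeholder: run KeyDeletingService until the table is empty
    long elapsedMs = (System.nanoTime() - start) / 1_000_000;

    System.out.println("Cleared " + numKeys + " deleted keys in " + elapsedMs + " ms");
  }

  private void createDeletedKeys(int count) {
    // Placeholder: in the real test this would create and delete `count` keys
    // so they land in the OM deletedTable.
  }

  private void runDeletionUntilEmpty() {
    // Placeholder: in the real test this would trigger KeyDeletingService runs
    // until the deletedTable has no remaining entries.
  }
}
```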
