
[HUDI-8780][RFC-83] Incremental Table Service #12601

Open
wants to merge 20 commits into base: master

Conversation

zhangyue19921010
Contributor

Change Logs

In Hudi, when scheduling Compaction and Clustering, the default behavior is to scan all partitions under the current table. When there are many historical partitions, such as 640,000 in our production environment, this scanning and planning operation becomes very inefficient. For Flink, it often leads to checkpoint timeouts, resulting in data delays.
As for cleaning, we already have the ability to clean incremental partitions.

This RFC will draw on the design of Incremental Clean to generalize the capability of processing incremental partitions to all table services, such as Clustering and Compaction.
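
For illustration, a minimal sketch of how a writer could toggle this behavior, assuming the config key the review below converges on ("hoodie.table.services.incremental.enabled", default true); the class name is hypothetical:

import java.util.Properties;

public class IncrementalTableServiceToggle {
    public static void main(String[] args) {
        Properties writerProps = new Properties();
        // Default is true; setting false forces full-partition scanning
        // when scheduling compaction and clustering.
        writerProps.put("hoodie.table.services.incremental.enabled", "false");
        System.out.println(writerProps);
    }
}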

Impact

compaction and clustering

Risk level (write none, low, medium or high below)

medium

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none".

  • The config description must be updated if new configs are added or the default value of a config is changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here, and follow the instructions to make changes to the website.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@github-actions github-actions bot added the size:L PR with lines of changes in (300, 1000] label Jan 8, 2025
@TheR1sing3un
Member

@zhangyue19921010 Hi, judging from the RFC content, the goal this time is incremental compaction and clustering. Should we consider incorporating the incremental processing logic for clean as well? By the way, could we solve this problem as well: #11647? It seems we could address it during implementation.

@zhangyue19921010
Contributor Author

Hi @TheR1sing3un, thanks for your attention.

@zhangyue19921010 Hi, judging from the RFC content, the goal this time is incremental compaction and clustering. Should we consider incorporating the incremental processing logic for clean as well?

Sure, we can build a unified incremental policy. But as we know, the clean action is not implemented in a strategy-pattern style, even though we have many different cleaning strategies (clean by commits or clean by versions). So we would need to refactor the clean plan phase and abstract it into separate strategy objects. We can do that in the next PR.

By the way, could we solve this problem as well: #11647? It seems we could address it during implementation.

Unfortunately, this PR doesn't solve that problem. This PR addresses incremental processing, that is, how to process only the incremental partitions the next time a table service runs after the last one completes. #11647 focuses on how to trigger the first action in an elegant way.

.defaultValue(true)
.markAdvanced()
.sinceVersion("1.0.0")
.withDocumentation("Whether to enable incremental table service.");
Contributor

Can we elaborate on the scope of the table services that are affected by this option?

enable -> enabled.

Contributor Author

changed.

// get incremental partitions.
LOG.info("Start to fetch incremental partitions for " + type);
Set<String> incrementalPartitions = getIncrementalPartitions(type);
if (!incrementalPartitions.isEmpty()) {
Contributor

Why fall back to full partitions if the incremental partition list is empty?

Contributor Author
zhangyue19921010 commented Jan 15, 2025

changed.
For now, there are two situations where we fall back to fetching all partitions:

  1. Fetching incremental partitions throws an exception.
  2. The last completed table service instant is archived.

Additionally, when the incremental partition list is empty, we skip the current schedule.
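
A self-contained sketch of this decision flow, using plain java.util.Optional in place of Hudi's Option; the method names are illustrative stand-ins, not the exact ones in this PR:

import java.util.Arrays;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

public class IncrementalScheduleSketch {
    // Stand-ins for the real timeline/partition lookups.
    static Optional<String> lastCompletedServiceInstant() { return Optional.of("20250115103000"); }
    static Set<String> incrementalPartitionsSince(String instant) { return new HashSet<>(); }
    static Set<String> allPartitions() { return new HashSet<>(Arrays.asList("2025/01/14", "2025/01/15")); }

    // An empty Optional means: skip this scheduling round entirely.
    static Optional<Set<String>> partitionsToSchedule() {
        try {
            Optional<String> last = lastCompletedServiceInstant();
            if (!last.isPresent()) {
                // Fallback 2: the last completed table service instant is archived.
                return Optional.of(allPartitions());
            }
            Set<String> incremental = incrementalPartitionsSince(last.get());
            if (incremental.isEmpty()) {
                return Optional.empty(); // nothing new since the last service: skip
            }
            return Optional.of(incremental);
        } catch (Exception e) {
            // Fallback 1: fetching incremental partitions threw an exception.
            return Optional.of(allPartitions());
        }
    }
}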

Pair<Option<HoodieInstant>, List<String>> missingPair = fetchMissingPartitions(type);
if (!missingPair.getLeft().isPresent()) {
// Last complete table service commit maybe archived.
return Collections.emptySet();
Contributor

How do we handle the case where the writer just commits empty commits continuously? I think we should still trigger incremental partition fetching. If the incremental partition list is empty we can just skip the scheduling?

Contributor Author

Makes sense. Changed.
For now, there are two situations where we fall back to fetching all partitions:

  1. Fetching incremental partitions throws an exception.
  2. The last completed table service instant is archived.

Additionally, when the incremental partition list is empty, we skip the current schedule.

import java.util.List;

/**
* Marking strategy interface.
Contributor

Marking interface for table service strategy that utilizes incremental partitions.

Contributor Author

changed.

* Marking strategy interface.
*
* Any Strategy implement this `IncrementalPartitionAwareStrategy` could have the ability to perform incremental partitions processing.
* At this time, Incremental partitions should be passed to the current strategy.
Contributor

<p> Any strategy class that implements this `IncrementalPartitionAwareStrategy` could have the ability to perform incremental partition processing.
Currently, incremental partitions will be passed to the strategy instance on a best-effort basis. In the following cases, the partitions would fall back to the full partition list:

<ul>
  <li></li>
  ...
</ul>

Contributor Author
zhangyue19921010 commented Jan 15, 2025

changed.

/**
 * Marking interface for table service strategy that utilizes incremental partitions.
 *
 * <p> Any strategy class that implements this `IncrementalPartitionAwareStrategy` could have the ability to perform incremental partition processing.
 * Currently, incremental partitions will be passed to the strategy instance on a best-effort basis. In the following cases, the partitions would fall back to the full partition list:
 *
 * <ul>
 *   <li> Executing Table Service for the first time. </li>
 *   <li> The last completed table service instant is archived. </li>
 *   <li> Any exception thrown during retrieval of incremental partitions. </li>
 * </ul>
 */

/**
* Filter the given incremental partitions.
* @param writeConfig
* @param incrementalPartitions
Contributor

partitions

Contributor Author

changed.

@@ -42,13 +42,13 @@
*/
public class ClusteringPlanPartitionFilter {

public static List<String> filter(List<String> partitions, HoodieWriteConfig config) {
public static List<String> filter(List<String> partitions, HoodieWriteConfig config, ArrayList<String> missingPartitions) {
Contributor

Maybe we return a Pair for both missing partitions and filtered partitions.

Contributor Author

Makes sense. Changed.

@@ -128,7 +129,7 @@ protected Stream<HoodieClusteringGroup> buildClusteringGroupsForPartition(String
.build();
}).collect(Collectors.toList()));
}
return ret.stream();
return Pair.of(ret.stream(), true);
Contributor

In which case is the flag false?

Contributor Author

This boolean value indicates whether all candidate file slices under the current partition have been processed. The caller uses it to decide whether the current partition should be added to missingPartitions.

For BaseConsistentHashingBucketClusteringPlanStrategy it always returns true.

For other PartitionAwareClusteringPlanStrategy implementations, such as SparkSizeBasedClusteringPlanStrategy, it may return false. For example, if 100 fileSlices are passed in but only 10 of them are processed due to the limitation of writeConfig.getClusteringMaxNumGroups(), false should be returned.
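
A self-contained stand-in sketch of this bookkeeping, with plain strings in place of FileSlice; the flag here is named partialScheduled and is true when eligible slices are left behind (the exact naming and polarity are settled further down in this review):

import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class PartialScheduleSketch {
    // Groups at most maxNumGroups slices and reports whether any eligible
    // slices were left behind, so the caller can record the partition in
    // missingPartitions for the next incremental round.
    static Map.Entry<List<String>, Boolean> groupSlices(List<String> fileSlices, int maxNumGroups) {
        List<String> scheduled = new ArrayList<>();
        boolean partialScheduled = false;
        for (String slice : fileSlices) {
            if (scheduled.size() >= maxNumGroups) {
                partialScheduled = true; // hit the group cap: remaining slices wait
                break;
            }
            scheduled.add(slice);
        }
        return new AbstractMap.SimpleEntry<>(scheduled, partialScheduled);
    }
}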

@@ -55,7 +59,7 @@ public PartitionAwareClusteringPlanStrategy(HoodieTable table, HoodieEngineConte
/**
* Create Clustering group based on files eligible for clustering in the partition.
*/
protected Stream<HoodieClusteringGroup> buildClusteringGroupsForPartition(String partitionPath, List<FileSlice> fileSlices) {
protected Pair<Stream<HoodieClusteringGroup>, Boolean> buildClusteringGroupsForPartition(String partitionPath, List<FileSlice> fileSlices) {
Contributor

Can we add some doc on the boolean flag.

Contributor Author

added.

  /**
   * Create Clustering group based on files eligible for clustering in the partition.
   * Returns a stream of HoodieClusteringGroup plus a boolean partialScheduled flag indicating whether all given fileSlices in the current partition have been processed.
   * For example, if some file slices will not be processed due to writeConfig.getClusteringMaxNumGroups(), then false is returned.
   */

@@ -68,6 +72,7 @@ protected Stream<HoodieClusteringGroup> buildClusteringGroupsForPartition(String
- (o1.getBaseFile().isPresent() ? o1.getBaseFile().get().getFileSize() : writeConfig.getParquetMaxFileSize())));

long totalSizeSoFar = 0;
boolean isAllSlicesIncluded = true;
Contributor

isAllSlicesIncluded -> partialScheduled?

Contributor Author

changed.

protected abstract HoodieCompactionPlan getCompactionPlan(HoodieTableMetaClient metaClient, List<HoodieCompactionOperation> operations);
protected abstract List<String> getPartitions();

protected abstract HoodieCompactionPlan getCompactionPlan(HoodieTableMetaClient metaClient, List<HoodieCompactionOperation> operations, Pair<List<String>,List<String>> partitionPair);
Contributor

Do we need the missing partitions to generate the plan?

Contributor Author

Because we need to record these missing partitions in the plan (via .setMissingSchedulePartitions(res)) inside the getCompactionPlan function.
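
A sketch with stub types of why the plan builder needs the partition pair; only the setMissingSchedulePartitions idea comes from this PR, the surrounding types and names are illustrative:

import java.util.List;

public class CompactionPlanSketch {
    // Stand-in for the Avro-generated HoodieCompactionPlan.
    static class Plan {
        List<String> operations;
        List<String> missingSchedulePartitions; // the field this PR records
    }

    static Plan getCompactionPlan(List<String> operations, List<String> missingPartitions) {
        Plan plan = new Plan();
        plan.operations = operations;
        // Persist the unprocessed partitions so the next schedule can pick
        // them up without a full partition scan.
        plan.missingSchedulePartitions = missingPartitions;
        return plan;
    }
}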

@@ -46,7 +47,7 @@ public List<HoodieCompactionOperation> orderAndFilter(HoodieWriteConfig writeCon
targetIORemaining -= opIo;
finalOperations.add(op);
if (targetIORemaining <= 0) {
return finalOperations;
missingPartitions.add(op.getPartitionPath());
Contributor

Usually we do not modify the passed-in collections.

Contributor Author

changed.

Contributor

Can we add some doc around the return values of #orderAndFilter

Contributor Author

added

// Strategy implementation can overload this method to set specific compactor-id
Set<String> missingPartitions = new HashSet<>(partitionPair.getRight());
List<HoodieCompactionOperation> operationsToProcess = orderAndFilter(writeConfig, operations, pendingCompactionPlans, missingPartitions);
List<String> res = writeConfig.isIncrementalTableServiceEnable() ? new ArrayList<>(missingPartitions) : new ArrayList<>();
Contributor

Maybe null as the default value to avoid deserialization.

Contributor Author

changed.

@github-actions github-actions bot added size:XL PR with lines of changes > 1000 and removed size:L PR with lines of changes in (300, 1000] labels Jan 13, 2025
@zhangyue19921010 zhangyue19921010 changed the title [HUDI-8780][RFC-83][WIP] Incremental Table Service [HUDI-8780][RFC-83] Incremental Table Service Jan 15, 2025
@zhangyue19921010
Contributor Author

zhangyue19921010 commented Jan 15, 2025

Hi @danny0405, thanks for your review. All comments are addressed.
Also added several related UTs. PTAL.
Removed WIP.

@@ -808,6 +808,13 @@ public class HoodieWriteConfig extends HoodieConfig {
.withDocumentation("Avro schema of the partial updates. This is automatically set by the "
+ "Hudi write client and user is not expected to manually change the value.");

public static final ConfigProperty<Boolean> INCREMENTAL_TABLE_SERVICE_ENABLE = ConfigProperty
.key("hoodie.incremental.tableservice.enabled")
Contributor

Does this option exist for some corner cases? I think we should always try incremental scheduling.

Contributor Author

Incremental processing is used everywhere, including in UTs. There is a switch here (default true) just in case: in some scenarios users need full processing, and they can manually turn off incremental processing (similar to the switch between MDT and clean).

if (completionTime.compareTo(leftBoundary) >= 0 && completionTime.compareTo(rightBoundary) < 0) {
HoodieCommitMetadata metadata = TimelineUtils.getCommitMetadata(instant, activeTimeline);
// ignore all the clustering operation for both mor and cow table
if (!metadata.getOperationType().equals(WriteOperationType.CLUSTER)) {
Contributor
danny0405 commented Jan 15, 2025

Why ignore cluster? Maybe we should ignore based on the passed-in table service type.

Contributor Author
zhangyue19921010 commented Jan 16, 2025

No need to ignore actually. Maybe filterCommitByTableType is good enough.


public Pair<Option<HoodieInstant>, List<String>> fetchMissingPartitions(TableServiceType tableServiceType) {
if (!config.isIncrementalTableServiceEnable()) {
return Pair.of(Option.empty(), new ArrayList<>());
Contributor

Collections.emptyList?

Contributor Author

changed

@@ -68,6 +75,7 @@ protected Stream<HoodieClusteringGroup> buildClusteringGroupsForPartition(String
- (o1.getBaseFile().isPresent() ? o1.getBaseFile().get().getFileSize() : writeConfig.getParquetMaxFileSize())));

long totalSizeSoFar = 0;
boolean partialScheduled = true;
Contributor

The default value should be false I guess.

Contributor Author

Exactly, this was missed here. Changed.

List<FileSlice> fileSlicesEligible = getFileSlicesEligibleForClustering(partitionPath).collect(Collectors.toList());
Pair<Stream<HoodieClusteringGroup>, Boolean> groupPair = buildClusteringGroupsForPartition(partitionPath, fileSlicesEligible);
List<HoodieClusteringGroup> clusteringGroupsPartition = groupPair.getLeft().collect(Collectors.toList());
Boolean allProcessed = groupPair.getRight();
Contributor

partialScheduled?

Contributor Author

changed. Sorry for missing this.

@@ -808,6 +808,13 @@ public class HoodieWriteConfig extends HoodieConfig {
.withDocumentation("Avro schema of the partial updates. This is automatically set by the "
+ "Hudi write client and user is not expected to manually change the value.");

public static final ConfigProperty<Boolean> INCREMENTAL_TABLE_SERVICE_ENABLE = ConfigProperty
.key("hoodie.incremental.tableservice.enabled")
Contributor

tableservice -> table.service to keep consistent with other table service conf?

Contributor Author

changed to hoodie.table.services.incremental.enabled

switch (table.getMetaClient().getTableType()) {
case MERGE_ON_READ: {
// for mor only take cares of delta commit and replace commit
Set<String> operations = CollectionUtils.createSet(DELTA_COMMIT_ACTION, REPLACE_COMMIT_ACTION);
Contributor

Better not to create the constant set per instant.

Contributor Author

changed. Thanks
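
For reference, a minimal sketch of the fix asked for above: hoist the set into a constant so it is not rebuilt for every instant (string literals stand in for the Hudi action constants):

import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class MorIncrementalActions {
    // Built once; for MOR only delta commits and replace commits matter.
    private static final Set<String> MOR_ACTIONS = Collections.unmodifiableSet(
        new HashSet<>(Arrays.asList("deltacommit", "replacecommit")));
}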

List<String> affectedPartitions2 = compactionPlan2.getOperations().stream()
.map(HoodieCompactionOperation::getPartitionPath).collect(Collectors.toList());
// compaction including 20250115 (fetched from recorded missing partitions)
assertTrue(affectedPartitions2.contains(partitions[0]));
Contributor

Do we test the fallback for archived commits?

}

@Test
public void testGetPartitionsFallbackToFullScan() throws Exception {
Contributor Author

@danny0405 This UT tests the fallback to a full scan when the table service instant is archived.

}

@Test
public void testContinuousEmptyCommits() throws Exception {
Contributor Author

testContinuousEmptyCommits

@@ -116,7 +116,7 @@ public Pair<Option<HoodieInstant>, Set<String>> getIncrementalPartitions(TableSe
String rightBoundary = instantTime;
// compute [leftBoundary, rightBoundary) as time window
HoodieActiveTimeline activeTimeline = table.getActiveTimeline();
Set<String> partitionsInCommitMeta = table.getActiveTimeline().filterCompletedInstants().getCommitsTimeline().getInstantsAsStream()
Set<String> partitionsInCommitMeta = table.getActiveTimeline().findInstantsAfter(leftBoundary).filterCompletedInstants().getCommitsTimeline().getInstantsAsStream()
Contributor

Why filter the instant times by leftBoundary?

Contributor Author
zhangyue19921010 commented Jan 20, 2025

@danny0405 Nice catch, Danny.
At first I just wanted to filter out the left boundary instant itself, but I obviously called the findInstantsAfter API by mistake. Here we should take the completed instants of the entire active timeline and filter them by completion time.
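
A self-contained sketch of the intended [leftBoundary, rightBoundary) completion-time window over all completed instants, with plain strings standing in for instant completion times:

import java.util.List;
import java.util.stream.Collectors;

public class CompletionTimeWindowSketch {
    // Left-inclusive, right-exclusive, matching the comment in the diff above.
    static List<String> inWindow(List<String> completionTimes, String leftBoundary, String rightBoundary) {
        return completionTimes.stream()
            .filter(t -> t.compareTo(leftBoundary) >= 0 && t.compareTo(rightBoundary) < 0)
            .collect(Collectors.toList());
    }
}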
