-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[HUDI-8817] fix backward compatibility issue #12615
Conversation
256064d
to
640ffe5
Compare
a64c849
to
944075b
Compare
@@ -1271,10 +1272,24 @@ public TableBuilder fromProperties(Properties properties) { | |||
setTableName(hoodieConfig.getString(HoodieTableConfig.NAME)); | |||
} | |||
|
|||
if (hoodieConfig.contains(VERSION)) { | |||
if (hoodieConfig.contains(VERSION) && hoodieConfig.contains(WRITE_TABLE_VERSION)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic in meta client should not consider write table version. Otherwise, there is no point checking whether hoodie.write.table.version
and hoodie.table.version
match.
CommonClientUtils#validateTableVersion
has the validation. Does Spark SQL writer escape that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, SQL writer does not call this function at all for create table stmt
944075b
to
c887fde
Compare
c887fde
to
9bd1a1d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@danny0405 : can you ensure we are covering all meta client instantiation flows in flink
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from what I know, we can;t let users directly directly dictate the table version.
based on our migration guide, they can only configure "hoodie.write.auto.upgrade"
ref: https://hudi.apache.org/docs/deployment#upgrading-to-100
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blocking on clarification
we discussed offline, let me know any AIs I should follow or please resume the review. Thanks |
unblocking for now.
We need to validate and may be fix all writers (spark-ds, spark-sql, deltastreamer and spark structured streaming ingestion) |
Deltastreamer : hudi/hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/StreamSync.java Line 421 in ef5b9ca
Line 428 in ef5b9ca
spark-sql and spark-ds: guess this patch already have tests. spark-streaming: Line 208 in ef5b9ca
|
@@ -302,6 +303,7 @@ class HoodieSparkSqlWriterInternal { | |||
else KeyGeneratorType.getKeyGeneratorClassName(hoodieConfig) | |||
HoodieTableMetaClient.newTableBuilder() | |||
.setTableType(tableType) | |||
.setTableVersion(Integer.valueOf(getStringWithAltKeys(parameters, HoodieWriteConfig.WRITE_TABLE_VERSION))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't table version and write table version two different config options, not sure why setting up the table version always from the write table version, the write table version is auxiliary for migration, I kind of think we should not set it up explicitly every time we initiate the table. can we detect the table version automically from the table config
and validate against the write table version then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to clarify the coordination of table version of write table version, confusing to me.
created jira 1.0.2 https://issues.apache.org/jira/browse/HUDI-8914 |
There are 3 configs at interplay:
During upgrade, the initial phase for writers should be configured with For creating new tables using 1.0, by default i.e. when none of the writer configs are set, For creating new tables using 1.0, if there is a need to create one with older format, then writers should be configured with |
So it looks like there is even no need to explicitly set up the table version when creating new table with 1.0 code, just hard code it as 8 should be fine. |
By default, we don't need to. Only when users want to write in older format with 1.0, they need to explicitly set the writer table version and auto upgrade |
It means in hudi 1.0, if a user wants to create version 6 table, they would set the config above and the system should honor that configuration in metaClient.initTable. This is why this change enforces when metaClient is created, we set the table version to write version. This is what is missing without the change
With my change that explicitly set the write version, when user does not override the config, the version would be 8 automatically. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since we are aligned here, I am going ahead for 1.0.1. Config clean up for (write table version and tbl version) and doc updates we can take it up as a follow up.
@danny0405 @yihua : if you have any follow ups that needs to go into 1.0.1, do speak up :)
Going ahead with the changes. We will have follow ups for any ask from danny
Change Logs
There are 3 configs at interplay:
table config
hoodie.table.version
- can be 6 or 8writer config
hoodie.write.table.version
- can be 6 or 8writer config
hoodie.write.auto.upgrade
- boolean flag to control transition during upgrade, avoiding unintended upgrades.hoodie.table.version
reflects the actual state of the table.hoodie.write.table.version
defines the intended state of data written by the writer.During upgrade, the initial phase for writers should be configured with
hoodie.write.auto.upgrade=false
andhoodie.write.table.version=6
until when users are ready to start writing in the new format (i.e. after all readers have been upgraded).For creating new tables using 1.0, by default i.e. when none of the writer configs are set, hoodie.table.version should be 8.
hoodie.write.table.version
is automatically inferred to be 8.With my change that explicitly set the write version, when user does not override the config, the version would be 8 automatically.
For creating new tables using 1.0, if there is a need to create one with older format, then writers should be configured with
hoodie.write.auto.upgrade=false
andhoodie.write.table.version=6
. In this case, hoodie.table.version should also be 6.It means in hudi 1.0, if a user wants to create version 6 table, they would set the config above and the system should honor that configuration in metaClient.initTable. This is why this change enforces when metaClient is created, we set the table version to write version. This is what is missing without the change
when we specify write table version as 6
the hoodie.properties said the version is 8.
It is because when building the meta client table version is not set according to the write table version. Found similar misses else where, fixed as well with test coverage.
Impact
hoodie.write.table.version will also contribute to hoodie.table.version.
Risk level (write none, low medium or high below)
none
Documentation Update
none
Contributor's checklist