[MINOR] pass the avro exception for better information #11925

Merged
merged 1 commit into apache:master from the improve_avro_error_handling branch
Sep 18, 2024

Conversation

prabodh1194
Contributor

@prabodh1194 prabodh1194 commented Sep 11, 2024

Change Logs

In some cases, the underlying Avro exception message is already informative. Surfacing that message instead of suppressing it gives the user more to work with.

For example, the Spark Connect snippet below fails with a duplicate column error. The detailed error is visible only on the driver, not on the Python client. This PR propagates the Avro error message to the client, which improves the UX.

from pyspark.sql.session import SparkSession

sp = (
    SparkSession.builder.appName("Hudi").remote("sc://0.0.0.0").getOrCreate()
)

df_1 = sp.createDataFrame([
    (1, "foo"), (2, "bar"), (3, "baz")
], ["id", "name"])

df_2 = sp.createDataFrame([
    (1, "foo"), (2, "bar"), (3, "baz")
], ["id", "name"])

# Joining on "id" keeps "name" from both sides, so the result has two "name" columns.
df = df_1.join(df_2, on="id")

df.write.format("hudi").option(
    "hoodie.table.name", "hudi_table"
).mode("overwrite").save("/tmp/hudi")

Error with this patch [more readable]

pyspark.errors.exceptions.connect.SparkConnectGrpcException: 
(org.apache.hudi.internal.schema.HoodieSchemaException) 
Duplicate field name in record hoodie.hudi_table.hudi_table_record: name type:UNION pos:2 and name type:UNION pos:1.

Error without this patch

pyspark.errors.exceptions.connect.SparkConnectGrpcException: 
(org.apache.hudi.internal.schema.HoodieSchemaException) 
Failed to convert struct type to avro schema: StructType(StructField(id,LongType,true),StructField(name,StringType,true),StructField(name,StringType,true))
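
The difference between the two messages above comes down to how the wrapping exception is built. As a rough illustration only (this is not the code changed in this PR), the sketch below contrasts a wrapper that hides the underlying message with one that passes it through. All names here (SchemaConversionError, LowLevelSchemaError, build_schema, convert_suppressing, convert_propagating) are hypothetical stand-ins and not Hudi or Avro APIs.

```python
class LowLevelSchemaError(Exception):
    """Hypothetical stand-in for the underlying Avro exception."""


class SchemaConversionError(Exception):
    """Hypothetical stand-in for a wrapper such as HoodieSchemaException."""


def build_schema(field_names):
    """Toy schema builder that rejects duplicate field names."""
    seen = {}
    for pos, name in enumerate(field_names):
        if name in seen:
            raise LowLevelSchemaError(
                f"Duplicate field name: {name} pos:{pos} and {name} pos:{seen[name]}"
            )
        seen[name] = pos
    return list(field_names)


def convert_suppressing(field_names):
    """Wrap the failure with a generic message; the root cause text is hidden."""
    try:
        return build_schema(field_names)
    except LowLevelSchemaError as err:
        raise SchemaConversionError(
            f"Failed to convert fields to schema: {field_names}"
        ) from err


def convert_propagating(field_names):
    """Wrap the failure but reuse the underlying message, keeping the cause."""
    try:
        return build_schema(field_names)
    except LowLevelSchemaError as err:
        raise SchemaConversionError(str(err)) from err


if __name__ == "__main__":
    for convert in (convert_suppressing, convert_propagating):
        try:
            convert(["id", "name", "name"])
        except SchemaConversionError as err:
            print(f"{convert.__name__}: {err}")
```

In both variants the original exception stays attached as the cause, but only the second puts the actionable text into the message that a remote client is likely to see.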

Impact

NA

Risk level (write none, low, medium or high below)

none

Documentation Update

na

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@github-actions github-actions bot added the size:XS PR with lines of changes in <= 10 label Sep 11, 2024
@prabodh1194 prabodh1194 force-pushed the improve_avro_error_handling branch from 20f91a0 to 0b71db4 Compare September 11, 2024 17:54
@prabodh1194
Contributor Author

I don't use Hudi 1.0.0 yet, so I raised this PR against 0.15 only.

@github-actions github-actions bot added size:S PR with lines of changes in (10, 100] and removed size:XS PR with lines of changes in <= 10 labels Sep 11, 2024
@danny0405
Contributor

Thanks for the contribution. Can we make this change against master and then cherry-pick it to the 0.x-branch if necessary? The 0.x-branch is the branch for new 0.x releases of Hudi.

@prabodh1194
Contributor Author

Sure.

@prabodh1194 prabodh1194 force-pushed the improve_avro_error_handling branch from 0b71db4 to 10a79b7 Compare September 12, 2024 02:21
@prabodh1194 prabodh1194 changed the base branch from release-0.15.0 to master September 12, 2024 02:21
@prabodh1194 prabodh1194 force-pushed the improve_avro_error_handling branch from 10a79b7 to c1d8904 Compare September 12, 2024 02:23
@prabodh1194
Contributor Author

prabodh1194 commented Sep 12, 2024

@danny0405 I have addressed your comments now. Thank you.

@prabodh1194 prabodh1194 changed the title from "pass the avro exception for better information" to "[MINOR] pass the avro exception for better information" Sep 12, 2024
@danny0405
Contributor

Thanks, I have re-triggered the Azure CI tests for the failures.

@prabodh1194
Contributor Author

@danny0405 do I need to do something to get this CI to pass?

@prabodh1194 prabodh1194 force-pushed the improve_avro_error_handling branch from c1d8904 to cba5757 Compare September 13, 2024 06:40
@hudi-bot

CI report:

Bot commands

@hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

@prabodh1194
Contributor Author

@danny0405 all checks are green now 🎉

@wombatu-kun wombatu-kun merged commit 461e58b into apache:master Sep 18, 2024
43 checks passed
nsivabalan pushed a commit to nsivabalan/hudi that referenced this pull request Sep 19, 2024
@prabodh1194 prabodh1194 deleted the improve_avro_error_handling branch September 21, 2024 07:59
@prabodh1194 prabodh1194 restored the improve_avro_error_handling branch September 21, 2024 08:25
@prabodh1194
Contributor Author

@wombatu-kun can you please merge this to https://github.com/apache/hudi/tree/release-0.15.0 as well?

#11981
