-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix: Avoid recursive external error wrapping #14371
Conversation
This reverts commit 127705e.
INSERT INTO tab0 VALUES(83,0,38); | ||
|
||
query error DataFusion error: Arrow error: Divide by zero error | ||
SELECT DISTINCT - 84 FROM tab0 AS cor0 WHERE NOT + 96 / + col1 <= NULL GROUP BY col1, col0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @getChan !
datafusion/common/src/error.rs
Outdated
@@ -131,6 +131,10 @@ pub enum DataFusionError { | |||
/// Errors from either mapping LogicalPlans to/from Substrait plans | |||
/// or serializing/deserializing protobytes to Substrait plans | |||
Substrait(String), | |||
|
|||
/// Errors for wrapping other errors. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this needed (compared to Context
for example?) I don't understand why we need to keep the data wrappeed (why don't we just pass the underlying error back?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because if we only pass the underlying error, we can't share the error (due to ownership constraints). For example, RepartitionExec
needs to send errors to all output partitions. You can see this in the source code.
datafusion/datafusion/physical-plan/src/repartition/mod.rs
Lines 914 to 921 in 8d006a2
Ok(Err(e)) => { | |
let e = Arc::new(e); | |
for (_, tx) in txs { | |
// wrap it because need to send error to all output partitions | |
let err = Err(DataFusionError::External(Box::new(Arc::clone(&e)))); | |
tx.send(Some(err)).await.ok(); | |
} |
To share the error, I considered the following.
Implement Clone for DataFusionError
and useerr.clone()
: This is not possible because nested Error types do not implement theClone
trait.- Pass
Arc<DatafusionError>
: I implemented it this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If using Context
, the underlying error is only passed as a single Result
due to ownership constraints.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed it to SharedError
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @getChan -- As DataFusionError is used all over the place and is a very visible part of the API I was thinking it would be best if we can avoid adding too many variants. However, after playing with some other options (like deep cloning the error) I agree that this solution is the best.
to be consistent with the other variants which don't have a Error
in their name, I renamed this to DataFusionError::Shared
# Conflicts: # datafusion/common/src/error.rs
88233aa
to
d4e860f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much @getChan -- this is a nice usability improvement and I think it is ready to go. It was very nicely done and much appreciated.
I took the liberty of merging this PR up from main and updating some documentation so this is ready to merge.
Thanks again!
datafusion/common/src/error.rs
Outdated
@@ -131,6 +131,10 @@ pub enum DataFusionError { | |||
/// Errors from either mapping LogicalPlans to/from Substrait plans | |||
/// or serializing/deserializing protobytes to Substrait plans | |||
Substrait(String), | |||
|
|||
/// Errors for wrapping other errors. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @getChan -- As DataFusionError is used all over the place and is a very visible part of the API I was thinking it would be best if we can avoid adding too many variants. However, after playing with some other options (like deep cloning the error) I agree that this solution is the best.
to be consistent with the other variants which don't have a Error
in their name, I renamed this to DataFusionError::Shared
I double checked and we will have to update some expected error messages to get the extended tests passing too. I made a PR here: |
Marking as an API change as this will result in downstream users having to handle a new DataFusionError variants (which is fine, I just want to correctly capture the idea) |
Thanks again @getChan |
The extended tests are going to fail on main until we merge apache/datafusion-testing#6 and update the testing pin |
|
Which issue does this PR close?
Rationale for this change
cause of the issue
In
RepartitionExec
, it was necessary to wrap theDatafusionError
to send it across multiple partitions. However due to the absence of an appropriate error clone method,ExternalError
was used. This resulted in a recursiveExternalError
.datafusion/datafusion/physical-plan/src/repartition/mod.rs
Lines 918 to 920 in 8d006a2
What changes are included in this PR?
DatafusionError
variantSharedError
for sharing sourceDatafusionError
Are these changes tested?
yes
Are there any user-facing changes?
error message will be changed.