Attach Diagnostic to "wrong number of arguments" error #14432

Open
Tracked by #14429
eliaperantoni opened this issue Feb 3, 2025 · 8 comments
Assignees
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@eliaperantoni
Contributor

Is your feature request related to a problem or challenge?

For a query like:

SELECT sum(1, 2)

The only message that the end user of an application built on top of DataFusion sees is:

Error during planning: Execution error: User-defined coercion failed with Execution("SUM expects exactly one argument") No function matches the given name and argument types 'sum(Int64, Int64)'. You might need to add explicit type casts.
        Candidate functions:
        sum(UserDefined)

We want to provide a richer message that references and highlights locations in the original SQL query, contextualises the error, and helps the user understand it. Ultimately, it would be possible to display errors in a fashion akin to what #13664 enabled for some errors.

See #14429 for more information.

Describe the solution you'd like

Attach a well crafted Diagnostic to the DataFusionError, building on top of the foundations laid in #13664. See #14429 for more information.

Describe alternatives you've considered

No response

Additional context

No response

@eliaperantoni eliaperantoni added the enhancement New feature or request label Feb 3, 2025
@eliaperantoni eliaperantoni changed the title Attach Diagnostic to "wrong number of arguments" error Attach Diagnostic to "wrong number of arguments" error Feb 3, 2025
@Chen-Yuan-Lai
Contributor

take

@alamb
Contributor

alamb commented Feb 4, 2025

I think this is a good first issue as the need is clear and the tests in https://github.com/apache/datafusion/blob/85fbde2661bdb462fc498dc18f055c44f229604c/datafusion/sql/tests/cases/diagnostic.rs are well structured for extension.

@alamb alamb added the good first issue Good for newcomers label Feb 4, 2025
@eliaperantoni
Contributor Author

Hey @Chen-Yuan-Lai how is it going with this ticket :) Can I help with anything?

@Chen-Yuan-Lai
Contributor

Hi @eliaperantoni, sorry for the long delay. I’ve been taking some time to familiarize myself with this issue. I found that the "wrong number of arguments" error can be raised in different places depending on which function is called. For example:

  1. select sum(1, 2); (TypeSignature::UserDefined) : error message is "Execution error: sum function requires 1 argument, got 2"
  2. select ascii('abc', 'def'); (TypeSignature::String or other) : error message is "Function 'ascii' expects 1 arguments but received 2"

Should I:

  1. Implement a function that checks the number of arguments for all the TypeSignature variants, and attach the corresponding Diagnostic to the DataFusionError?
  2. Attach a Diagnostic to the DataFusionError in each of the different places?

Another, possibly stupid, question: how do I test the queries above correctly? I always get an "expected diagnostic" error even though I have attached a Diagnostic:

return plan_err!(
    "Function '{function_name}' expects {expected_length} arguments but received {length}"
)
.map_err(|err| {
    err.with_diagnostic(Diagnostic::new_error(
        format!("Function '{function_name}' expects {expected_length} arguments but received {length}"),
        None,
    ))
});

Is something wrong with this, or do I need to do something else?

These are the problems I have run into with this issue. I am looking forward to any hints or ideas, thanks so much!

@eliaperantoni
Contributor Author

eliaperantoni commented Feb 14, 2025

Hey @Chen-Yuan-Lai, absolutely no problem! Was checking in to see if you needed any help 😊. I see, that's a bit unfortunate. It seems like the first message is produced by a function called take_function_args and the second by one called function_length_check.

I see that take_function_args has been receiving a lot of attention lately (#14525). Perhaps you should attach a Diagnostic in the body of that function, and maybe you could even use it in place of function_length_check wherever that is called, so that function_length_check can be removed?

Then the problem would be getting the Span to take_function_args. Maybe you could make it take an optional function_call_site: Option<Span> argument and pass it in wherever possible, without requiring every caller of take_function_args to provide it, because sometimes it's just being used for a quick assertion.
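A rough sketch of that idea, in case it helps. It is self-contained, so Span, Diagnostic and SketchError are simplified stand-ins I made up for illustration, not the real DataFusion types, and the signature is only an assumption about how the optional argument could look:

```rust
use std::convert::TryInto;

/// Simplified stand-in for a source location (hypothetical, not DataFusion's `Span`).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

/// Simplified stand-in for DataFusion's `Diagnostic`.
#[derive(Debug, Clone, PartialEq)]
pub struct Diagnostic {
    pub message: String,
    pub span: Option<Span>,
}

/// Simplified stand-in for `DataFusionError`.
#[derive(Debug)]
pub enum SketchError {
    Execution(String),
    Diagnostic(Box<Diagnostic>, Box<SketchError>),
}

impl SketchError {
    /// Wraps `self` together with a diagnostic.
    pub fn with_diagnostic(self, diagnostic: Diagnostic) -> Self {
        SketchError::Diagnostic(Box::new(diagnostic), Box::new(self))
    }
}

/// Converts `args` into exactly `N` values. On a length mismatch the error
/// carries a `Diagnostic`; callers without a location simply pass `None`.
pub fn take_function_args<const N: usize, T>(
    function_name: &str,
    args: impl IntoIterator<Item = T>,
    function_call_site: Option<Span>,
) -> Result<[T; N], SketchError> {
    let args = args.into_iter().collect::<Vec<_>>();
    args.try_into().map_err(|v: Vec<T>| {
        let noun = if N == 1 { "argument" } else { "arguments" };
        let base = SketchError::Execution(format!(
            "{function_name} function requires {N} {noun}, got {}",
            v.len()
        ));
        base.with_diagnostic(Diagnostic {
            message: format!("Wrong number of arguments for {function_name} function call"),
            span: function_call_site,
        })
    })
}
```

Callers that only want a quick assertion would pass None, so the Diagnostic is still attached but has no span to highlight.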

I always get "expected diagnostic" error even though I have attached Diagnostic

That probably has something to do with the implementation of DataFusionError::iter and DataFusionError::diagnostic: they might not be able to unwrap the Diagnostic from the layers that compose the error. Have you tried dbg!-ing the error that you get in the test, to see if it contains a DataFusionError::Diagnostic somewhere? It could also be that the place where you attach the Diagnostic is not the first place that raises an error for the bad query.

@Chen-Yuan-Lai
Contributor

@eliaperantoni thanks for the advice! I will try that as soon as possible

@Chen-Yuan-Lai
Contributor

Chen-Yuan-Lai commented Feb 20, 2025

Hi @eliaperantoni, I found that Diagnostic information may be lost when the error is wrapped in multiple layers. For example, the "wrong number of arguments" error for the sum function is wrapped in three locations:

  1. take_function_args

    pub fn take_function_args<const N: usize, T>(
        function_name: &str,
        args: impl IntoIterator<Item = T>,
    ) -> Result<[T; N]> {
        let args = args.into_iter().collect::<Vec<_>>();
        args.try_into().map_err(|v: Vec<T>| {
            _exec_datafusion_err!(
                "{} function requires {} {}, got {}",
                function_name,
                N,
                if N == 1 { "argument" } else { "arguments" },
                v.len()
            )

  2. get_valid_types_with_aggregate_udf

    fn get_valid_types_with_aggregate_udf(
        signature: &TypeSignature,
        current_types: &[DataType],
        func: &AggregateUDF,
    ) -> Result<Vec<Vec<DataType>>> {
        let valid_types = match signature {
            TypeSignature::UserDefined => match func.coerce_types(current_types) {
                Ok(coerced_types) => vec![coerced_types],
                Err(e) => {
                    return exec_err!(
                        "Function '{}' user-defined coercion failed with {:?}",

  3. get_type

    Expr::AggregateFunction(AggregateFunction {
        func,
        params: AggregateFunctionParams { args, .. },
    }) => {
        let data_types = args
            .iter()
            .map(|e| e.get_type(schema))
            .collect::<Result<Vec<_>>>()?;
        let new_types = data_types_with_aggregate_udf(&data_types, func)
            .map_err(|err| {
                plan_datafusion_err!(
                    "{} {}",
                    match err {
                        DataFusionError::Plan(msg) => msg,
                        err => err.to_string(),
                    },
                    utils::generate_signature_error_msg(
                        func.name(),
                        func.signature().clone(),
                        &data_types
                    )
                )

Once an error macro (e.g. plan_datafusion_err!, exec_err!) is called to wrap the error from the inner layer, a new DataFusionError is created from the inner error's message alone, which is why I can't capture the Diagnostic in the unit test.

Here is the DataFusionError at these three locations (printed with dbg!). The Diagnostic is present at the first location but is lost as soon as the error is wrapped:

[datafusion/expr/src/type_coercion/functions.rs:304:17] &e = Diagnostic(
    Diagnostic {
        kind: Error,
        message: "Wrong number of arguments for sum function call",
        span: None,
        notes: [],
        helps: [],
    },
    Execution(
        "sum function requires 1 argument, got 2",
    ),
)
[datafusion/expr/src/expr_schema.rs:166:25] &err = Execution(
    "Function 'sum' user-defined coercion failed with \"Execution error: sum function requires 1 argument, got 2\"",
)
[datafusion/sql/tests/cases/diagnostic.rs:52:17] &err = Plan(
    "Execution error: Function 'sum' user-defined coercion failed with \"Execution error: sum function requires 1 argument, got 2\" No function matches the given name and argument types 'sum(Int64, Int64)'. You might need to add explicit type casts.\n\tCandidate functions:\n\tsum(UserDefined)",
)
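To make the difference concrete, here is a small self-contained sketch of the two ways of wrapping. The types are simplified stand-ins rather than the real DataFusionError, and the diagnostic lookup is only an assumption about how such a traversal could work: wrapping the inner error by value keeps the Diagnostic reachable, while formatting it into a new message keeps only the text.

```rust
/// Simplified stand-in for DataFusion's `Diagnostic`.
#[derive(Debug, Clone, PartialEq)]
pub struct Diagnostic {
    pub message: String,
}

/// Simplified stand-in for `DataFusionError` (the real enum has more variants).
#[derive(Debug)]
pub enum SketchError {
    Execution(String),
    Plan(String),
    /// Wrapper that keeps the inner error as a value.
    Context(String, Box<SketchError>),
    /// Wrapper that carries a diagnostic next to the inner error.
    Diagnostic(Box<Diagnostic>, Box<SketchError>),
}

impl SketchError {
    /// Walks the wrapper layers and returns the first diagnostic found.
    pub fn diagnostic(&self) -> Option<&Diagnostic> {
        let mut current = self;
        loop {
            match current {
                SketchError::Diagnostic(d, _) => return Some(&**d),
                SketchError::Context(_, inner) => current = &**inner,
                SketchError::Execution(_) | SketchError::Plan(_) => return None,
            }
        }
    }
}

/// The error as it looks at the innermost location.
fn innermost() -> SketchError {
    SketchError::Diagnostic(
        Box::new(Diagnostic {
            message: "Wrong number of arguments for sum function call".to_string(),
        }),
        Box::new(SketchError::Execution(
            "sum function requires 1 argument, got 2".to_string(),
        )),
    )
}

/// Wrapping by value: the diagnostic stays reachable through the layers.
pub fn wrap_preserving() -> SketchError {
    SketchError::Context("type coercion".to_string(), Box::new(innermost()))
}

/// Flattening: the inner error is formatted into a string, so only text survives.
pub fn wrap_flattening() -> SketchError {
    let inner = innermost();
    SketchError::Plan(format!("user-defined coercion failed with {:?}", inner))
}
```

In this sketch, calling diagnostic() on the result of wrap_preserving() finds the Diagnostic, whereas on the result of wrap_flattening() it finds nothing, which matches what the dbg! output above shows.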

I think there are two solutions:

  1. Modify the error macros to attach the Diagnostic.
  2. Reattach the Diagnostic at every layer of the error chain.

But option 1 seems to have a large effect on the codebase, and I'm not sure which one is better. I hope to get some suggestions or other best practices, thanks so much!

@eliaperantoni
Contributor Author

Hey @Chen-Yuan-Lai, thank you so much for your contributions 🙏 That is indeed an annoying issue. It seems like some of the error macros "flatten" the inner error to a string.

I think I'm in favour of option 2, in the hope that there are not too many places where this flattening happens. I wouldn't go for option 1 because, as you say, it would change all existing macro invocations.

Perhaps what we want is a DataFusionError::map_diagnosticed_err method? It would:

  1. Check if &self is a DataFusionError::Diagnostic.
  2. If so, apply a closure to the wrapped error.

i.e. then you would do:

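// Sketch only: map_diagnosticed_err is the proposed method and does not exist yet.
let err1 = exec_datafusion_err!("need to download more RAM");
let err2 = err1.with_diagnostic(...);
let err3 = err2.map_diagnosticed_err(|err|
    plan_datafusion_err!("that went wrong {}", err)
);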

And now err2 would be a Diagnostic(ExecError) and err3 would be a Diagnostic(PlanError).

I'm not sure if this would work or make for a good implementation, but I hope it can help in any way :)
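For what it's worth, here is a minimal self-contained sketch of how such a method might behave. Everything in it is hypothetical: SketchError and Diagnostic are simplified stand-ins for the real types, and applying the closure directly when there is no Diagnostic is my own assumption, since the proposal above only covers the Diagnostic case.

```rust
use std::fmt;

/// Simplified stand-in for DataFusion's `Diagnostic`.
#[derive(Debug, Clone, PartialEq)]
pub struct Diagnostic {
    pub message: String,
}

/// Simplified stand-in for `DataFusionError`.
#[derive(Debug, PartialEq)]
pub enum SketchError {
    Execution(String),
    Plan(String),
    Diagnostic(Box<Diagnostic>, Box<SketchError>),
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            SketchError::Execution(m) => write!(f, "Execution error: {m}"),
            SketchError::Plan(m) => write!(f, "Error during planning: {m}"),
            // The diagnostic is metadata; the message comes from the wrapped error.
            SketchError::Diagnostic(_, inner) => write!(f, "{inner}"),
        }
    }
}

impl SketchError {
    /// Wraps `self` together with a diagnostic.
    pub fn with_diagnostic(self, diagnostic: Diagnostic) -> Self {
        SketchError::Diagnostic(Box::new(diagnostic), Box::new(self))
    }

    /// Hypothetical method: if `self` carries a diagnostic, applies `f` to the
    /// wrapped error and reattaches the same diagnostic to the result.
    /// Assumption: without a diagnostic, `f` is applied to `self` directly.
    pub fn map_diagnosticed_err<F>(self, f: F) -> Self
    where
        F: FnOnce(SketchError) -> SketchError,
    {
        match self {
            SketchError::Diagnostic(d, inner) => {
                SketchError::Diagnostic(d, Box::new(f(*inner)))
            }
            other => f(other),
        }
    }
}

/// Mirrors the err1 / err2 / err3 example from the comment above.
pub fn example() -> SketchError {
    let err1 = SketchError::Execution("need to download more RAM".to_string());
    let err2 = err1.with_diagnostic(Diagnostic {
        message: "not enough RAM".to_string(),
    });
    err2.map_diagnosticed_err(|err| SketchError::Plan(format!("that went wrong {}", err)))
}
```

The closure can still flatten the inner error into a string, as the existing macros do, because the Diagnostic is held outside the closure and reattached afterwards.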
