Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attach Diagnostic to "incompatible type in unary expression" error #14433

Open
Tracked by #14429
eliaperantoni opened this issue Feb 3, 2025 · 9 comments
Open
Tracked by #14429
Assignees
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@eliaperantoni
Copy link
Contributor

Is your feature request related to a problem or challenge?

For a query like:

SELECT +'a'

The only message that the end user of an application built atop of DataFusion sees is:

Error during planning: Unary operator '+' only supports numeric, interval and timestamp types

We want to provide a richer message that references and highlights locations in the original SQL query, and contextualises and helps the user understand the error. In the end, it would be possible to display errors in a fashion akin to what was enabled by #13664 for some errors:

See #14429 for more information.

Describe the solution you'd like

Attach a well crafted Diagnostic to the DataFusionError, building on top of the foundations laid in #13664. See #14429 for more information.

Describe alternatives you've considered

No response

Additional context

No response

@eliaperantoni eliaperantoni added the enhancement New feature or request label Feb 3, 2025
@eliaperantoni eliaperantoni changed the title Attach Diagnostic to "incompatible type in unary expression" error Attach Diagnostic to "incompatible type in unary expression" error Feb 3, 2025
@alamb alamb added the good first issue Good for newcomers label Feb 4, 2025
@alamb
Copy link
Contributor

alamb commented Feb 4, 2025

I think this is a good first issue as the need is clear and the tests in https://github.com/apache/datafusion/blob/85fbde2661bdb462fc498dc18f055c44f229604c/datafusion/sql/tests/cases/diagnostic.rs are well structured for extension.

@alan910127
Copy link
Contributor

take

@alan910127
Copy link
Contributor

alan910127 commented Feb 6, 2025

Hi @eliaperantoni, would you mind sharing your rendering code snippet? It helps verifying if my implementation is correct!

Also, I'm a bit lost on how to implement this. Add seems straightforward, but the others are not. Since the logic for Not and Minus is implemented in Expr::Not and Expr::Negative, where there's no span information, I can't provide enough details (notes). Additionally, I can't get the span of the whole expression, so that the diagnostic itself also lacks this information.

@eliaperantoni
Copy link
Contributor Author

Hey @alan910127, sorry for the delay, thank you so much for taking on this ticket!

I suggest you look at datafusion/sql/tests/cases/diagnostic.rs and create a test case there, that should be enough to validate that you're producing a good Diagnostic 😊. You can see some good examples there.

We try to make our own "viewer" implementation not influence the design of Diagnostic in DataFusion, which should instead be generic.

The type of Diagnostic should guide you towards a correct instance. i.e. it's either an error or a warning (not the case of this ticket) with a string message and (ideally) a Span. Plus zero or more notes and helps that also have a string message and (possibly) a Span. A note is to give more contextual information, an help is to directly propose a fix to the user.

As for the Span, you usually want to call Expr::spans to get source information about an expression. This currently only works if it's an Expr::Column so you should change the method a bit so it works for Expr::Negative and Expr::Not. Currently we only store spans inside of datafusion::common::Column. If your implementation cannot work just with that, you might want to copy over more spans from the AST tree to the logical nodes. You'd do that in SqlToRel, looks like SqlToRel::parse_sql_unary_op is the one you're looking for. But bear in mind that the parser itself is not yet perfect in this regard and might not always have the data you need. e.g. I wouldn't be surprised if it had no clue about the Span for a negation operator.

In any case, I think it's perfectly fine if in this PR you don't get it 100% right. e.g. I think it's okay if nested unary expressions only highlight the column name (e.g. SELECT -+c -- only c is highlighted). We'll improve Expr::spans as the parser improves.

@alan910127
Copy link
Contributor

@eliaperantoni thank you so much for the detailed explanation! Since Expr::Negative and Expr::Not just wraps another Expr, does it mean that I need to make span available on all of the cases in order to make them work?

@eliaperantoni
Copy link
Contributor Author

@eliaperantoni thank you so much for the detailed explanation! Since Expr::Negative and Expr::Not just wraps another Expr, does it mean that I need to make span available on all of the cases in order to make them work?

For a perfect implementation, yes that would be necessary. But that goes well beyond the scope of this ticket. I think it's perfectly fine if, in the case of Expr::Negative and Expr::Not and similar, you call spans recursively. This will already make it work in a lot of common cases like Not(Column) (e.g. SELECT NOT is_admin) or Negative(Column) (e.g. SELECT -score AS loss).

And then a more extensive implementation of Expr::spans will slowly come along as more tickets are implemented, and unary expressions would be future-proof because you make a recursive call :)

@eliaperantoni
Copy link
Contributor Author

Hey @alan910127 how is it going with this ticket :) Can I help with anything?

@alan910127
Copy link
Contributor

alan910127 commented Feb 14, 2025

@eliaperantoni I am trying to locate where does Expr::Not transformed into a Operator::IsDistinctFrom with codelldb, but it seems like the debugger crashes every time when stepping into SqlToRel::sql_statement_to_plan_with_context_impl

@alan910127
Copy link
Contributor

alan910127 commented Feb 17, 2025

I've found that the relevant code is in datafusion/optimizer/src/analyzer/type_coercion.rs. However, there's another issue: the Expr::Not match arm calls get_casted_expr_for_bool_op, which converts the expression into Operator::IsDistinctFrom via BinaryTypeCoercer. This transformation attaches a diagnostic, which is handled by this commit.

I see two possible approaches to resolving this:

  1. Handle the incompatible types error early in Expr::Not, but this would scatter type-checking logic throughout the codebase.
  2. Use the last diagnostic instead of the first, but this would break the locality principle of keeping diagnostics near the error and might affect existing behavior.

Do you have any thoughts on this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

3 participants