-
Notifications
You must be signed in to change notification settings - Fork 616
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bug(duckdb): literal python floats are interpreted as Decimals, not DOUBLE #10890
Comments
Thanks for opening an issue! Do you have a use case for this? For this specific issue, we removed casting because it was defeating some important pushdown optimizations in duckdb-spatial. |
I found this when I got an overflow error from multiplying two large numbers as I describe. To be specific, when multiplying multiple large probabilities in my record linkage library mismo. |
If we add the explicit cast only for when ibis compiles a floatXX literal to duckdb, would that manage to avoid the optimization issue? |
If others run into this, the workaround I found was ibis.literal("12.34").cast(float) |
You can remove the cast and specify the dtype when you define the literal [ins] In [2]: ibis.literal(1, type='float')
Out[2]: 1.0
[ins] In [3]: ibis.literal(12.34, type='float')
Out[3]: 12.34
[ins] In [4]: ibis.literal('12.34', type='float')
Out[4]: 12.34 |
Unfortunately, all those result in the naked import ibis
ibis.to_sql(ibis.literal(1, type="float"), dialect="duckdb") # SELECT 1.0 AS "1.0"
ibis.to_sql(ibis.literal(12.34, type='float'), dialect='duckdb') # SELECT 12.34 AS "12.34"
ibis.to_sql(ibis.literal("12.34", type='float'), dialect='duckdb') # SELECT 12.34 AS "12.34"
ibis.to_sql(ibis.literal("12.34").cast("float"), dialect="duckdb") # SELECT CAST('12.34' AS DOUBLE) AS "Cast('12.34', float64)"
ibis.to_sql(ibis.literal(12.34).cast("float"), dialect="duckdb") # SELECT 12.34 AS "12.34" In order to actually get ibis to generate the cast-to-float, you need to start with something that ibis doesn't think is a float, eg the |
Here is a little test that I would expect to pass for any expression and backend. Could we add this to our tests, where hypothesis generates expressions for us? import ibis
from ibis.backends.sql import SQLBackend
from ibis.expr import datatypes as dt
def assert_type_roundtrips(e: ibis.ir.Value, backend: SQLBackend):
type_in_backend_string: str = backend.execute(e.typeof())
type_in_backend_ibis: dt.DataType = backend.compiler.type_mapper.from_string(
type_in_backend_string
)
assert e.type() == type_in_backend_ibis, (e.type(), type_in_backend_ibis)
e = ibis.literal(12.34)
backend = ibis.duckdb.connect()
assert_type_roundtrips(e, backend)
# AssertionError: (Float64(nullable=True), Decimal(precision=4, scale=2, nullable=True)) |
If you can come up with a way to not break existing optimizations on DuckDB and get that test to pass, you're more than welcome to try adding it to the test suite. |
That test used to pass before we disabled literal casting on DuckDB. We made the concession because we deemed it more important to enable the optimization that was defeated by casting. |
OK, that tradeoff makes sense that we want to prioritize the optimization more, I bet more people are affected by the optimization than by the bug that I experienced, and my bug has a workaround. I am willing to try to put in a bit of effort to solve this, if it doesn't turn into a total rabbit hole. Links to the perf issues:
@cpcloud in that linked PR, you mention what you think the long-term solution should be:
Can you give some example of this? If for number 2 you are saying that Alternatively, instead of having two different flavors of Ops at construction time, could we do it through re-write rules at compile time?
It might be death by a thousand cuts to properly enumerate all the rules for when the type is important and when it is not. What about if by default we made it so that we compiled the type cast, but just special cased the boolean comparison Ops to not include the cast during compilation? Eg
My thought here is that by default we should try to have explicit casting so that It also looks like in that PR, there is no automatic test to make sure that the filter pushdown happens, is that right? You were just manually running |
#10933 implements the rewrite solution that I describe above. |
What happened?
In duckdb, a value like
1000000000.0000005
is interpreted as aDECIMAL
, not a float. This usually causes no problems. But it sometimes does, like if you multiply two large numbers, you will get an OverFlowError with decimals, where if you were using a float, you would be fine.What version of ibis are you using?
The bug is not present on ibis 9.2.0 and before, and it is present on 9.3.0 and after. I think e4ff1bd was the commit that caused the regression.
What backend(s) are you using, if any?
duckdb
Relevant log output
Code of Conduct
The text was updated successfully, but these errors were encountered: