bug(duckdb): literal python floats are interpreted as Decimals, not DOUBLE #10890

NickCrews · 2025-02-24T02:47:39Z

What happened?

In duckdb, a literal like 1000000000.0000005 is interpreted as a DECIMAL, not a DOUBLE. This usually causes no problems, but it sometimes does: for example, if you multiply two large numbers, you get an overflow error (an OutOfRangeException) with decimals, whereas with floats you would be fine.

import ibis
import duckdb

print(duckdb.sql("SELECT typeof(1000000000.0000005)").fetchone()[0])  # DECIMAL(17,7)
print(
    duckdb.sql("SELECT 1000000000.0000005::DOUBLE * 1000000").fetchone()[0]
)  # 1000000000000000.5

e = ibis.literal(1000000000.0000005) * ibis.literal(1_000_000)
print(
    ibis.to_sql(e)
)  # SELECT 1000000000.0000005 * 1000000 AS "Multiply(1000000000.0000005, 1000000)"
e.execute()
# OutOfRangeException: Out of Range Error: Overflow in multiplication of DECIMAL(18) (10000000000000005 * 1000000)
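
For readers without DuckDB at hand, the arithmetic behind that error can be sketched with plain Python integers. This is only a model under stated assumptions: the 18-digit limit is taken from the DECIMAL(18) named in the error message, and `fits_decimal` is a hypothetical helper, not a DuckDB or ibis API.

```python
MAX_DIGITS = 18  # assumption: the width named in the error message, DECIMAL(18)


def fits_decimal(unscaled: int, max_digits: int = MAX_DIGITS) -> bool:
    """Return True if the unscaled integer needs at most max_digits digits."""
    return abs(unscaled) < 10**max_digits


# 1000000000.0000005 stored as DECIMAL(17,7) has this unscaled integer value.
lhs_unscaled = 10000000000000005
rhs = 1000000

# The product of the unscaled values needs 23 digits, which exceeds 18.
product_unscaled = lhs_unscaled * rhs
overflows = not fits_decimal(product_unscaled)

# The same multiplication in binary floating point stays finite.
float_product = 1000000000.0000005 * 1000000

print(overflows)
print(float_product)
```

The float result loses some precision in the last digits, but it does not raise, which is the behavior the issue asks for.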

What version of ibis are you using?

The bug is not present on ibis 9.2.0 and before, and it is present on 9.3.0 and after. I think e4ff1bd was the commit that caused the regression.

What backend(s) are you using, if any?

duckdb


cpcloud · 2025-02-27T11:30:18Z

Thanks for opening an issue!

Do you have a use case for this?

For this specific issue, we removed casting because it was defeating some important pushdown optimizations in duckdb-spatial.

NickCrews · 2025-02-27T16:37:11Z

I found this when I got an overflow error from multiplying two large numbers, as described above.

Specifically, it happened when multiplying multiple large probabilities in my record linkage library, mismo.
See NickCrews/mismo@357fa79#diff-1ec0582d16916ba2d4319891e339b61c8e19c5942fb4912be84f02e9d1fecd30R311-R313 (e.g. we have several columns that each represent the odds from some feature, e.g. that the addresses or names match, and we multiply them together. These odds can be quite large. See https://moj-analytical-services.github.io/splink/topic_guides/theory/fellegi_sunter.html#deriving-match-weights-from-m-and-u)

NickCrews · 2025-02-27T16:39:41Z

If we add the explicit cast only when ibis compiles a floatXX literal for duckdb, would that avoid the optimization issue?

NickCrews · 2025-02-27T16:41:25Z

If others run into this, the workaround I found was ibis.literal("12.34").cast(float)

gforsyth · 2025-02-27T17:35:50Z

You can remove the cast and specify the dtype when you define the literal

[ins] In [2]: ibis.literal(1, type='float')
Out[2]: 1.0

[ins] In [3]: ibis.literal(12.34, type='float')
Out[3]: 12.34

[ins] In [4]: ibis.literal('12.34', type='float')
Out[4]: 12.34

NickCrews · 2025-03-02T02:28:18Z

Unfortunately, all of those result in a naked 12.34 (which duckdb interprets as a decimal) in the generated SQL:

import ibis

ibis.to_sql(ibis.literal(1, type="float"), dialect="duckdb")  # SELECT 1.0 AS "1.0"
ibis.to_sql(ibis.literal(12.34, type='float'), dialect='duckdb')  # SELECT 12.34 AS "12.34"
ibis.to_sql(ibis.literal("12.34", type='float'), dialect='duckdb')  # SELECT 12.34 AS "12.34"
ibis.to_sql(ibis.literal("12.34").cast("float"), dialect="duckdb")  # SELECT CAST('12.34' AS DOUBLE) AS "Cast('12.34', float64)"
ibis.to_sql(ibis.literal(12.34).cast("float"), dialect="duckdb")  # SELECT 12.34 AS "12.34"

In order to actually get ibis to generate the cast-to-float, you need to start with something that ibis doesn't think is a float, eg the ibis.literal("12.34") which ibis thinks has dtype of string.

NickCrews · 2025-03-02T02:43:52Z

Here is a little test that I would expect to pass for any expression and backend. Could we add this to our tests, where hypothesis generates expressions for us?

import ibis
from ibis.backends.sql import SQLBackend
from ibis.expr import datatypes as dt


def assert_type_roundtrips(e: ibis.ir.Value, backend: SQLBackend):
    type_in_backend_string: str = backend.execute(e.typeof())
    type_in_backend_ibis: dt.DataType = backend.compiler.type_mapper.from_string(
        type_in_backend_string
    )
    assert e.type() == type_in_backend_ibis, (e.type(), type_in_backend_ibis)


e = ibis.literal(12.34)
backend = ibis.duckdb.connect()
assert_type_roundtrips(e, backend)
# AssertionError: (Float64(nullable=True), Decimal(precision=4, scale=2, nullable=True))
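
The decimal types reported in this thread, DECIMAL(17,7) for 1000000000.0000005 and Decimal(precision=4, scale=2) for 12.34, are consistent with counting the digits in the literal's text. The sketch below reproduces just those two cases with the standard library. It is a rough model, not DuckDB's actual parser, and `decimal_shape` is a hypothetical name.

```python
from decimal import Decimal


def decimal_shape(literal_text: str) -> tuple[int, int]:
    """Return (width, scale) by counting digits in a decimal literal's text.

    Assumption: width is the total number of digits and scale is the number
    of digits after the decimal point. Only checked against the two values
    reported in this thread.
    """
    _sign, digits, exponent = Decimal(literal_text).as_tuple()
    scale = max(-exponent, 0)
    width = max(len(digits), scale)
    return width, scale


print(decimal_shape("12.34"))
print(decimal_shape("1000000000.0000005"))
```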

cpcloud · 2025-03-03T15:45:00Z

If you can come up with a way to not break existing optimizations on DuckDB and get that test to pass, you're more than welcome to try adding it to the test suite.

cpcloud · 2025-03-03T15:45:57Z

That test used to pass before we disabled literal casting on DuckDB. We made the concession because we deemed it more important to enable the optimization that was defeated by casting.

NickCrews · 2025-03-03T17:28:08Z

OK, that tradeoff makes sense: we want to prioritize the optimization. I bet more people are affected by the optimization than by the bug that I experienced, and my bug has a workaround.

I am willing to try to put in a bit of effort to solve this, if it doesn't turn into a total rabbit hole.

Links to the perf issues:

original bug: bug(geospatial): casting of literals preventing filter push down to parquet reader #9662
PR that fixed it: perf(duckdb): remove numeric literal casts when compiling sql #9664

@cpcloud in that linked PR, you mention what you think the long-term solution should be:

Longer term, I think we should probably support two kinds of literals:

1. A ops.TypedLiteral when a type is specified in ibis.literal, which will be cast to the specific type

2. ops.Literal would then be a literal that has type information but avoids casting. This is what op construction would use.

Can you give an example of this? If for number 2 you are saying that ops.Multiply(1, 12.34) should avoid casting and compile to 1 * 12.34, then we would still suffer from ibis thinking this is a float64 while the backend evaluates it as a decimal.

Alternatively, instead of having two different flavors of Ops at construction time, could we do it through re-write rules at compile time?

In expressions like ops.Equal(x, ops.Literal(12.34)), we can avoid the cast, because the overall boolean type doesn't depend on whether the 12.34 is a float or a decimal.
But in expressions like ops.Add(x, ops.Literal(12.34)), where the overall type is numeric, we DO need the cast?

It might be death by a thousand cuts to properly enumerate all the rules for when the type is important and when it is not. What if we compiled the type cast by default, and special-cased only the boolean comparison ops to omit the cast during compilation? E.g.:

We keep pretty much all our implementation for building the expression tree the same.
We add a new ops.UntypedLiteral.
We add a rewrite rule so that ops.Equal(x, ops.Literal(12.34)) gets rewritten to ops.Equal(x, ops.UntypedLiteral(12.34)).
ops.Literal, when compiled, always includes the cast; ops.UntypedLiteral does not.

My thought here is that by default we should try to have explicit casting so that assert_type_roundtrips always holds, and then only special case away from this when we find there are optimizations etc that we want to keep.
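
As an illustration of the proposal, here is a toy model of the rewrite in plain Python. None of these classes are ibis's real ops: `Column`, `Literal`, `UntypedLiteral`, `Equal`, `Add`, `rewrite`, and `compile_sql` are all hypothetical stand-ins, and the emitted SQL is only meant to show where the cast would and would not appear.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class Literal:  # compiles WITH an explicit cast
    value: float


@dataclass(frozen=True)
class UntypedLiteral:  # compiles WITHOUT a cast
    value: float


@dataclass(frozen=True)
class Equal:
    left: "Node"
    right: "Node"


@dataclass(frozen=True)
class Add:
    left: "Node"
    right: "Node"


Node = Union[Column, Literal, UntypedLiteral, Equal, Add]


def strip_cast(node: Node) -> Node:
    """Swap a direct Literal operand for its cast-free counterpart."""
    return UntypedLiteral(node.value) if isinstance(node, Literal) else node


def rewrite(node: Node) -> Node:
    """Drop casts only on literals that are direct operands of a comparison."""
    if isinstance(node, Equal):
        return Equal(strip_cast(rewrite(node.left)), strip_cast(rewrite(node.right)))
    if isinstance(node, Add):
        return Add(rewrite(node.left), rewrite(node.right))
    return node


def compile_sql(node: Node) -> str:
    if isinstance(node, Column):
        return node.name
    if isinstance(node, Literal):
        return f"CAST({node.value!r} AS DOUBLE)"
    if isinstance(node, UntypedLiteral):
        return repr(node.value)
    if isinstance(node, Equal):
        return f"{compile_sql(node.left)} = {compile_sql(node.right)}"
    if isinstance(node, Add):
        return f"{compile_sql(node.left)} + {compile_sql(node.right)}"
    raise TypeError(f"unsupported node: {node!r}")


x = Column("x")
print(compile_sql(rewrite(Equal(x, Literal(12.34)))))
print(compile_sql(rewrite(Add(x, Literal(12.34)))))
```

In this sketch the comparison compiles to a bare literal, which is the form the pushdown optimization needs, while the arithmetic expression keeps its cast so the backend type matches the type ibis infers.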

It also looks like that PR has no automated test to make sure that the filter pushdown happens. Is that right? Were you just manually running EXPLAIN <query> on the query from the original issue?

NickCrews · 2025-03-03T19:51:56Z

#10933 implements the rewrite solution that I describe above.

NickCrews added the bug Incorrect behavior inside of ibis label Feb 24, 2025

github-project-automation bot added this to Ibis planning and roadmap Feb 24, 2025

github-project-automation bot moved this to backlog in Ibis planning and roadmap Feb 24, 2025

cpcloud added the duckdb The DuckDB backend label Feb 27, 2025

NickCrews linked a pull request Mar 3, 2025 that will close this issue

fix: include casts for numeric Literals except in comparisons #10933

Open
