In DataFusion 28 and below, UDFs were executed in a separate thread when writing to Parquet. The example code below does not fail the assertion in version 28 but does in version 30 and git main.
If you have a UDF that expects to be running in a thread, or does some form of blocking computation, then this change means your previously parallel dataframe plan becomes serial.
I couldn't spot anything in the release notes about this.
To Reproduce
deps:
[dependencies]
tokio = { version = "^1.0", features = ["rt-multi-thread", "full"] }
datafusion = { version = "=30", default-features = false, features = ["encoding_expressions", "zstd"] }
Hi @orf -- I am not sure that DataFusion guarantees that UDFs will run in a separate thread, or that they can do blocking operations without stalling DataFusion.
What is your blocking UDF doing? Is it doing network operations or something where having async UDFs (as in #6518) would help?
Describe the bug
In DataFusion 28 and below, UDFs were executed in a separate thread when writing to Parquet. The example code below does not fail the assertion in version 28 but does in version 30 and git main.
If you have a UDF that expects to be running in a thread, or does some form of blocking computation, then this change means your previously parallel dataframe plan becomes serial.
I couldn't spot anything in the release notes about this.
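To make the serial-versus-parallel concern concrete, here is a minimal standard-library sketch that does not use DataFusion at all. `blocking_udf`, `run_serial`, and `run_threaded` are hypothetical names invented for illustration; the 50 ms sleep merely stands in for whatever blocking work a real UDF might do.

```rust
use std::thread;
use std::time::{Duration, Instant};

// Hypothetical stand-in for a blocking UDF body; not a DataFusion API.
fn blocking_udf(input: u64) -> u64 {
    thread::sleep(Duration::from_millis(50));
    input * 2
}

// Calls the UDF inline, one input after another, on the caller's thread.
fn run_serial(inputs: &[u64]) -> (Vec<u64>, Duration) {
    let start = Instant::now();
    let out: Vec<u64> = inputs.iter().map(|&x| blocking_udf(x)).collect();
    (out, start.elapsed())
}

// Calls the UDF on one OS thread per input, so the blocking waits overlap.
fn run_threaded(inputs: &[u64]) -> (Vec<u64>, Duration) {
    let start = Instant::now();
    let handles: Vec<_> = inputs
        .iter()
        .map(|&x| thread::spawn(move || blocking_udf(x)))
        .collect();
    let out: Vec<u64> = handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .collect();
    (out, start.elapsed())
}

fn main() {
    let inputs = [1u64, 2, 3, 4];
    let (serial, serial_time) = run_serial(&inputs);
    let (threaded, threaded_time) = run_threaded(&inputs);
    // Both strategies compute the same values; only the wall time differs.
    assert_eq!(serial, threaded);
    println!("serial: {serial_time:?}, threaded: {threaded_time:?}");
}
```

With four inputs the inline version waits for the sum of all four sleeps, while the threaded version waits roughly as long as a single sleep. That is the kind of slowdown the report describes when a blocking UDF ends up running inline.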
To Reproduce
deps:
code:
Expected behavior
Sync UDF functions should be executed in a blocking thread pool.
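As a rough illustration of what "executed off the calling thread" means, the sketch below compares thread IDs using only the standard library. It is not the assertion from the original reproduction and it is not how DataFusion schedules UDFs; `sync_udf`, `call_inline`, and `call_offloaded` are hypothetical names. In a Tokio application the offloading step would more commonly be done with `tokio::task::spawn_blocking` rather than a fresh OS thread per call.

```rust
use std::thread::{self, ThreadId};

// Hypothetical synchronous UDF body that also reports which thread ran it.
fn sync_udf(input: i64) -> (i64, ThreadId) {
    (input + 1, thread::current().id())
}

// Runs the UDF inline on the calling thread.
fn call_inline(input: i64) -> (i64, ThreadId) {
    sync_udf(input)
}

// Runs the UDF on a dedicated OS thread and waits for the result.
// A real engine would more likely reuse a pool than spawn per call.
fn call_offloaded(input: i64) -> (i64, ThreadId) {
    thread::spawn(move || sync_udf(input))
        .join()
        .expect("UDF thread panicked")
}

fn main() {
    let caller = thread::current().id();
    let (a, inline_id) = call_inline(41);
    let (b, offloaded_id) = call_offloaded(41);
    // Same result either way; only the executing thread differs.
    assert_eq!(a, b);
    assert_eq!(inline_id, caller);
    assert_ne!(offloaded_id, caller);
    println!("inline ran on the caller thread; offloaded ran elsewhere");
}
```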
Additional context
I thought this might be related to #7205, but it doesn't appear to be the culprit.