-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Script and documentation for regenerating sqlite test files #14290
Conversation
I plan to test this (by way of review) later today |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Omega359 -- I tried this out and I couldn't quite get it working but I think it is close. My notes below
Installing dependencies
I installed sd
as the script advised 👌
# install sd
brew install sd
However, I also had to install several other things on my mac as well:
rename
viabrew install rename
dos2unix
viabrew install dos2unix
Running in a different directory
cd datafusion
$ bash ./datafusion/sqllogictest/regenerate_sqlite_files.sh
Resulted in an error
./datafusion/sqllogictest/regenerate_sqlite_files.sh: line 79: cd: /Users/andrewlamb/datafusion-testing/data/sqlite/: No such file or directory
I had to run it in
cd datafusion
cd datafusion/sqllogictest
./regenerate_sqlite_files.sh
sed
errors
I saw severl errors like this
sed: 1: "./select1.test": invalid command code .
sed: 1: "./select4.test": invalid command code .
sed: 1: "./select5.test": invalid command code .
Maybe this is a difference due to gnu sed
(linux) vs bsd-style sed
(mac) 🤔
Postgres errors
Also there were many many errors like this until I started a docker image named postgres
(this may not be related to this PR)
Error connecting to postgres using PG_URI=postgresql://postgres:[email protected]:16432/test
Then I started my postgres container and then I got an error like this (but it didn't seem to proceed at all)
called `Result::unwrap()` on an `Err` value: Client(StartContainer(DockerResponseServerError { status_code: 500, message: "driver failed programming external connectivity on endpoint silly_sammet (2c4e6fcdaa02e87dd3f99ec57f84993c3f1141cc35b76528d1980b1cda0ad8f2): Bind for 0.0.0.0:16432 failed: port is already allocated" }))
|
||
echo "Running the sqllogictests with sqlite completion. This will take approximately an hour to run" | ||
|
||
cargo test --profile release-nonlto --features postgres --test sqllogictests -- --include-sqlite --postgres-runner --complete |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this needs to also use the --postgres-runner
command?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it does so that the pg db is started. I'll take a look at my custom version of the sqllogictests files and see. I could just required that PG is started and available via PG_URI or something.
|
||
# update the sqllogictest Cargo.toml with the new repo for sqllogictest-rs (tied to a specific hash) | ||
cd "$DF_HOME" || exit; | ||
sed -i -e 's~^sqllogictest.*~sqllogictest = { git = "https://github.com/Omega359/sqllogictest-rs.git", rev = "1cd933d" }~' datafusion/sqllogictest/Cargo.toml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is this needed for? Some sort of update to the runner? I wonder if you can publish it to crates.io or something (sqllogictests-omega359 🤔 ) Not sure if that would be better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That dependency is hacked to all hell to do the completion the way I wanted by using a comparison db (pg) to compare results to if the df results don't match the sqlite results. I sure don't want that thing published anywhere :)
I was worried that macs would be troublesome for this script. I'll try and adjust things for that platform. the difference in sed is ... unfortunate. The postgres errors might be caused by the updated sqllogictest using docker url (since that is how I start pg) vs normal uri (PG_URI). |
Ready for 2nd round of testing... |
Hoping to try it later today |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested this on a mac
Ran
PG_URI=postgresql://postgres@localhost:5432/postgres ./datafusion/sqllogictest/regenerate_sqlite_files.sh
It ran but still gave me an error (but I think we are pretty close)
enaming .test files to .slt and cleansing the files ...
sed: 1: "select1.test": bad flag in substitute command: 's'
sed: 1: "select4.test": bad flag in substitute command: 's'
sed: 1: "select5.test": bad flag in substitute command: 's'
Updating the datafusion/sqllogictest/Cargo.toml file with an updated sqllogictest dependency
Replacing the datafusion/sqllogictest/bin/sqllogictests.rs file with a custom version required for running completion
Running the sqllogictests with sqlite completion. This will take approximately an hour to run
Updating git repository `https://github.com/Omega359/sqllogictest-rs.git`
Locking 1 package to latest compatible version
Adding sqllogictest v0.24.0 (https://github.com/Omega359/sqllogictest-rs.git?rev=1cd933d#1cd933de)
Compiling phf_shared v0.11.3
Compiling either v1.13.0
Compiling url v2.5.4
Compiling ident_case v1.0.1
Compiling fnv v1.0.7
Compiling strsim v0.11.1
Compiling datafusion-common v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/common)
Compiling structmeta-derive v0.3.0
Compiling core-foundation v0.9.4
Compiling serde_repr v0.1.19
Compiling home v0.5.11
Compiling hyperlocal v0.9.1
Compiling xattr v1.4.0
Compiling tokio-stream v0.1.17
Compiling postgres-derive v0.4.6
Compiling filetime v0.2.25
Compiling base64 v0.21.7
Compiling etcetera v0.8.0
Compiling phf v0.11.3
Compiling itertools v0.13.0
Compiling itertools v0.14.0
Compiling darling_core v0.20.10
Compiling chrono-tz v0.10.1
Compiling security-framework v2.11.1
Compiling docker_credential v1.3.1
Compiling tokio-tar v0.3.1
Compiling whoami v1.5.2
Compiling postgres-types v0.2.8
Compiling structmeta v0.3.0
Compiling reqwest v0.12.12
Compiling rustls-native-certs v0.7.3
Compiling parse-display-derive v0.9.1
Compiling sqllogictest v0.24.0 (https://github.com/Omega359/sqllogictest-rs.git?rev=1cd933d#1cd933de)
Compiling rust_decimal v1.36.0
Compiling tokio-postgres v0.7.12
Compiling object_store v0.11.2
Compiling darling_macro v0.20.10
Compiling parse-display v0.9.1
Compiling darling v0.20.10
Compiling serde_with_macros v3.12.0
Compiling datafusion-expr-common v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/expr-common)
Compiling datafusion-physical-expr-common v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/physical-expr-common)
Compiling serde_with v3.12.0
Compiling datafusion-functions-aggregate-common v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/functions-aggregate-common)
Compiling datafusion-functions-window-common v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/functions-window-common)
Compiling arrow-array v54.0.0
Compiling datafusion-expr v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/expr)
Compiling bollard-stubs v1.45.0-rc.26.0.1
Compiling arrow-select v54.0.0
Compiling arrow-ipc v54.0.0
Compiling arrow-arith v54.0.0
Compiling arrow-row v54.0.0
Compiling arrow-cast v54.0.0
Compiling arrow-string v54.0.0
Compiling arrow-ord v54.0.0
Compiling arrow-csv v54.0.0
Compiling arrow-json v54.0.0
Compiling parquet v54.0.0
Compiling arrow v54.0.0
Compiling datafusion-macros v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/macros)
Compiling bollard v0.17.1
Compiling testcontainers v0.23.1
Compiling testcontainers-modules v0.11.5
Compiling datafusion-physical-expr v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/physical-expr)
Compiling datafusion-execution v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/execution)
Compiling datafusion-sql v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/sql)
Compiling datafusion-physical-plan v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/physical-plan)
Compiling datafusion-functions-aggregate v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/functions-aggregate)
Compiling datafusion-functions v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/functions)
Compiling datafusion-optimizer v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/optimizer)
Compiling datafusion-functions-window v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/functions-window)
Compiling datafusion-catalog v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/catalog)
Compiling datafusion-functions-table v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/functions-table)
Compiling datafusion-physical-optimizer v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/physical-optimizer)
Compiling datafusion-functions-nested v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/functions-nested)
Compiling datafusion v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/core)
Compiling datafusion-sqllogictest v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/sqllogictest)
Finished `release-nonlto` profile [optimized] target(s) in 1m 29s
Running bin/sqllogictests.rs (target/release-nonlto/deps/sqllogictests-c7da27570eaf4149)
Completed in 0 seconds Applying patches for #13784 (https://github.com/apache/datafusion/issues/13784)
error: failed processing files:
FailedJobs(
"datafusion-testing/data/sqlite/random/aggregates/slt_good_102.slt": No such file or directory (os error 2)
)
error: failed processing files:
FailedJobs(
"datafusion-testing/data/sqlite/random/aggregates/slt_good_112.slt": No such file or directory (os error 2)
)
Running the sqllogictests with sqlite files. This will take approximately 20 minutes to run
Compiling parquet v54.0.0
Compiling sqllogictest v0.24.0 (https://github.com/Omega359/sqllogictest-rs.git?rev=1cd933d#1cd933de)
Compiling datafusion-common v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/common)
Compiling datafusion-expr-common v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/expr-common)
Compiling datafusion-physical-expr-common v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/physical-expr-common)
Compiling datafusion-functions-window-common v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/functions-window-common)
Compiling datafusion-functions-aggregate-common v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/functions-aggregate-common)
Compiling datafusion-expr v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/expr)
Compiling datafusion-physical-expr v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/physical-expr)
Compiling datafusion-execution v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/execution)
Compiling datafusion-sql v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/sql)
Compiling datafusion-functions v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/functions)
Compiling datafusion-physical-plan v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/physical-plan)
Compiling datafusion-functions-aggregate v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/functions-aggregate)
Compiling datafusion-functions-window v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/functions-window)
Compiling datafusion-optimizer v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/optimizer)
Compiling datafusion-catalog v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/catalog)
Compiling datafusion-functions-table v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/functions-table)
Compiling datafusion-physical-optimizer v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/physical-optimizer)
Compiling datafusion-functions-nested v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/functions-nested)
Compiling datafusion v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/core)
Compiling datafusion-sqllogictest v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/sqllogictest)
Finished `release-nonlto` profile [optimized] target(s) in 1m 06s
Running bin/sqllogictests.rs (target/release-nonlto/deps/sqllogictests-b7320e665da12135)
Completed in 1 second Sqlite tests completed successfully. The datafusion-testing git submodule is ready to be pushed to a new remote and a PR created in https://github.com/apache/datafusion-testing/
Cleaning up source code changes and temporary files and directories
Updated 1 path from the index
Updated 1 path from the index
andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion2$ git status
On branch feature/regenerate_sqlite_files
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
(commit or discard the untracked or modified content in submodules)
modified: datafusion-testing (modified content)
Untracked files:
(use "git add <file>..." to include in what will be committed)
datafusion/sqllogictest/Cargo.toml-e
datafusion/sqllogictest/test_files/pg_compat/pg_compat_null.slt.bak
datafusion/sqllogictest/test_files/pg_compat/pg_compat_simple.slt.bak
datafusion/sqllogictest/test_files/pg_compat/pg_compat_type_coercion.slt.bak
datafusion/sqllogictest/test_files/pg_compat/pg_compat_types.slt.bak
datafusion/sqllogictest/test_files/pg_compat/pg_compat_union.slt.bak
datafusion/sqllogictest/test_files/pg_compat/pg_compat_window.slt.bak
I would like to try and actually modify something to see if it can fix it...
I think I found a decent alternative to using sed that should work. it seems the script isn't updating/finding the datafusion-testing for some reason for you. Not sure why yet |
I am perplexed. My best guess is the find command isn't quite the same as it's on linux and isn't working as expected on a mac. Just a guess though. I don't have a mac that I can test this on (wife's mac is not an option ... 🪓) so I unfortunately can't get to the root of the issue myself. |
… pg via docker, cleanup pg_compat .bak files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much @Omega359 -- I ran this code again and this time it got no errors, and the only thing left is this:
On branch feature/regenerate_sqlite_files
Untracked files:
(use "git add <file>..." to include in what will be committed)
datafusion/sqllogictest/Cargo.toml-e
nothing added to commit but untracked files present (use "git add" to track)
But that seems ok
andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion2$ diff -du datafusion/sqllogictest/Cargo.toml datafusion/sqllogictest/Cargo.toml-e
--- datafusion/sqllogictest/Cargo.toml 2025-02-05 07:21:12
+++ datafusion/sqllogictest/Cargo.toml-e 2025-02-03 10:53:45
@@ -42,7 +42,6 @@
chrono = { workspace = true, optional = true }
clap = { version = "4.5.16", features = ["derive", "env"] }
datafusion = { workspace = true, default-features = true, features = ["avro"] }
-datafusion-catalog = { workspace = true, default-features = true }
datafusion-common = { workspace = true, default-features = true }
datafusion-common-runtime = { workspace = true, default-features = true }
futures = { workspace = true }
@@ -54,7 +53,7 @@
postgres-protocol = { version = "0.6.7", optional = true }
postgres-types = { version = "0.2.8", features = ["derive", "with-chrono-0_4"], optional = true }
rust_decimal = { version = "1.36.0", features = ["tokio-pg"] }
-sqllogictest = "0.26.4"
+sqllogictest = "0.26.0"
sqlparser = { workspace = true }
tempfile = { workspace = true }
testcontainers = { version = "0.23", features = ["default"], optional = true }
I am going to test running this with #14371 too to see if that results in expected result changes
It worked! See PR here: |
I have no clue where that Cargo.toml-e file is coming from, I don't see that locally |
Thanks again @Omega359 -- let's get this one in |
Which issue does this PR close?
Rationale for this change
Allow others to regenerate the sqlite tests files
What changes are included in this PR?
regeneration script, documentation
Are these changes tested?
Yes, on linux vm inside WSL.
Are there any user-facing changes?
Contributor guide update.