Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Script and documentation for regenerating sqlite test files #14290

Merged
merged 6 commits into from
Feb 6, 2025

Conversation

Omega359
Copy link
Contributor

@Omega359 Omega359 commented Jan 25, 2025

Which issue does this PR close?

Rationale for this change

Allow others to regenerate the sqlite tests files

What changes are included in this PR?

regeneration script, documentation

Are these changes tested?

Yes, on linux vm inside WSL.

Are there any user-facing changes?

Contributor guide update.

@github-actions github-actions bot added documentation Improvements or additions to documentation sqllogictest SQL Logic Tests (.slt) labels Jan 25, 2025
@Omega359 Omega359 marked this pull request as ready for review January 25, 2025 03:04
@Omega359 Omega359 changed the title Adding script and documentation for regenerating sqlite test files Script and documentation for regenerating sqlite test files Jan 25, 2025
@alamb
Copy link
Contributor

alamb commented Jan 29, 2025

I plan to test this (by way of review) later today

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Omega359 -- I tried this out and I couldn't quite get it working but I think it is close. My notes below

Installing dependencies

I installed sd as the script advised 👌

# install sd
brew install sd

However, I also had to install several other things on my mac as well:

  • rename via brew install rename
  • dos2unix via brew install dos2unix

Running in a different directory

cd datafusion
$ bash ./datafusion/sqllogictest/regenerate_sqlite_files.sh

Resulted in an error

./datafusion/sqllogictest/regenerate_sqlite_files.sh: line 79: cd: /Users/andrewlamb/datafusion-testing/data/sqlite/: No such file or directory

I had to run it in

cd datafusion
cd datafusion/sqllogictest
./regenerate_sqlite_files.sh

sed errors

I saw severl errors like this

sed: 1: "./select1.test": invalid command code .
sed: 1: "./select4.test": invalid command code .
sed: 1: "./select5.test": invalid command code .

Maybe this is a difference due to gnu sed (linux) vs bsd-style sed (mac) 🤔

Postgres errors

Also there were many many errors like this until I started a docker image named postgres (this may not be related to this PR)

Error connecting to postgres using PG_URI=postgresql://postgres:[email protected]:16432/test

Then I started my postgres container and then I got an error like this (but it didn't seem to proceed at all)

called `Result::unwrap()` on an `Err` value: Client(StartContainer(DockerResponseServerError { status_code: 500, message: "driver failed programming external connectivity on endpoint silly_sammet (2c4e6fcdaa02e87dd3f99ec57f84993c3f1141cc35b76528d1980b1cda0ad8f2): Bind for 0.0.0.0:16432 failed: port is already allocated" }))

datafusion/sqllogictest/regenerate_sqlite_files.sh Outdated Show resolved Hide resolved

echo "Running the sqllogictests with sqlite completion. This will take approximately an hour to run"

cargo test --profile release-nonlto --features postgres --test sqllogictests -- --include-sqlite --postgres-runner --complete
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this needs to also use the --postgres-runner command?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it does so that the pg db is started. I'll take a look at my custom version of the sqllogictests files and see. I could just required that PG is started and available via PG_URI or something.


# update the sqllogictest Cargo.toml with the new repo for sqllogictest-rs (tied to a specific hash)
cd "$DF_HOME" || exit;
sed -i -e 's~^sqllogictest.*~sqllogictest = { git = "https://github.com/Omega359/sqllogictest-rs.git", rev = "1cd933d" }~' datafusion/sqllogictest/Cargo.toml
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this needed for? Some sort of update to the runner? I wonder if you can publish it to crates.io or something (sqllogictests-omega359 🤔 ) Not sure if that would be better

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That dependency is hacked to all hell to do the completion the way I wanted by using a comparison db (pg) to compare results to if the df results don't match the sqlite results. I sure don't want that thing published anywhere :)

@Omega359
Copy link
Contributor Author

I was worried that macs would be troublesome for this script. I'll try and adjust things for that platform. the difference in sed is ... unfortunate.

The postgres errors might be caused by the updated sqllogictest using docker url (since that is how I start pg) vs normal uri (PG_URI).

@Omega359
Copy link
Contributor Author

Ready for 2nd round of testing...

@alamb
Copy link
Contributor

alamb commented Feb 3, 2025

Ready for 2nd round of testing...

Hoping to try it later today

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested this on a mac

Ran

 PG_URI=postgresql://postgres@localhost:5432/postgres ./datafusion/sqllogictest/regenerate_sqlite_files.sh

It ran but still gave me an error (but I think we are pretty close)

enaming .test files to .slt and cleansing the files ...
sed: 1: "select1.test": bad flag in substitute command: 's'
sed: 1: "select4.test": bad flag in substitute command: 's'
sed: 1: "select5.test": bad flag in substitute command: 's'
Updating the datafusion/sqllogictest/Cargo.toml file with an updated sqllogictest dependency
Replacing the datafusion/sqllogictest/bin/sqllogictests.rs file with a custom version required for running completion
Running the sqllogictests with sqlite completion. This will take approximately an hour to run
    Updating git repository `https://github.com/Omega359/sqllogictest-rs.git`
     Locking 1 package to latest compatible version
      Adding sqllogictest v0.24.0 (https://github.com/Omega359/sqllogictest-rs.git?rev=1cd933d#1cd933de)
   Compiling phf_shared v0.11.3
   Compiling either v1.13.0
   Compiling url v2.5.4
   Compiling ident_case v1.0.1
   Compiling fnv v1.0.7
   Compiling strsim v0.11.1
   Compiling datafusion-common v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/common)
   Compiling structmeta-derive v0.3.0
   Compiling core-foundation v0.9.4
   Compiling serde_repr v0.1.19
   Compiling home v0.5.11
   Compiling hyperlocal v0.9.1
   Compiling xattr v1.4.0
   Compiling tokio-stream v0.1.17
   Compiling postgres-derive v0.4.6
   Compiling filetime v0.2.25
   Compiling base64 v0.21.7
   Compiling etcetera v0.8.0
   Compiling phf v0.11.3
   Compiling itertools v0.13.0
   Compiling itertools v0.14.0
   Compiling darling_core v0.20.10
   Compiling chrono-tz v0.10.1
   Compiling security-framework v2.11.1
   Compiling docker_credential v1.3.1
   Compiling tokio-tar v0.3.1
   Compiling whoami v1.5.2
   Compiling postgres-types v0.2.8
   Compiling structmeta v0.3.0
   Compiling reqwest v0.12.12
   Compiling rustls-native-certs v0.7.3
   Compiling parse-display-derive v0.9.1
   Compiling sqllogictest v0.24.0 (https://github.com/Omega359/sqllogictest-rs.git?rev=1cd933d#1cd933de)
   Compiling rust_decimal v1.36.0
   Compiling tokio-postgres v0.7.12
   Compiling object_store v0.11.2
   Compiling darling_macro v0.20.10
   Compiling parse-display v0.9.1
   Compiling darling v0.20.10
   Compiling serde_with_macros v3.12.0
   Compiling datafusion-expr-common v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/expr-common)
   Compiling datafusion-physical-expr-common v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/physical-expr-common)
   Compiling serde_with v3.12.0
   Compiling datafusion-functions-aggregate-common v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/functions-aggregate-common)
   Compiling datafusion-functions-window-common v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/functions-window-common)
   Compiling arrow-array v54.0.0
   Compiling datafusion-expr v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/expr)
   Compiling bollard-stubs v1.45.0-rc.26.0.1
   Compiling arrow-select v54.0.0
   Compiling arrow-ipc v54.0.0
   Compiling arrow-arith v54.0.0
   Compiling arrow-row v54.0.0
   Compiling arrow-cast v54.0.0
   Compiling arrow-string v54.0.0
   Compiling arrow-ord v54.0.0
   Compiling arrow-csv v54.0.0
   Compiling arrow-json v54.0.0
   Compiling parquet v54.0.0
   Compiling arrow v54.0.0
   Compiling datafusion-macros v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/macros)
   Compiling bollard v0.17.1
   Compiling testcontainers v0.23.1
   Compiling testcontainers-modules v0.11.5
   Compiling datafusion-physical-expr v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/physical-expr)
   Compiling datafusion-execution v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/execution)
   Compiling datafusion-sql v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/sql)
   Compiling datafusion-physical-plan v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/physical-plan)
   Compiling datafusion-functions-aggregate v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/functions-aggregate)
   Compiling datafusion-functions v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/functions)
   Compiling datafusion-optimizer v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/optimizer)
   Compiling datafusion-functions-window v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/functions-window)
   Compiling datafusion-catalog v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/catalog)
   Compiling datafusion-functions-table v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/functions-table)
   Compiling datafusion-physical-optimizer v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/physical-optimizer)
   Compiling datafusion-functions-nested v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/functions-nested)
   Compiling datafusion v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/core)
   Compiling datafusion-sqllogictest v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/sqllogictest)
    Finished `release-nonlto` profile [optimized] target(s) in 1m 29s
     Running bin/sqllogictests.rs (target/release-nonlto/deps/sqllogictests-c7da27570eaf4149)
Completed in 0 seconds                                                                                                                           Applying patches for #13784 (https://github.com/apache/datafusion/issues/13784)
error: failed processing files:
	FailedJobs(
	"datafusion-testing/data/sqlite/random/aggregates/slt_good_102.slt": No such file or directory (os error 2)
)
error: failed processing files:
	FailedJobs(
	"datafusion-testing/data/sqlite/random/aggregates/slt_good_112.slt": No such file or directory (os error 2)
)
Running the sqllogictests with sqlite files. This will take approximately 20 minutes to run
   Compiling parquet v54.0.0
   Compiling sqllogictest v0.24.0 (https://github.com/Omega359/sqllogictest-rs.git?rev=1cd933d#1cd933de)
   Compiling datafusion-common v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/common)
   Compiling datafusion-expr-common v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/expr-common)
   Compiling datafusion-physical-expr-common v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/physical-expr-common)
   Compiling datafusion-functions-window-common v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/functions-window-common)
   Compiling datafusion-functions-aggregate-common v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/functions-aggregate-common)
   Compiling datafusion-expr v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/expr)
   Compiling datafusion-physical-expr v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/physical-expr)
   Compiling datafusion-execution v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/execution)
   Compiling datafusion-sql v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/sql)
   Compiling datafusion-functions v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/functions)
   Compiling datafusion-physical-plan v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/physical-plan)
   Compiling datafusion-functions-aggregate v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/functions-aggregate)
   Compiling datafusion-functions-window v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/functions-window)
   Compiling datafusion-optimizer v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/optimizer)
   Compiling datafusion-catalog v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/catalog)
   Compiling datafusion-functions-table v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/functions-table)
   Compiling datafusion-physical-optimizer v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/physical-optimizer)
   Compiling datafusion-functions-nested v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/functions-nested)
   Compiling datafusion v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/core)
   Compiling datafusion-sqllogictest v44.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/sqllogictest)
    Finished `release-nonlto` profile [optimized] target(s) in 1m 06s
     Running bin/sqllogictests.rs (target/release-nonlto/deps/sqllogictests-b7320e665da12135)
Completed in 1 second                                                                                                                            Sqlite tests completed successfully. The datafusion-testing git submodule is ready to be pushed to a new remote and a PR created in https://github.com/apache/datafusion-testing/
Cleaning up source code changes and temporary files and directories
Updated 1 path from the index
Updated 1 path from the index
andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion2$ git status
On branch feature/regenerate_sqlite_files
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
  (commit or discard the untracked or modified content in submodules)
	modified:   datafusion-testing (modified content)

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	datafusion/sqllogictest/Cargo.toml-e
	datafusion/sqllogictest/test_files/pg_compat/pg_compat_null.slt.bak
	datafusion/sqllogictest/test_files/pg_compat/pg_compat_simple.slt.bak
	datafusion/sqllogictest/test_files/pg_compat/pg_compat_type_coercion.slt.bak
	datafusion/sqllogictest/test_files/pg_compat/pg_compat_types.slt.bak
	datafusion/sqllogictest/test_files/pg_compat/pg_compat_union.slt.bak
	datafusion/sqllogictest/test_files/pg_compat/pg_compat_window.slt.bak

I would like to try and actually modify something to see if it can fix it...

@Omega359
Copy link
Contributor Author

Omega359 commented Feb 3, 2025

I think I found a decent alternative to using sed that should work.

it seems the script isn't updating/finding the datafusion-testing for some reason for you. Not sure why yet

@Omega359
Copy link
Contributor Author

Omega359 commented Feb 3, 2025

I am perplexed. My best guess is the find command isn't quite the same as it's on linux and isn't working as expected on a mac. Just a guess though. I don't have a mac that I can test this on (wife's mac is not an option ... 🪓) so I unfortunately can't get to the root of the issue myself.

… pg via docker, cleanup pg_compat .bak files.
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much @Omega359 -- I ran this code again and this time it got no errors, and the only thing left is this:

On branch feature/regenerate_sqlite_files
Untracked files:
  (use "git add <file>..." to include in what will be committed)
	datafusion/sqllogictest/Cargo.toml-e

nothing added to commit but untracked files present (use "git add" to track)

But that seems ok

andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion2$ diff -du datafusion/sqllogictest/Cargo.toml datafusion/sqllogictest/Cargo.toml-e
--- datafusion/sqllogictest/Cargo.toml	2025-02-05 07:21:12
+++ datafusion/sqllogictest/Cargo.toml-e	2025-02-03 10:53:45
@@ -42,7 +42,6 @@
 chrono = { workspace = true, optional = true }
 clap = { version = "4.5.16", features = ["derive", "env"] }
 datafusion = { workspace = true, default-features = true, features = ["avro"] }
-datafusion-catalog = { workspace = true, default-features = true }
 datafusion-common = { workspace = true, default-features = true }
 datafusion-common-runtime = { workspace = true, default-features = true }
 futures = { workspace = true }
@@ -54,7 +53,7 @@
 postgres-protocol = { version = "0.6.7", optional = true }
 postgres-types = { version = "0.2.8", features = ["derive", "with-chrono-0_4"], optional = true }
 rust_decimal = { version = "1.36.0", features = ["tokio-pg"] }
-sqllogictest = "0.26.4"
+sqllogictest = "0.26.0"
 sqlparser = { workspace = true }
 tempfile = { workspace = true }
 testcontainers = { version = "0.23", features = ["default"], optional = true }

I am going to test running this with #14371 too to see if that results in expected result changes

@alamb
Copy link
Contributor

alamb commented Feb 5, 2025

I am going to test running this with #14371 too to see if that results in expected result change

It worked!

See PR here:

@Omega359
Copy link
Contributor Author

Omega359 commented Feb 5, 2025

I have no clue where that Cargo.toml-e file is coming from, I don't see that locally

@alamb
Copy link
Contributor

alamb commented Feb 6, 2025

Thanks again @Omega359 -- let's get this one in

@alamb alamb merged commit 8ebed67 into apache:main Feb 6, 2025
26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Automate updating sqllogictest updates [EPIC] Run full sqllogic / sqlite test suite against DataFusion
2 participants