Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve regeneration of sqlite expected test suite #14906

Open
alamb opened this issue Feb 26, 2025 · 0 comments
Open

Improve regeneration of sqlite expected test suite #14906

alamb opened this issue Feb 26, 2025 · 0 comments
Labels
enhancement New feature or request

Comments

@alamb
Copy link
Contributor

alamb commented Feb 26, 2025

Is your feature request related to a problem or challenge?

Thanks to some great work from @Omega359 as part of each commit to main DataFusion runs many thousand queries from the sqlite test suite ❤

This is documented here:

## Running Tests: `sqlite`
Test files in `data/sqlite` directory of the datafusion-testing crate were
sourced from the [sqlite test suite](https://www.sqlite.org/sqllogictest/dir?ci=tip) and have been cleansed and updated to
run within DataFusion's sqllogictest runner.
To run the sqlite tests you need to increase the rust stack size and add
`INCLUDE_SQLITE=true` to run the sqlite tests:
```shell
export RUST_MIN_STACK=30485760;
INCLUDE_SQLITE=true cargo test --test sqllogictests
```
Note that there are well over 5 million queries in these tests and running the
sqlite tests will take a long time. You may wish to run them in release-nonlto mode:
```shell
INCLUDE_SQLITE=true cargo test --profile release-nonlto --test sqllogictests
```
The sqlite tests can also be run with the postgres runner to verify compatibility:
```shell
export RUST_MIN_STACK=30485760;
PG_COMPAT=true INCLUDE_SQLITE=true cargo test --features=postgres --test sqllogictests
```
To update the sqllite expected answers use the `datafusion/sqllogictest/regenerate_sqlite_files.sh` script.
Note this must be run with an empty postgres instance. For example
```shell
PG_URI=postgresql://postgres@localhost:5432/postgres bash datafusion/sqllogictest/regenerate_sqlite_files.sh
```

When expected output changes (for example, error messages) we currently use a script https://github.com/apache/datafusion/blob/main/datafusion/sqllogictest/regenerate_sqlite_files.sh that:

  1. It relies on a fork of sqlogictest
  2. It relies on a modified driver program

This is problematic because

  1. As the code in datafusion is updated (for example to new sqlogictest versions) the modified driver program may not work with the new sqlogictest version
  2. The modified driver program may not work with new versions of sqllogictest

This happened with #14824 which made it hard to update the expected output

Describe the solution you'd like

I would like to make sure that regenerate_sqlite_files.sh will always work and will not bitrot over time

@Omega359 says:

That is exactly what I was thinking and hopefully will fix tonight. I think a decent short-term fix is to 'lock' the sqllogictest-rs dependency version and add a comment that any update to it will require a full run of the regenerate script before committing.

Long term ideally would be to improve my changes to my fork of sqllogictest-rs such that they would be suitable to submit a PR to that project. That is not an insignificant amount of work to be honest and I'm a bit thin on time for the next month or two.

Describe alternatives you've considered

No response

Additional context

See last time we had to update the scripts based on changes:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant