Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Poor documentation and/or test coverage of TableProvider::statistics #13439

Open
rroelke opened this issue Nov 15, 2024 · 1 comment
Open

Poor documentation and/or test coverage of TableProvider::statistics #13439

rroelke opened this issue Nov 15, 2024 · 1 comment
Assignees

Comments

@rroelke
Copy link

rroelke commented Nov 15, 2024

The extent of the documentation for TableProvider::statistics in version 43.0.0 is:

Get statistics for this table, if available

This offers no explanation as to how the statistics will or will not be used.

A user with experience in analytical database engines writing a custom TableProvider implementation may suspect that TableProvider::statistics is used by the datafusion query optimizer to determine join orders, perhaps among other things.

However, this conclusion is apparently incorrect, which I deduce from the following pieces of evidence:

  1. I am a user fitting that description and found that my custom TableProvider::statistics was not called in the presence of a join query
  2. cargo check --workspace --tests runs with no errors if I remove the fn statistics declaration from the trait TableProvider definition
  3. having found the source code for the rule which changes join orders it is clear that it calls ExecutionPlan::statistics instead.

Expectation

The documentation should set appropriate expectations for what TableProvider::statistics is used for, so that developers can make informed choices about whether or not to implement it.

Additional context

The apparent answer to what TableProvider::statistics is used for is "nothing" based on the cargo check --workspace --tests comment above, but removing the trait method is a breaking change. Based on the slack discussion prior to filing this issue, at least one user is depending on TableProvider::statistics for their custom optimizer rules and removing it would require them to find a workaround.

Short of deprecating or removing the trait method, I would personally be satisfied just with updates to the method documentation.

@alamb
Copy link
Contributor

alamb commented Nov 15, 2024

If we don't do this, we run the risk of deleting this API by accident in a future release

maybe someone who is relying on this API could write some tests / additonal documentation. @avantgardnerio do you know of anyone who might have some time?

@avantgardnerio avantgardnerio self-assigned this Nov 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants