You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This offers no explanation as to how the statistics will or will not be used.
A user with experience in analytical database engines writing a custom TableProvider implementation may suspect that TableProvider::statistics is used by the datafusion query optimizer to determine join orders, perhaps among other things.
However, this conclusion is apparently incorrect, which I deduce from the following pieces of evidence:
I am a user fitting that description and found that my custom TableProvider::statistics was not called in the presence of a join query
cargo check --workspace --tests runs with no errors if I remove the fn statistics declaration from the trait TableProvider definition
having found the source code for the rule which changes join orders it is clear that it calls ExecutionPlan::statistics instead.
Expectation
The documentation should set appropriate expectations for what TableProvider::statistics is used for, so that developers can make informed choices about whether or not to implement it.
Additional context
The apparent answer to what TableProvider::statistics is used for is "nothing" based on the cargo check --workspace --tests comment above, but removing the trait method is a breaking change. Based on the slack discussion prior to filing this issue, at least one user is depending on TableProvider::statistics for their custom optimizer rules and removing it would require them to find a workaround.
Short of deprecating or removing the trait method, I would personally be satisfied just with updates to the method documentation.
The text was updated successfully, but these errors were encountered:
If we don't do this, we run the risk of deleting this API by accident in a future release
maybe someone who is relying on this API could write some tests / additonal documentation. @avantgardnerio do you know of anyone who might have some time?
The extent of the documentation for
TableProvider::statistics
in version 43.0.0 is:This offers no explanation as to how the statistics will or will not be used.
A user with experience in analytical database engines writing a custom
TableProvider
implementation may suspect thatTableProvider::statistics
is used by the datafusion query optimizer to determine join orders, perhaps among other things.However, this conclusion is apparently incorrect, which I deduce from the following pieces of evidence:
TableProvider::statistics
was not called in the presence of a join querycargo check --workspace --tests
runs with no errors if I remove thefn statistics
declaration from thetrait TableProvider
definitionExecutionPlan::statistics
instead.Expectation
The documentation should set appropriate expectations for what
TableProvider::statistics
is used for, so that developers can make informed choices about whether or not to implement it.Additional context
The apparent answer to what
TableProvider::statistics
is used for is "nothing" based on thecargo check --workspace --tests
comment above, but removing the trait method is a breaking change. Based on the slack discussion prior to filing this issue, at least one user is depending onTableProvider::statistics
for their custom optimizer rules and removing it would require them to find a workaround.Short of deprecating or removing the trait method, I would personally be satisfied just with updates to the method documentation.
The text was updated successfully, but these errors were encountered: