-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Test for reading read statistics from parquet files without statistics and boolean & struct data type #10608
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @NGA-TRAN -- I think this just needs some comments (or maybe a builder style API)
) | ||
} | ||
|
||
pub fn parquet_file_one_column_stats( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we please add some doc comments about what this function does / how it is different? It is clear from this PR that it adds enable_stats
but that might not be obvious in the future
While we are on this topic, I personally find a struct with named parameters easier to read than something the following where you have to look up parquet_file_one_column_stats
to know what the 0
, 4
and 7
mean.
let reader =
parquet_file_one_column_stats(0, 4, 7, row_per_group, EnabledStatistics::None);
I wonder if we could change the code to something like this to make it easier to read 🤔
let reader = TestFile {
num_null: 0
no_null_values_start: 4,
no_null_values_end: 7,
row_per_group,
enable_stats: EnabledStatistic::None,
}
.build();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to do this, we need to think about parquet_file_many_columns
, too.
I have to sign off now. How about we do this in following PR?
This PR also appears to have a conflict now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @NGA-TRAN -- I think this is great. I took the liberty of updating the test framework in the more functional style and merged up from main.
Since this was test only code I just merged it in to keep things moving |
…s and boolean & struct data type (apache#10608) * read statistics from parquet without statistics * tests for boolean and struct array * link to a the struct array bug * Make builder style API * Port to builder style * simlify * Add column name to test --------- Co-authored-by: Andrew Lamb <[email protected]>
Which issue does this PR close?
More tests for #10453
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?