expand test column dtypes to full scale #492
@@ -49,6 +48,25 @@ def test_row_noising_omit_row_or_do_not_respond(
    run_omit_row_or_do_not_respond_tests(dataset_name, config, original_data, noised_data)


def test_column_dtypes(
I've asked this multiple times, but remind me again: how will we run this test as it previously existed (on sample data) but NOT during release testing? That is, we need to keep running the previous test every night, as we do now.
The fixtures are set up to read in sample data if we don't run pytest with the --release flag.
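A common way to wire up this kind of switch in pytest is a command-line option registered in conftest.py plus a fixture that reads it. The sketch below only guesses at the shape and is not the project's actual conftest; choose_data_source, data_source, and the "sample"/"full" labels are hypothetical names.

```python
# conftest.py (sketch): pick sample or full-scale data based on --release.
# choose_data_source, data_source, and the labels below are hypothetical.
import pytest

SAMPLE_DATA = "sample"
FULL_DATA = "full"


def pytest_addoption(parser):
    # Register the --release flag; nightly runs omit it and get sample data.
    parser.addoption(
        "--release",
        action="store_true",
        default=False,
        help="Run the release-testing suite on full-scale data.",
    )


def choose_data_source(release: bool) -> str:
    """Return which dataset scale the tests should load."""
    return FULL_DATA if release else SAMPLE_DATA


@pytest.fixture(scope="session")
def data_source(request):
    # request.config.getoption reads the flag registered in pytest_addoption.
    return choose_data_source(request.config.getoption("--release"))
```

Under this setup, the nightly job would keep calling plain `pytest` and get sample data, while release testing would call `pytest --release`.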
# str dtype is 'object'
# Check that they are actually strings and not some other
# type of object.
actual_types = noised_data[col.name].dropna().apply(type)
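The snippet above can be tried in isolation as a self-contained check on a toy DataFrame. The helper name column_holds_only_strings and the sample columns are made up for illustration, and comparing the collected types against str is an assumption about how the test consumes actual_types.

```python
import numpy as np
import pandas as pd


def column_holds_only_strings(series: pd.Series) -> bool:
    # Hypothetical helper. An 'object' dtype column can hold any Python
    # object, so inspect each element's type, ignoring missing values.
    actual_types = series.dropna().apply(type)
    return bool((actual_types == str).all())


noised_data = pd.DataFrame(
    {
        "first_name": ["Ana", np.nan, "Bo"],
        "zipcode": ["98101", 98102, None],  # one int slipped into a str column
    }
)

for name in noised_data.columns:
    # Both columns report dtype 'object', but only one holds only strings.
    print(name, noised_data[name].dtype, column_holds_only_strings(noised_data[name]))
```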
Is using apply here vectorized? I don't think it is, but maybe we don't have any other options?
I don't believe it's vectorized, but according to this answer it may still be the fastest option:
https://stackoverflow.com/questions/55754713/fastest-way-to-find-all-data-types-in-a-pandas-series
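If the per-element apply becomes a bottleneck on full-scale data, pandas.api.types.infer_dtype is one alternative worth trying. It classifies the values in a single call into pandas' compiled code and returns "string" only when every non-missing value is a str. This sketch has not been benchmarked against the apply approach, so any speedup is an assumption.

```python
import numpy as np
import pandas as pd
from pandas.api.types import infer_dtype

series = pd.Series(["Ana", np.nan, "Bo"], dtype=object)

# Approach from the diff: one Python-level type() call per element.
via_apply = bool((series.dropna().apply(type) == str).all())

# Alternative: infer_dtype returns a label such as "string", "integer",
# or "mixed"; skipna=True ignores missing values when inferring.
via_infer = infer_dtype(series, skipna=True) == "string"

print(via_apply, via_infer)
```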
Description
Move test_column_dtypes into the release test suite framework.
Testing
Ran the tests on 5 datasets (acs, cps, wic, ssa, and census).