expand test column dtypes to full scale #492

hussain-jafari

Description

  • Category: feature
  • JIRA issue: MIC-5866

Move test_column_dtypes to the release suite framework.

Testing

Ran the tests on five datasets (acs, cps, wic, ssa, and census).

@@ -49,6 +48,25 @@ def test_row_noising_omit_row_or_do_not_respond(
run_omit_row_or_do_not_respond_tests(dataset_name, config, original_data, noised_data)


def test_column_dtypes(
Contributor

I've asked this multiple times, but remind me again: how will we run this test as it previously existed (on sample data) but NOT during release testing? That is, we need to keep running the previous test every night, as we currently do.

Author

The fixtures are set up to read in sample data if we don't run pytest with the --release flag.
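
For readers unfamiliar with the pattern, here is a minimal sketch of how a fixture can fall back to sample data when `--release` is not passed. Only the `--release` flag comes from this thread; the names `choose_data_source`, `data_source`, and the two labels are hypothetical and are not taken from this repository's conftest.

```python
import pytest

# Hypothetical labels; the real fixtures presumably load actual datasets.
SAMPLE_DATA = "sample"
FULL_SCALE_DATA = "full"


def pytest_addoption(parser):
    # Register an opt-in flag. Without it, the nightly run keeps using sample data.
    parser.addoption(
        "--release",
        action="store_true",
        default=False,
        help="run tests against full-scale data",
    )


def choose_data_source(release: bool) -> str:
    """Pick the data source for this test session."""
    return FULL_SCALE_DATA if release else SAMPLE_DATA


@pytest.fixture(scope="session")
def data_source(request) -> str:
    # Tests that request this fixture see sample data unless --release was given.
    return choose_data_source(request.config.getoption("--release"))
```

With a setup along these lines, a plain `pytest` invocation exercises the sample-data path, and `pytest --release` switches the same tests to full-scale data.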

# str dtype is 'object'
# Check that they are actually strings and not some other
# type of object.
actual_types = noised_data[col.name].dropna().apply(type)
Collaborator

Is using apply here vectorized? I don't think it is, but maybe we don't have any other options?

Author

I believe it's not vectorized, but it might be the fastest way according to this answer:

https://stackoverflow.com/questions/55754713/fastest-way-to-find-all-data-types-in-a-pandas-series
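
To make the check under discussion concrete, here is a small standalone sketch of the `dropna().apply(type)` approach on an object-dtype column. The helper name `non_string_types` and the example Series are hypothetical; only the `dropna().apply(type)` call mirrors the diff above.

```python
import pandas as pd


def non_string_types(series: pd.Series) -> set:
    """Return the Python types, other than str, found among non-null values."""
    # apply(type) calls type() on each element in a Python-level loop,
    # so it is not vectorized, but it does inspect the real object types
    # hiding behind the 'object' dtype.
    actual_types = series.dropna().apply(type)
    return set(actual_types.unique()) - {str}


clean = pd.Series(["a", None, "b"], dtype=object)
mixed = pd.Series(["a", 1, 2.5, None], dtype=object)

print(non_string_types(clean))  # empty set: every non-null value is a str
print(non_string_types(mixed))  # contains int and float
```

An empty result means the column holds only strings (and nulls). Since the per-element cost is a single `type()` call, the loop overhead may be acceptable even at full scale, though that would need to be measured.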

@hussain-jafari hussain-jafari merged commit 53d213e into epic/full_scale_testing Feb 24, 2025
11 checks passed
@hussain-jafari hussain-jafari deleted the hjafari/feature/MIC-5866_expand_test_column_dtypes branch February 24, 2025 20:49

4 participants