Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor pytest unit tests to dbt unit tests #346

Open
wants to merge 48 commits into
base: main
Choose a base branch
from

Conversation

davidbooke4
Copy link
Collaborator

@davidbooke4 davidbooke4 commented Oct 24, 2024

Description & motivation

The goal of this PR is to move the pytest unit tests to dbt unit tests.

Key Changes

  • Refactor existing pytest unit tests to dbt unit tests
  • Add dbt unit tests to stg_ga4__events for url_parsing macros which didn't have pytest unit tests
  • Add or event_source is null condition to session_ fields in stg_ga4__sessions_traffic_sources model. I did this because it seemed that sessions with a null event_source but non-null other properties were incorrectly being assigned (none) values and attributed to the Direct default_channel_grouping.
  • Remove pytest unit test files for unit tests that have been moved to dbt unit tests
  • Add job to run_unit_tests_on_pr.yml for executing the dbt unit tests. To support the Github Action that will execute this job, I've done the following:
    • Added additional secrets and environment variables to this repo and declared environment variables in run_unit_tests_on_pr.yml
    • Added a profiles.yml file for the Github Action to use to connect to BigQuery
    • Added a default profile in dbt_project.yml
    • Added conditional logic in base_select.sql and src_ga4.yml so environment variables are used when the source_project and property_ids variables are not set in dbt_project.yml
    • Added conditional logic to incremental models to look for incremental days variable from environment variables and then project variables
    • Added conditional logic to models that are enabled/disabled based on the declaration of project variables to check for environment variables in addition to project variables
    • Added conditional logic to base_ga4__events.sql so it looks for start date variable from both project and environment variables
    • Added comments in README.md related to the unit tests in the package
  • Add conditional logic in base_ga4__events that checks whether the --empty flag was passed in the dbt run/build command. Without this, the model fails due to an error because a limit 0 is added to from {{ source('ga4', 'events') }} and the _table_suffix field is not recognized.

Notes

  • I've commented out the unit test in stg_ga4__sessions_traffic_sources_last_non_direct_daily.yml due to this bug which is preventing migrating this pytest unit test to a dbt unit test. For the same reason, I've kept the pytest unit test associated with this model.
  • Evidence of the added run_dbt_unit_tests Github Action can be found here. I had to create a draft PR not from a fork in order to test the changes made in run_unit_tests_on_pr.yml.

Checklist

  • I have verified that these changes work locally
  • I have updated the README.md (if applicable)
  • I have added tests & descriptions to my models (and macros if applicable)
  • I have run dbt test and python -m pytest . to validate existing tests

adamribaudo-velir and others added 29 commits June 7, 2024 07:18
@davidbooke4 davidbooke4 marked this pull request as ready for review October 24, 2024 21:25
@adamribaudo-velir
Copy link
Collaborator

adamribaudo-velir commented Oct 24, 2024

@davidbooke4 amazing! Just 3 things:

  • can you remove the unit_tests folder with the legacy pytest stuff if you feel it's no longer necessary?
  • I'm a little concerned about storing ga4_source_categories.csv in both the seeds folder and the fixtures folder . There's no way to let the unit tests reference it from the seeds folder?
  • Can you update https://github.com/Velir/dbt-ga4/blob/main/.github/workflows/run_unit_tests_on_pr.yml to run the unit tests rather than pytests?

@davidbooke4
Copy link
Collaborator Author

davidbooke4 commented Oct 25, 2024

@adamribaudo-velir I left the unit tests folder since I wasn't able to migrate the sessions_traffic_sources_last_non_direct_daily pytest unit test to a dbt unit test because of this issue that you opened a few months back.
Do you think not moving that one test justifies keeping the old unit_tests folder, or are we okay to still remove it and not have a unit test for that model for the time being until a fix is implemented for the bug?

@adamribaudo-velir
Copy link
Collaborator

@adamribaudo-velir I left the unit tests folder since I wasn't able to migrate the sessions_traffic_sources_last_non_direct_daily pytest unit test to a dbt unit test because of this issue that you opened a few months back. Do you think not moving that one test justifies keeping the old unit_tests folder, or are we okay to still remove it and not have a unit test for that model for the time being until a fix is implemented for the bug?

Oh good call. No harm in leaving it until that bug is resolved. We'll just want to make sure that both this Pytest + the unit tests are running during CI

@davidbooke4
Copy link
Collaborator Author

@adamribaudo-velir I left the unit tests folder since I wasn't able to migrate the sessions_traffic_sources_last_non_direct_daily pytest unit test to a dbt unit test because of this issue that you opened a few months back. Do you think not moving that one test justifies keeping the old unit_tests folder, or are we okay to still remove it and not have a unit test for that model for the time being until a fix is implemented for the bug?

Oh good call. No harm in leaving it until that bug is resolved. We'll just want to make sure that both this Pytest + the unit tests are running during CI

Alright, this is all set for another look whenever you get the chance!

README.md Outdated
@@ -302,9 +308,35 @@ The easiest option is using OAuth with your Google Account. Summarized instructi
```
gcloud auth application-default login --scopes=https://www.googleapis.com/auth/bigquery,https://www.googleapis.com/auth/iam.test
```

The `profiles.yml` file included in this package should be removed. The `profile: 'default'` line in `dbt_project.yml` in this package should also be removed.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davidbooke4 I understand that these files aren't needed by a package user, but what is the benefit of removing them? They'll be added back the next time the person runs dbt-deps, right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, you're right - I forgot that would be the case! I'll update the comments I made in the README based on that fact.

One consequence of defining some of these project variables in dbt_project.yml within the package is that these variables will carry through to someone's dbt project if they've installed dbt-ga4 but haven't set new values for these variables in their own dbt_project.yml. So models such as stg_ga4__page_conversions and stg_ga4__derived_session_properties will be enabled and have fields created based on the variables I've added to dbt_project.yml in the package.

I'm trying to do some exploring to see if there's an alternative, but what are your thoughts on that? Do you think that'd be okay or do we need to find a way so these models aren't enabled if someone doesn't set those variables?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shoot, yea that's a problem because I don't think many package users use those advanced features or set those variables.

Can you look at alternatives?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found an alternative that boils down to creating more repo environment variables and updating the conditional enabled/disabled logic in some models to look for those environment variables.

However, now I'm back to an earlier problem where the project is unable to compile because unit tests are defined for models that are disabled. There's an open issue and PR related to this (with commits as recent as last week), so a fix should be in place soon 🤞.

What are your thoughts on waiting for this fix to be in place before merging this PR? The alternative would be to remove the dbt unit tests and add the pytest tests back in for the few models that are enabled based on setting the conversion events and derived user/session properties variables. I'd add in a TODO for replacing those pytest tests with unit tests if we decide to proceed with that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @adamribaudo-velir! I wanted to give the update that the PR I mentioned in my comment above was merged last week. That means we're probably at least a couple weeks out before it's included in a dbt version release, but I don't think there's any rush to get my PR merged so that should be fine. Let me know what you think though!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Initially replied in the main thread)

Sounds great! Yes, let's wait until this is released in dbt-core. Thanks for staying on top of it.

@davidbooke4
Copy link
Collaborator Author

@adamribaudo-velir I also wanted to call out that these unit tests will run when users install this package and execute a dbt run or dbt build. I was thinking of adding a note in the README about including the --exclude-resource-type unit_test flag in whatever folks are using to orchestrate and execute dbt, but I wanted to get your thoughts on it as well!

@dgitis
Copy link
Collaborator

dgitis commented Nov 5, 2024

Thank you both for all of your work on this. I've been watching the conversation for awhile.

I just tried running this and it seems that you get a "Parsing Error" if you don't have the stg_ga4__derived _user_properties model enabled.

This model only gets enabled when you have derived_user_properties set.

{{ config(
  enabled = true if var('derived_user_properties', false) or env_var('GA4_DERIVED_USER_PROPERTIES', false) else false,
  materialized = "table"
) }}

I think I should maybe report this as a dbt issue. I've had a number of issues where package YML cause issues when you override at the project level. Introducing this new testing mechanism just makes them worse.

I'm thinking as a workaround, we could maybe create an empty View when the variable is not set rather than set enabled = false.

@davidbooke4
Copy link
Collaborator Author

davidbooke4 commented Nov 5, 2024

Thank you both for all of your work on this. I've been watching the conversation for awhile.

I just tried running this and it seems that you get a "Parsing Error" if you don't have the stg_ga4__derived _user_properties model enabled.

This model only gets enabled when you have derived_user_properties set.

{{ config(
  enabled = true if var('derived_user_properties', false) or env_var('GA4_DERIVED_USER_PROPERTIES', false) else false,
  materialized = "table"
) }}

I think I should maybe report this as a dbt issue. I've had a number of issues where package YML cause issues when you override at the project level. Introducing this new testing mechanism just makes them worse.

I'm thinking as a workaround, we could maybe create an empty View when the variable is not set rather than set enabled = false.

Thanks for looking this over @dgitis!

There's this open issue and PR related to disabling unit tests when their corresponding models are disabled. I was able to compile a project in which I installed this version of the dbt-ga4 package and then set the derived_session_properties, derived_user_properties, and conversion_events variables, so I don't think there's a problem with project variables overriding package variables.

I'm hopeful that the PR will be resolved soon and this problem will be fixed, but materializing the model as an empty view is an alternative I hadn't considered. I was debating keeping the pytest tests in place for these few models that are enabled based on project variables and then moving them to dbt unit tests once that issue is resolved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants