Add session_traffic_source_last_click fields #348

Draft
wants to merge 1 commit into
base: main

adamribaudo-velir
Collaborator

Description & motivation

Google recently released a new set of fields, session_traffic_source_last_click, documented as:

The session_traffic_source_last_click RECORD contains the last-click attributed session traffic source data across Google ads and manual contexts, where available.

These fields have been added to the base package model.

Checklist

  • I have verified that these changes work locally
  • I have updated the README.md (if applicable)
  • I have added tests & descriptions to my models (and macros if applicable)
  • I have run dbt test and python -m pytest . to validate existing tests

@adamribaudo-velir
Collaborator Author

@dgitis I ran this and noticed differences between our stg_ga4__sessions_traffic_sources_last_non_direct_daily model and Google's last_click fields. When I looked at the raw events, our model appeared to be the more accurate of the two, which was surprising.

I think having two definitions of 'session last click attribution' in the package will only confuse people. I suppose we should just include these fields and remove stg_ga4__sessions_traffic_sources_last_non_direct_daily, but I thought I'd check with you first.

@dgitis
Collaborator

dgitis commented Oct 28, 2024

The advantage of our method over Google's is that we keep the attribution fields (source, medium, campaign) coupled to a single session, whereas GA4 decouples them and resolves each field separately.

So, if you see source / medium / campaign from sessions in this order from earliest to latest:

facebook / paid_social / my_fb_campaign
google / organic
direct

The last non-direct source / medium / campaign in GA4 would be google / organic / my_fb_campaign, while the package would return google / organic / null.
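To make the difference concrete, here is a minimal Python sketch of the two behaviors described above. It is a toy model, not the package's SQL and not Google's actual algorithm: the `Session` type, the function names, and the `(direct)` / `(none)` placeholder values are assumptions made for illustration.

```python
from typing import NamedTuple, Optional


class Session(NamedTuple):
    source: Optional[str]
    medium: Optional[str]
    campaign: Optional[str]


# Values treated as "no attribution" (an assumption for this sketch).
EMPTY = {None, "(direct)", "(none)"}


def coupled_last_non_direct(sessions):
    """Take all three fields from the latest non-direct session (package-style)."""
    for s in reversed(sessions):
        if s.source not in EMPTY:
            return s
    return Session(None, None, None)


def decoupled_last_non_direct(sessions):
    """Resolve each field independently to its latest non-empty value."""
    def last_value(field):
        for s in reversed(sessions):
            value = getattr(s, field)
            if value not in EMPTY:
                return value
        return None

    return Session(*(last_value(f) for f in Session._fields))


# Sessions ordered from earliest to latest, as in the example above.
sessions = [
    Session("facebook", "paid_social", "my_fb_campaign"),
    Session("google", "organic", None),
    Session("(direct)", "(none)", None),
]

print(coupled_last_non_direct(sessions))
print(decoupled_last_non_direct(sessions))
```

The coupled version returns google / organic with no campaign, while the decoupled version carries my_fb_campaign forward from the earlier Facebook session because campaign is resolved on its own.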

As with my comment on the other PR, should we maybe make this configurable?

The advantages of using Google's definitions are as follows:

  • data will match Google's numbers more closely
  • cheaper to calculate

The advantages of our definitions are as follows:

  • better data, since source, medium, and campaign always come from the same session
  • upgrading the package version won't error

I'm thinking that we create a use_google_attribution_fields variable.

At this stage, the new variable only enables these fields in the base model, but in the future we could modify our various attribution models to detect this variable and return vastly different SQL depending on the configuration.
