
Release Normalize version 0.4.0 #1081

Open · wants to merge 2 commits into `main`

Conversation

github-actions[bot] (Contributor) commented:

This pull request is automatically generated to release version 0.4.0 of package Normalize.

@snowplowcla commented:

Thanks for your pull request. Is this your first contribution to a Snowplow open source project? Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://docs.snowplowanalytics.com/docs/contributing/contributor-license-agreement/ to learn more and sign.

Once you've signed, please reply here (e.g. I signed it!) and we'll verify. Thanks.

netlify bot commented Nov 18, 2024:

Deploy Preview for snowplow-docs ready!

| Name | Link |
|------|------|
| 🔨 Latest commit | 731e2a4 |
| 🔍 Latest deploy log | https://app.netlify.com/sites/snowplow-docs/deploys/673b7ce3d541c90008ac4e26 |
| 😎 Deploy Preview | https://deploy-preview-1081--snowplow-docs.netlify.app |

To edit notification comments on pull requests, go to your Netlify site configuration.

"description": "Determines which timestamp is used to process sessions of data"
},
"snowplow__partition_tstamp": {
"recommendFullRefresh": false,
Review comment (Contributor):

Are we sure we don't recommend a full refresh when changed?

- **The package source code can be found in the [snowplow/dbt-snowplow-normalize repo](https://github.com/snowplow/dbt-snowplow-normalize ), and the docs for the [macro design are here](https://snowplow.github.io/dbt-snowplow-normalize/#/overview/snowplow_normalize ).**
+ **The package source code can be found in the [snowplow/dbt-snowplow-normalize repo](https://github.com/snowplow/dbt-snowplow-normalize), and the docs for the [macro design are here](https://snowplow.github.io/dbt-snowplow-normalize/#/overview/snowplow_normalize).**

## Package Configuration
Review comment (Contributor):

This whole block fits better in the quick start guide, as an optional config that they might want to consider when setting this up, rather than a core component that we need to start our main package description with, I think.


The package provides [macros](https://docs.getdbt.com/docs/build/jinja-macros) and a Python script that are used to generate your normalized events, filtered events, and users tables for use within downstream ETL tools such as Census. See the [Model Design](#model-design) section for further details on these tables.

The package only includes the base incremental scratch model and does not have any derived models; instead, it generates models in your project as if they were custom models you had built on top of the [Snowplow incremental tables](/docs/modeling-your-data/modeling-your-data-with-dbt/package-mechanics/incremental-processing/index.md), using the `_this_run` table as the base for new events to process each run. See the [configuration](/docs/modeling-your-data/modeling-your-data-with-dbt/dbt-configuration/index.md) section for the variables that apply to the incremental model.

:::note
- The incremental model is simplified compared to the standard unified model, this package does not use sessions to identify which historic events to reprocess and just uses the `collector_tstamp` and package variables to identify which events to (re)process.
+ The incremental model is simplified compared to the standard unified model, this package does not use sessions to identify which historic events to reprocess and just uses the `snowplow__partition_tstamp` (defaults to `collector_tstamp`) and package variables to identify which events to (re)process.
Review comment (Contributor):

It uses the `snowplow__session_timestamp` for that!
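
For context, a minimal sketch of how such an incremental window filter could look in a dbt model. This is illustrative only: the relation name and the lookback variable are assumptions, not the package's actual code, and the `dateadd` syntax is warehouse-specific (shown here for Snowflake).

```sql
{# Hypothetical sketch: limit (re)processing to a recent window of the
   configured timestamp column. `snowplow__partition_tstamp` defaults to
   collector_tstamp per the docs above; the relation name and the
   backfill-limit variable are assumptions for illustration. #}
{% set partition_col = var('snowplow__partition_tstamp', 'collector_tstamp') %}

select *
from {{ ref('snowplow_normalize_base_events_this_run') }}  -- assumed relation name
where {{ partition_col }} >= dateadd(day, -{{ var('snowplow__backfill_limit_days', 30) }}, current_timestamp)
```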

@@ -54,7 +84,7 @@ For each `event_names` listed, a model is generated for records with matching ev
For example, if you have 3 `event_names` listed as `['page_view']`, `['page_ping']`, and `['link_click', 'deep_link_click']` then 3 models will be generated, each containing only those respective events from the atomic events table.
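
As a hedged illustration of that paragraph, one such generated model could look roughly like this; the base relation name is an assumption, and a real generated model would also select the event's flattened self-describing event and entity columns.

```sql
-- Hypothetical sketch of one generated model, here for the
-- ['link_click', 'deep_link_click'] entry: only events whose
-- event_name matches are selected from the base incremental table.
select
    event_id,
    collector_tstamp,
    event_name
from {{ ref('snowplow_normalize_base_events_this_run') }}  -- assumed relation name
where event_name in ('link_click', 'deep_link_click')
```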

### Filtered Events Model
- A single model is built that provides `event_id`, `collector_tstamp` and the name of the Normalized Event Model that the event was processed into, it does not include records for events that were not of an event type in your configuration. The model file itself is a series of `UNION` statements.
+ A single model is built that provides `event_id`, `snowplow__partition_tstamp` (defaults to `collector_tstamp`), and the name of the Normalized Event Model that the event was processed into, it does not include records for events that were not of an event type in your configuration. The model file itself is a series of `UNION` statements.
Review comment (Contributor):

If they change it, it will still provide `collector_tstamp`, right?
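
To make the "series of `UNION` statements" concrete, a hedged sketch of what the filtered events model might look like; the model names are illustrative, not the package's actual generated output.

```sql
-- Hypothetical sketch of the filtered events model: one select per
-- generated Normalized Event Model, unioned together, so each row
-- records which table an event was processed into.
select event_id, collector_tstamp, 'page_view_events' as event_table_name
from {{ ref('page_view_events') }}

union all

select event_id, collector_tstamp, 'link_click_events' as event_table_name
from {{ ref('link_click_events') }}
```

Either way, consumers of this model get one row per normalized event, pointing at the model it landed in.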
