Release Unified version 0.4.4 #945

Merged · 3 commits · Jun 26, 2024
@@ -0,0 +1,36 @@
---
title: "Running the models on data lakehouses"
sidebar_position: 50
description: "How to run our models on lakehouses"
---

```mdx-code-block
import Badges from '@site/src/components/Badges';
```

<Badges badgeType="Early Release"></Badges>&nbsp;

:::danger

Running the models on data lakes or lakehouses (using external tables in a warehouse to read directly from a lake) is currently in an Early Release state and is not fully supported. Certain features may not work as expected and errors are more likely to occur. Please use this approach at your own risk and raise any issues you find with us.

:::

If you are using the [lake loaders](/docs/storing-querying/storage-options/index.md#data-lake-loaders) to load your data into lake storage, it may be possible to run our data models on top of it. This section does not detail which warehouses support which file formats, or how to set up the corresponding external tables in each warehouse; please refer to your warehouse's documentation for the file formats it supports and how to create those tables.

# Databricks
At the time of writing, `delta` is the preferred file format for Databricks [external tables](https://docs.databricks.com/en/sql/language-manual/sql-ref-external-tables.html). If you create an external table over this lake format in Databricks, you should be able to run the models without any further changes, simply by pointing the package at this table.
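
For illustration, a minimal sketch of what this might look like in `dbt_project.yml`, assuming the external table has been registered as `atomic.events` (the schema and table names here are illustrative, and the exact source variables may differ between package versions — check the Unified package's configuration docs):

```yml
# dbt_project.yml — illustrative values only; adjust to match your environment
vars:
  snowplow_unified:
    snowplow__atomic_schema: atomic  # schema containing the external delta table
    snowplow__events_table: events   # name of the external table created over the delta files
```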

# Snowflake
At the time of writing, `Iceberg` is the preferred file format for Snowflake [Iceberg tables](https://docs.snowflake.com/en/user-guide/tables-iceberg). Currently only the [Unified Digital](/docs/modeling-your-data/modeling-your-data-with-dbt/dbt-models/dbt-unified-data-model/index.md) package supports this setup; to enable it, set the `snowplow__snowflake_lakeloader` variable to `true`.
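
For example, in `dbt_project.yml` (a minimal sketch; scoping the variable under the package name is the usual dbt pattern):

```yml
# dbt_project.yml
vars:
  snowplow_unified:
    snowplow__snowflake_lakeloader: true  # read events from the Snowflake Iceberg table
```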

Note that, unlike the other Snowflake loaders, field names in Self-describing events and Entities are converted to `snake_case` (the other loaders retain the format used in the schema, which is often `camelCase`). You will need to adjust other variables and inputs accordingly, compared to what you may see elsewhere in the docs.
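
As a purely hypothetical illustration, if a custom entity schema defines a field called `sessionIdentifier`, any variable that references that field would need the snake_case form with the lake loader (the entity name below is invented, and `snowplow__session_identifiers` is just one example of such an input):

```yml
vars:
  snowplow_unified:
    # With other Snowflake loaders you would keep the schema's own casing: sessionIdentifier
    snowplow__session_identifiers:
      - schema: com_example_session_1  # illustrative entity name
        field: session_identifier      # snake_case when using the lake loader
```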

# Spark
Using Spark directly as a compute engine is currently not supported by our packages.

# Redshift (spectrum)
Using Redshift Spectrum tables is currently not supported by our packages, due to [limitations](https://docs.aws.amazon.com/redshift/latest/dg/nested-data-restrictions.html) of the platform.

# BigQuery on GCS
Using BigQuery external tables over GCS is currently untested but may work; please let us know your experience if you try this.
`src/componentVersions.js` — 2 changes: 1 addition & 1 deletion
@@ -42,7 +42,7 @@ export const versions = {
// Data Modelling
// dbt
dbtSnowplowAttribution: '0.2.2',
- dbtSnowplowUnified: '0.4.3',
+ dbtSnowplowUnified: '0.4.4',
dbtSnowplowWeb: '1.0.1',
dbtSnowplowMobile: '1.0.0',
dbtSnowplowUtils: '0.16.7',