Release Unified version 0.4.4 (#945)
* [create-pull-request] automated change

* Add new var and page on lake loaders

* Add early release badge

---------

Co-authored-by: rlh1994 <[email protected]>
Co-authored-by: Ryan Hill <[email protected]>
3 people authored Jun 26, 2024
1 parent f0b0de9 commit 4a64850
Showing 3 changed files with 989 additions and 1 deletion.
@@ -0,0 +1,36 @@
---
title: "Running the models on data lakehouses"
sidebar_position: 50
description: "How to run our models on lakehouses"
---

```mdx-code-block
import Badges from '@site/src/components/Badges';
```

<Badges badgeType="Early Release"></Badges>&nbsp;

:::danger

Running the models on data lakes or lakehouses (using external tables in a warehouse to read directly from a lake) is currently in Early Release state and is not fully supported. Certain features may not work as expected and errors are more likely to occur. Please use this approach at your own risk and raise any issues you find with us.

:::

If you are using the [lake loaders](/docs/storing-querying/storage-options/index.md#data-lake-loaders) to load your data into a lake storage option, it may be possible to use our data models. This section does not detail which warehouses support which file formats, or how to set up the respective external tables in each warehouse; please refer to the documentation for your warehouse to see which file formats it supports.

# Databricks
At the time of writing, `delta` is the preferred file format for Databricks [external tables](https://docs.databricks.com/en/sql/language-manual/sql-ref-external-tables.html). If you create an external table over this lake format in Databricks, you should be able to run the models without any further changes by simply pointing the package at this table.
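
For example, once the external table exists you can point the package at it in your `dbt_project.yml`. This is a minimal sketch assuming the Unified package and illustrative catalog, schema, and table names; check the package configuration docs for the exact variables in your version:

```yml
vars:
  snowplow_unified:
    # Assumption: an external delta table created as my_catalog.atomic.events
    snowplow__databricks_catalog: "my_catalog"
    snowplow__atomic_schema: "atomic"
    snowplow__events_table: "events"
```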

# Snowflake
At the time of writing, `Iceberg` is the preferred file format for Snowflake [Iceberg tables](https://docs.snowflake.com/en/user-guide/tables-iceberg). Currently only the [Unified Digital](/docs/modeling-your-data/modeling-your-data-with-dbt/dbt-models/dbt-unified-data-model/index.md) package supports this format; enable it by setting the `snowplow__snowflake_lakeloader` variable to `true`.
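
For example, in your `dbt_project.yml` (a minimal sketch; the rest of your configuration remains as usual):

```yml
vars:
  snowplow_unified:
    # Read from a lake-loaded Iceberg table instead of a standard events table
    snowplow__snowflake_lakeloader: true
```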

Note that, unlike the other loaders for Snowflake, field names in self-describing events and entities are converted to `snake_case` (the other loaders retain the format used in the schema, often `camelCase`). You will need to adjust other variables and inputs accordingly compared to what you may find elsewhere in the docs.
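
For instance, a field defined as `sessionId` in a custom entity's schema will appear as `session_id` in the lake-loaded table, so any variable that references it must use the snake_case name. A hypothetical sketch (the entity name, field name, and the `snowplow__session_identifiers` structure are illustrative; check the package configuration docs for your version):

```yml
vars:
  snowplow_unified:
    snowplow__session_identifiers:
      # Hypothetical custom entity: the schema defines `sessionId`,
      # but the lake loader writes the column as `session_id`
      - schema: "com_mycompany_session_1"
        field: "session_id"
```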

# Spark
Using Spark directly as a compute engine is not currently supported by our packages.

# Redshift (spectrum)
Using Redshift Spectrum tables is not currently supported by our packages due to [limitations](https://docs.aws.amazon.com/redshift/latest/dg/nested-data-restrictions.html) of the platform.

# BigQuery on GCS
Using BigQuery external tables over GCS is untested but may work; please let us know your experience if you try this.
2 changes: 1 addition & 1 deletion src/componentVersions.js
@@ -42,7 +42,7 @@ export const versions = {
// Data Modelling
// dbt
  dbtSnowplowAttribution: '0.2.2',
- dbtSnowplowUnified: '0.4.3',
+ dbtSnowplowUnified: '0.4.4',
  dbtSnowplowWeb: '1.0.1',
  dbtSnowplowMobile: '1.0.0',
  dbtSnowplowUtils: '0.16.7',
