Release Unified version 0.4.4 (#945)
* [create-pull-request] automated change

* Add new var and page on lake loaders

* Add early release badge

---------

Co-authored-by: rlh1994 <[email protected]>
Co-authored-by: Ryan Hill <[email protected]>
3 people authored Jun 26, 2024
1 parent f0b0de9 commit 4a64850
Showing 3 changed files with 989 additions and 1 deletion.
@@ -0,0 +1,36 @@
---
title: "Running the models on data lakehouses"
sidebar_position: 50
description: "How to run our models on lakehouses"
---

```mdx-code-block
import Badges from '@site/src/components/Badges';
```

<Badges badgeType="Early Release"></Badges>&nbsp;

:::danger

Running the models on data lakes or lakehouses (using external tables in a warehouse to read directly from a lake) is currently in Early Release state and is not fully supported. Certain features may not work as expected and errors are more likely to occur. Please use this approach at your own risk and raise any issues you find with us.

:::

If you are using the [lake loaders](/docs/storing-querying/storage-options/index.md#data-lake-loaders) to load your data into a lake storage option, it may be possible to use our data models. This section does not detail which warehouses support which file formats, or how to set up the respective external tables in each warehouse; please refer to the documentation for your warehouse to see which file formats it supports.

# Databricks
At the time of writing, `delta` is the preferred file format for Databricks [external tables](https://docs.databricks.com/en/sql/language-manual/sql-ref-external-tables.html). If you create an external table over this lake format in Databricks, you should be able to run the models without any further changes by simply pointing the package at this table.
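
For example, once the external table exists you can point the package at it in your `dbt_project.yml`. This is a minimal sketch assuming the Unified package and illustrative catalog, schema, and table names; check the package configuration docs for the exact variables in your version:

```yml
vars:
  snowplow_unified:
    # Assumption: an external delta table created as my_catalog.atomic.events
    snowplow__databricks_catalog: "my_catalog"
    snowplow__atomic_schema: "atomic"
    snowplow__events_table: "events"
```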

# Snowflake
At the time of writing, `Iceberg` is the preferred file format for Snowflake [Iceberg tables](https://docs.snowflake.com/en/user-guide/tables-iceberg). Currently only the [Unified Digital](/docs/modeling-your-data/modeling-your-data-with-dbt/dbt-models/dbt-unified-data-model/index.md) package supports this format; enable it by setting the `snowplow__snowflake_lakeloader` variable to `true`.
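
For example, in your `dbt_project.yml` (a minimal sketch; the rest of your configuration remains as usual):

```yml
vars:
  snowplow_unified:
    # Read from a lake-loaded Iceberg table instead of a standard events table
    snowplow__snowflake_lakeloader: true
```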

Note that, unlike the other loaders for Snowflake, field names in self-describing events and entities are converted to `snake_case` (the other loaders retain the format used in the schema, often `camelCase`). You will need to adjust other variables and inputs accordingly compared to what you may find elsewhere in the docs.
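
For instance, a field defined as `sessionId` in a custom entity's schema will appear as `session_id` in the lake-loaded table, so any variable that references it must use the snake_case name. A hypothetical sketch (the entity name, field name, and the `snowplow__session_identifiers` structure are illustrative; check the package configuration docs for your version):

```yml
vars:
  snowplow_unified:
    snowplow__session_identifiers:
      # Hypothetical custom entity: the schema defines `sessionId`,
      # but the lake loader writes the column as `session_id`
      - schema: "com_mycompany_session_1"
        field: "session_id"
```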

# Spark
Using Spark directly as a compute engine is not currently supported by our packages.

# Redshift (spectrum)
Using Redshift Spectrum tables is not currently supported by our packages due to [limitations](https://docs.aws.amazon.com/redshift/latest/dg/nested-data-restrictions.html) of the platform.

# BigQuery on GCS
Using BigQuery external tables over GCS is untested but may work; please let us know your experience if you try this.
2 changes: 1 addition & 1 deletion src/componentVersions.js
@@ -42,7 +42,7 @@ export const versions = {
// Data Modelling
// dbt
  dbtSnowplowAttribution: '0.2.2',
- dbtSnowplowUnified: '0.4.3',
+ dbtSnowplowUnified: '0.4.4',
  dbtSnowplowWeb: '1.0.1',
  dbtSnowplowMobile: '1.0.0',
  dbtSnowplowUtils: '0.16.7',
