[Feature] Create a flexible delete+insert incremental strategy without relying on primary/unique keys #10655

dlord · 2024-09-03T10:01:19Z

Is this your first time submitting a feature request?

I have read the expectations for open source contributors
I have searched the existing issues, and I could not find an existing issue for this feature
I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

The delete+insert incremental strategy creates a temporary table with the new dataset, then it executes a DELETE statement using the temporary table as the filter condition. Below is an example of the generated statement:

delete
from
  "source_db"."dbt"."source_table" using "source_table__dbt_tmp173111887980" 
where
  (
    "source_table__dbt_tmp173111887980".event_time = "source_db"."dbt"."source_table".event_time 
  );

The current approach uses the unique_key specified in the model, and it probably works fine for most relational databases. For data warehouses like Redshift, this approach is very slow due to lack of primary and unique keys. As a result, the delete operation becomes very slow due to sequential scanning.

In order to work around this, I created a custom incremental strategy for a customer using the approach described here. The goal was to be able to provide a flexible way to specify custom predicates that works well with the underlying data and storage. The custom incremental strategy I wrote uses the existing incremental_predicates to build the delete statement. The custom incremental strategy ignores the unique_key to keep the implementation simple, and easy to switch to other strategies.

CC: @goodjira and @schmiman

Describe alternatives you've considered

Currently, if the merge strategy is set to delete+insert but no unique_key was specified, it is as if you used an append strategy. This behavior can be seen in the macro default__get_delete_insert_merge_sql. I am not sure if this was a deliberate decision.

As such, an alternative approach to creating a separate incremental strategy is to modify the existing default__get_delete_insert_merge_sql to allow users to not specify a unique_key, and only rely on the incremental_predicates.

Who will this benefit?

Users who are working with data warehouses like Redshift. Date columns that are used as sort keys can significantly benefit from this.

Here is an example: consider a model in Redshift that was created with the following config:

{{ config(materialized="incremental",
            incremental_strategy='delete_insert_no_unique_key',
            incremental_predicates = [
                "DBT_INTERNAL_DEST.event_time >= (select min(event_time) from DBT_INTERNAL_SOURCE)"
            ],
) }}

The resulting delete SQL will be this:

delete
from
  "source_db"."dbt"."source_table"
where
  (
    "source_db"."dbt"."source_table".event_time >= (select min(event_time) from "source_table__dbt_tmp173111887980")
  );

The performance difference in Redshift compared to a delete join is huge. I had one model where deleting 10M records took 30 minutes using the join approach. With the custom delete+insert strategy, this was reduced to 10 seconds due to the optimizations we can do in incremental_predicates

Are you interested in contributing this feature?

Yes

Anything else?

No response

The text was updated successfully, but these errors were encountered:

dlord added enhancement New feature or request triage labels Sep 3, 2024

graciegoheen added the performance label Sep 4, 2024

dbeatty10 added the incremental Incremental modeling with dbt label Sep 4, 2024

dlord linked a pull request Sep 12, 2024 that will close this issue

Flexible delete+insert incremental strategy that purely relies on incremental_predicates dbt-labs/dbt-adapters#303

Open

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Feature] Create a flexible delete+insert incremental strategy without relying on primary/unique keys #10655

[Feature] Create a flexible delete+insert incremental strategy without relying on primary/unique keys #10655

dlord commented Sep 3, 2024 •

edited

Loading

[Feature] Create a flexible delete+insert incremental strategy without relying on primary/unique keys #10655

[Feature] Create a flexible delete+insert incremental strategy without relying on primary/unique keys #10655

Comments

dlord commented Sep 3, 2024 • edited Loading

Is this your first time submitting a feature request?

Describe the feature

Describe alternatives you've considered

Who will this benefit?

Are you interested in contributing this feature?

Anything else?

dlord commented Sep 3, 2024 •

edited

Loading