Support safe concurrent dbt runs #420

Open · BentsiLeviav opened this issue Feb 13, 2025 · 2 comments
Labels
bug Something isn't working

Comments

@BentsiLeviav (Contributor)

Description

There have been multiple issues related to concurrent dbt runs in ClickHouse, where one run attempts to query an intermediate/temp table that has already been dropped by a previous run.

Several PRs have attempted to fix this by appending invocation_id or UUIDs to table names or by implementing conditional drops based on specific criteria. However, each change was applied to different parts of the codebase based on individual use cases, resulting in an inconsistent and partial solution.

Proposed Solution

Introduce dedicated macros, clickhouse__make_intermediate_relation and clickhouse__make_temp_relation, that would:

  1. Include a prefix parameter with a default value using invocation_id, ensuring unique table names for each dbt run.
  2. Introduce a feature flag (disabled by default) that allows users to opt into this behavior. When enabled, invocation_id will be automatically appended to intermediate and temp table names.
  3. Standardize the handling of temp and intermediate relations across the dbt-clickhouse adapter, avoiding ad-hoc solutions in different parts of the code.
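
For concreteness, a minimal sketch of what the first macro might look like (the flag name `enable_unique_tmp_names` and the exact relation construction are assumptions for illustration, not the adapter's current API):

```jinja
{% macro clickhouse__make_intermediate_relation(base_relation, suffix='__dbt_tmp') %}
  {# Hypothetical sketch: the feature flag below is an assumed name, disabled by default #}
  {% if var('enable_unique_tmp_names', false) %}
    {# invocation_id is dbt's per-run UUID; replace dashes so the identifier stays valid #}
    {% set suffix = suffix ~ '_' ~ (invocation_id | replace('-', '_')) %}
  {% endif %}
  {{ return(base_relation.incorporate(path={'identifier': base_relation.identifier ~ suffix})) }}
{% endmacro %}
```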

✅ Benefits

  • Prevents table name collisions in parallel dbt runs.
  • Ensures a consistent approach to naming intermediate and temp tables.
  • Provides an opt-in feature flag for backward compatibility.
  • Reduces the need for future patches addressing the same issue in different ways.

❌ Potential Downsides

  • Increased risk of hanging tables:
    • Currently, if a temp/intermediate table is not dropped, it is often cleaned up in the next run. With this change, the risk of orphaned tables may increase.
    • Potential solution: Implement a cleanup mechanism that drops stale intermediate/temp tables based on their creation time or last usage.
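
A cleanup mechanism could be a run-operation macro along these lines (a sketch only; the macro name, LIKE pattern, and age threshold are all illustrative assumptions):

```jinja
{% macro drop_stale_dbt_tmp_tables(max_age_hours=24) %}
  {# Hypothetical cleanup: find temp tables whose metadata is older than the threshold #}
  {# Note: '_' is a LIKE wildcard; a real implementation should escape it #}
  {% set stale = run_query(
      "SELECT database, name FROM system.tables"
      ~ " WHERE name LIKE '%__dbt_tmp%'"
      ~ " AND metadata_modification_time < now() - INTERVAL " ~ max_age_hours ~ " HOUR"
  ) %}
  {% for row in stale.rows %}
    {% do run_query("DROP TABLE IF EXISTS " ~ row['database'] ~ "." ~ row['name']) %}
  {% endfor %}
{% endmacro %}
```

This could be invoked out of band, e.g. via `dbt run-operation drop_stale_dbt_tmp_tables`.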

This is not a heavy lift, but I would love to get the community's opinion on it.

@BentsiLeviav (Contributor, Author)

Related PRs: #373, #365, #353

@stephen-up commented Feb 13, 2025

Hi, I think this is a good idea and it would help my situation. Two things to think about, and one outstanding problem.

To aid cleanup of hanging tables, the temporary tables could use timestamps rather than invocation_id; it could be the dbt run's start time. That way a cleanup process can more easily scan for table names matching the pattern where the timestamp is older than one day, etc.
E.g. my_table_tmp_1739484798591, with an epoch on the end.
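
In dbt Jinja that suffix could be derived from run_started_at (a real dbt context variable); the identifier scheme itself is just an illustration:

```jinja
{# Hypothetical: build the temp name from the run's start time in epoch milliseconds #}
{% set epoch_ms = (run_started_at.timestamp() * 1000) | int %}
{% set tmp_identifier = base_relation.identifier ~ '_tmp_' ~ epoch_ms %}
{# => e.g. my_table_tmp_1739484798591 #}
```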

Re naming with prefixes: I'm not sure exactly what you mean, but suffixes might be better than prefixes.
Prefixes like the ones below don't sort as well as suffixes. Also, the adapter is already using suffixes for __dbt_tmp tables.

  • ab2344_my_first_table
  • cd12345_my_second_table
  • xy142_my_first_table

vs

  • my_first_table_ab2344
  • my_first_table_xy142
  • my_second_table_cd12345

Out of order runs
Out-of-order model runs and their completions could be a problem.
Consider a race condition where there are two attempts to refresh the same table with different versions of code. If the first run takes longer to complete than the second, then when the first run completes it will run EXCHANGE TABLES between its temporary table and the final table, rolling back the second run's change.
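
To make the interleaving concrete, here is a hypothetical timeline (EXCHANGE TABLES is ClickHouse's atomic table swap; the temp-table names are illustrative):

```sql
-- t0: run A builds my_table__dbt_tmp_A  (old code)
-- t1: run B builds my_table__dbt_tmp_B  (new code)
-- t2: run B: EXCHANGE TABLES my_table__dbt_tmp_B AND my_table;  -- new code goes live
-- t3: run A: EXCHANGE TABLES my_table__dbt_tmp_A AND my_table;  -- reverts to old code
```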

Currently, these concurrent refreshes would each try to delete the __dbt_tmp table. The second run would kill the first run's attempt, and the first run would get an error about the table it's writing to no longer existing. So by trying to support concurrent runs we might be making the impact of race conditions worse.

I don't have a great solution for this that fits well with existing components, and I don't think solving it needs to be a blocker. But it is worth considering as a downside.
