Create a new materialization insert_by_period #190
Comments
@svdimchenko yes, I believe it could still be needed.
@svdimchenko for more context on my last message, see #471.
You should support also I am thinking of cases where you need to deal with rows that get updated and the developer wants to use this config
@mycaule why should unique_key be included in insert_by_period? Insert by period by definition acts on time: it deletes and inserts batches of periods. If you need to upsert by unique_key, consider using an incremental materialization with the merge strategy and, of course, unique_key.
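To illustrate what "batches of periods" means here, a minimal Python sketch of splitting a time range into period windows. The function name `period_windows` and the day-based step are assumptions made for illustration; this is not code from dbt-athena or from the gist.

```python
from datetime import date, timedelta

def period_windows(start: date, stop: date, days: int = 1):
    """Yield half-open [lo, hi) windows that cover [start, stop)."""
    step = timedelta(days=days)
    lo = start
    while lo < stop:
        # Clamp the last window so it never runs past `stop`.
        hi = min(lo + step, stop)
        yield lo, hi
        lo = hi

# Three daily batches; each one would be deleted and re-inserted on its own.
windows = list(period_windows(date(2024, 1, 1), date(2024, 1, 4)))
```

Each window only needs a time column to filter on, which is why this approach does not rely on unique_key.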
Using incremental fails for large tables as you discussed in #471 And
Which snippet for Redshift? Do you mean this one, #471 (comment)? If so, I don't see any reference to unique_key. Could you please add a reference to the snippet you are talking about? Also:
that is incomplete, that is your default Also few challenges:
To understand what you want to achieve, I need more context about what you plan to build. Also, force_batch doesn't help if you are not using partitioned datasets, and if your initial query is big it will still fail, because it will try to process the whole dataset. That's where insert_by_period comes in handy.
Thank you for the details, I hadn't had this in mind. This snippet talks about Redshift and adapting it to Athena: https://gist.github.com/jessedobbelaere/6fdb593f9e2cc732e9f142c56c9bac87 For the moment I use the default I am trying to materialize a very large table through a federated query to RDS, and I want the Athena table to be refreshed daily in my data lake. Existing rows in RDS can change over time, I have
Yes, insert_by_period is adapted from another adapter (Redshift in this case), so it seems you were referring to the gist, not directly to #471.
Do you expect updated_at to change? What you generally do with an incremental strategy is pick only the records that have changed (which means updated_at MUST change for this to work), using the is_incremental macro. Now the challenge is that:
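As a rough illustration of the "pick only records that changed" filter, here is a sketch that uses an in-memory SQLite database as a stand-in for the source and the target table. The table names, the columns, and the use of SQLite are assumptions; in a dbt model the same kind of predicate would typically sit inside an is_incremental block.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source (id INTEGER, status TEXT, updated_at TEXT);
CREATE TABLE target (id INTEGER, status TEXT, updated_at TEXT);
INSERT INTO source VALUES (1, 'new', '2024-01-01'), (2, 'new', '2024-01-02');
INSERT INTO target SELECT * FROM source;
-- Row 1 changes upstream and its updated_at moves forward; row 3 is new.
UPDATE source SET status = 'done', updated_at = '2024-01-03' WHERE id = 1;
INSERT INTO source VALUES (3, 'new', '2024-01-03');
""")

# Select only rows newer than the latest updated_at already in the target.
changed = conn.execute("""
SELECT id, status FROM source
WHERE updated_at > (SELECT COALESCE(MAX(updated_at), '') FROM target)
ORDER BY id
""").fetchall()
```

A row that is modified upstream without its updated_at moving forward would not match this filter, which is why updated_at must change for the approach to work.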
I also had different finite status columns describing the lifecycle of the rows: To deal with tables without lifecycle columns, I also tried to create artificial partitions using Do you think I can do
💡 Last night, dbt made this announcement about a new incremental-strategy for time-series event data. It introduces
@jessedobbelaere we should indeed consider supporting microbatch in dbt-athena.
Exactly 👍 I've created #715 as a placeholder.
Hi folks - our team is already looking at supporting our new Microbatch strategy on this adapter, so for now we are going to go ahead and close out this issue, since that work will address it :)
Inspired by https://gist.github.com/jessedobbelaere/6fdb593f9e2cc732e9f142c56c9bac87
create a new materialisation: insert_by_period.
Requirements
delete from table period={period_to_process}
to avoid duplicates in case some error occurs
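The delete-before-insert requirement could be sketched as follows. This is a minimal illustration under stated assumptions, not the proposed materialization: SQLite stands in for Athena, and the `target` table, its `period` column, and the `load_period` helper are hypothetical names.

```python
import sqlite3

def load_period(conn, period, rows):
    """Replace one period's rows: delete them first, then insert."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("DELETE FROM target WHERE period = ?", (period,))
        conn.executemany(
            "INSERT INTO target (period, value) VALUES (?, ?)",
            [(period, v) for v in rows],
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (period TEXT, value INTEGER)")
load_period(conn, "2024-01-01", [1, 2])
load_period(conn, "2024-01-01", [1, 2])  # re-run, e.g. after a failed attempt
count = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
```

Because each run first clears the period it is about to write, re-running the same period after an error leaves a single copy of its rows rather than duplicates.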