FEAT: Decrease run-time duration of ODS-QLIK Process #28
Conversation
src/cubic_loader/qlik/ods_qlik.py (Outdated)
"""
- find all available CDC dfm files for a Snapshot from Archive and Error buckets
+ find all available CDC csv.gz files for a Snapshot from Archive bucket
Curious why you're dropping Error bucket.
I dropped it when I was trying to simplify things; I can add it back.
Up to you. But I think we should have it in there. The glue pipeline is unnecessarily strict so we shouldn't constrain this pipeline because of it.
Incorporated
Please make sure to test the error bucket addition.
I checked it out locally; the only tricky thing is returning a list with everything sorted by the last timestamp of the filename.
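The merge described above could be sketched roughly as follows. This is only an illustration, not the code in this PR: the filename pattern, helper names, and sample keys are all assumptions.

```python
import re
from typing import List

# Assumed pattern: each CDC filename ends with a sortable timestamp
# token like "20240101T000000" (hypothetical format for illustration).
TS_RE = re.compile(r"(\d{8}T?\d{6})")


def last_timestamp(key: str) -> str:
    """Return the last timestamp-like token in a file key, '' if none found."""
    matches = TS_RE.findall(key)
    return matches[-1] if matches else ""


def merged_cdc_keys(archive_keys: List[str], error_keys: List[str]) -> List[str]:
    """Combine Archive and Error bucket listings into one list,
    ordered by the last timestamp embedded in each filename."""
    return sorted(archive_keys + error_keys, key=last_timestamp)


# Example (hypothetical keys): Error-bucket files interleave with
# Archive-bucket files based purely on their timestamp tokens.
archive = ["t/20240102T000000.csv.gz", "t/20240101T000000.csv.gz"]
error = ["t/20240101T120000.csv.gz"]
print(merged_cdc_keys(archive, error))
```

Because the assumed timestamp token sorts lexicographically in chronological order, a plain string sort on the extracted token is enough; no datetime parsing is needed.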
This change alters the way ods-qlik data is loaded into the dmap-import database.

Previously, the ods-qlik data loading steps were:
1. load CDC records into the `_history` table
2. run a query against the `_history` table to apply updates to the `_fact` table

This approach has proven not to be scalable: the size of the `_history` tables and the complicated nature of the `_fact` table query have resulted in unacceptably long load times.

The new process does the following:
1. load CDC records into the `_history` table
2. apply each update to the `_fact` table individually with a dataframe object

The new process completely removes the need for a complicated query to load data from the `_history` table to the `_fact` table, and thus dramatically reduces loading time when running the ods-qlik process. The new process is also completely compatible with existing ETL status files.