Versatile Data Kit 1.4
Major features include:
Control Service
Complete Data Job Configuration Persistence (Pre-alpha)
Data job deployment configurations are currently stored in two steps, in both Kubernetes and a database. This leads to performance degradation, potential data loss, and added complexity. Consistently keeping all essential properties in the database can improve efficiency, system reliability, and user experience.
Another important benefit is the ability to track deployment status through the API.
vdk-structlog log plugin
The plugin gives users control over how VDK logs are formatted and what metadata they include. It allows users to:
select the log output format
configure the logging metadata
display metadata added by bound loggers
See more in its documentation page
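
To illustrate what "metadata added by bound loggers" means, here is a minimal sketch that uses only the Python standard library. It does not use the vdk-structlog API; the names `JsonFormatter`, `make_bound_logger`, and the metadata keys `job_name` and `step` are assumptions made up for this example. The idea is that key/value pairs are bound to a logger once and then appear in every record it emits, in whichever output format is selected.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object with a chosen set of metadata keys."""

    def __init__(self, metadata_keys):
        super().__init__()
        self.metadata_keys = metadata_keys

    def format(self, record):
        payload = {"level": record.levelname, "message": record.getMessage()}
        for key in self.metadata_keys:
            # Keys bound through the adapter land on the record as attributes.
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        return json.dumps(payload)


def make_bound_logger(stream, metadata_keys, **bound):
    """Return a logger that attaches the `bound` key/value pairs to every record."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter(metadata_keys))
    base = logging.getLogger("structlog_sketch")  # hypothetical logger name
    base.handlers = [handler]
    base.setLevel(logging.INFO)
    base.propagate = False
    return logging.LoggerAdapter(base, bound)


stream = io.StringIO()
log = make_bound_logger(
    stream, ["job_name", "step"], job_name="example-job", step="10_load.py"
)
log.info("loaded %d rows", 3)
print(stream.getvalue().strip())
```

Swapping `JsonFormatter` for a different formatter changes the output format without touching the code that binds or logs the metadata, which is roughly the separation the plugin's options describe.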
vdk-core Error handling changes
Deprecated error reporting patterns
![](https://private-user-images.githubusercontent.com/2536458/278037168-caf0f22d-5a15-48a7-a38b-892864c4ee0d.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkzMTg2NDMsIm5iZiI6MTczOTMxODM0MywicGF0aCI6Ii8yNTM2NDU4LzI3ODAzNzE2OC1jYWYwZjIyZC01YTE1LTQ4YTctYTM4Yi04OTI4NjRjNGVlMGQucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI1MDIxMSUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNTAyMTFUMjM1OTAzWiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9MzA1YzQ1ZGUyMTNiODU2YWI1ZGJhODY2NTBlZTE1YzY5NjRjZDUxYjRlZDM2MmIxODNiZjc4YjdjZTc4OTliZCZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.jZo9lIbOfM-6psMFS51wOI2grdAupACMa0jM77AoLaE)
Most generic vdk-core exceptions replaced with domain-specific ones
![](https://private-user-images.githubusercontent.com/2536458/278037274-759bb748-cd3f-4ab7-be2a-a514c1cb3863.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkzMTg2NDMsIm5iZiI6MTczOTMxODM0MywicGF0aCI6Ii8yNTM2NDU4LzI3ODAzNzI3NC03NTliYjc0OC1jZDNmLTRhYjctYmUyYS1hNTE0YzFjYjM4NjMucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI1MDIxMSUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNTAyMTFUMjM1OTAzWiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9OWVhNWViZTA4NDEyMDRmMjJiNzdiMDI4OTUzOWEyYTMzNWVkY2RlYWRlOWQ1ZTM0NGZjNTcyYzA5Yzg0YWVlZSZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.mPr7nH_PmFBVZxgn6jc8J-2rkEHLZzLqny1Kp9DiA3M)
Test exception propagation to user code
VDK no longer wraps non-VDK errors in VDK errors. As a result, errors coming from libraries, templates, etc. should propagate to user code, where users can handle them. So now something like this should be easy:

    import logging
    import pandas as pd
    from vdk.api.job_input import IJobInput

    log = logging.getLogger(__name__)

    def run(job_input: IJobInput):
        args = dict()
        try:
            job_input.execute_template("csv-risky", args)
        except pd.errors.EmptyDataError as e:
            # The pandas error raised inside the template arrives here unwrapped.
            log.info("Handling empty data error")
            log.exception(e)
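
The difference between wrapping and propagating can be sketched without VDK installed. Everything in the block below is hypothetical and for illustration only: `EmptyDataError` stands in for a library error, `WrappingError` for a framework-level wrapper, and the two `execute_*` helpers mimic the old and new behaviour in a simplified way. They are not VDK functions.

```python
class EmptyDataError(Exception):
    """Stand-in for a library error such as pandas.errors.EmptyDataError."""


class WrappingError(Exception):
    """Hypothetical framework error that hides the original exception type."""


def risky_template():
    # Simulates a template or library call that fails.
    raise EmptyDataError("no columns to parse from file")


def execute_wrapping(func):
    """Illustrative old behaviour: re-raise everything as a framework error."""
    try:
        return func()
    except Exception as e:
        raise WrappingError("template failed") from e


def execute_propagating(func):
    """Illustrative new behaviour: let the original error reach the caller."""
    return func()


def run(execute):
    try:
        execute(risky_template)
    except EmptyDataError:
        return "handled empty data"
    except WrappingError:
        return "could not match the library error"


print(run(execute_wrapping))
print(run(execute_propagating))
```

With wrapping, the `except EmptyDataError` clause never matches, so user code has to inspect the wrapper's cause. With propagation, the same clause catches the library error directly.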
What's Changed
- control-service: Enhance Exception Handling for DataJobsSynchronizer by @mivanov1988 in #2758
- control-service: [bug fix] Add freetype2 and libpng to secure builder by @doks5 in #2744
- control-service: add IT tests for async job deployment by @mivanov1988 in #2794
- control-service: add new deployment tables by @mivanov1988 in #2719
- control-service: asynchronous deployment deletion by @mivanov1988 in #2781
- control-service: data job synchronizer error handling by @mivanov1988 in #2742
- control-service: deployment controller reads from db by @mrMoZ1 in #2800
- control-service: deployment controller writes deployment entity by @mrMoZ1 in #2731
- control-service: enable scheduled execution for data jobs' synchronizer by @mivanov1988 in #2771
- control-service: fix control service post deployment test by @mivanov1988 in #2790
- control-service: fix data job image building by @mivanov1988 in #2832
- control-service: fix infinite redeployment by @mivanov1988 in #2822
- control-service: fix post deployment test by @mrMoZ1 in #2815
- control-service: fix read deployment job version by @mivanov1988 in #2819
- control-service: handle deployment deletion in case of a job being deleted by @mivanov1988 in #2816
- control-service: implement multi-threading for synchronization process by @mivanov1988 in #2775
- control-service: improve async deployment logging by @mivanov1988 in #2826
- control-service: job resources validation on job deployment by @mivanov1988 in #2793
- control-service: reduce logging by @mivanov1988 in #2834
- control-service: resolve dependabot alert by @antoniivanov in #2751
- control-service: user-initiated deployment notifications by @mivanov1988 in #2757
- control-service: utilize new deployment tables by @mivanov1988 in #2714
- vdk-audit: Clean up some audit events by @doks5 in #2792
- vdk-control-cli: fix CI/CD tests by @yonitoo in #2782
- vdk-control-cli: pin werkzeug to version 2.3.8 or less by @DeltaMichael in #2743
- vdk-core: add error formatter configuration by @DeltaMichael in #2754
- vdk-core: create ingestion exceptions by @antoniivanov in #2752
- vdk-core: domain specific properties/secrets exceptions by @antoniivanov in #2770
- vdk-core: fix postgres and greenplum tests by @yonitoo in #2825
- vdk-core: move error classifying logic by @duyguHsnHsn in #2769
- vdk-core: pass exceptions from data job steps in results by @DeltaMichael in #2774
- vdk-core: remove code duplication in ingestion router by @antoniivanov in #2760
- vdk-core: simplify error message for send_**_for_ingestion by @antoniivanov in #2787
- vdk-core: test exception propagation to user code by @DeltaMichael in #2820
- vdk-core: test ingestion with multiple threads by @antoniivanov in #2796
- vdk-core: tests passing custom iterator to ingestion methods by @antoniivanov in #2761
- vdk-coverity: Adding Coverity Scan by @shanmathik in #2753
- vdk-dag, vdk-control-cli, airflow-provider-vdk: step using deprecated field by @antoniivanov in #2706
- vdk-dag: fix failing validation tests by @DeltaMichael in #2712
- vdk-heartbeat: Introduce additional sleep when checking deployments by @doks5 in #2824
- vdk-impala: add Out Of Memory error handling by @dakodakov in #2747
- vdk-impala: introduce new error handling by @duyguHsnHsn in #2759
- vdk-jupyter: Enable PYTHONUNBUFFERED to ensure correct log ordering by @gageorgiev in #2711
- vdk-jupyter: add Tutorial link in getting-started.ipynb by @antoniivanov in #2707
- vdk-jupyter: cicd fix by @duyguHsnHsn in #2780
- vdk-jupyter: fix bug in detecting run functions by @antoniivanov in #2721
- vdk-jupyter: fix ci/cd by @duyguHsnHsn in #2773
- vdk-jupyter: print summary output to temp dir by @antoniivanov in #2715
- vdk-jupyter: update getting started to include vdksql by @antoniivanov in #2713
- vdk-jupyter: use `vdksql` for SQL cells and steps by @antoniivanov in #2729
- vdk-notebook: [bug fix] ignore missing id field in cell by @doks5 in #2717
- vdk-notebook: remove obsolete code by @antoniivanov in #2716
- vdk-notebook: set summary file path as configuration by @antoniivanov in #2709
- vdk-plugins: add new error handling methods by @duyguHsnHsn in #2750
- vdk-structlog: create structured logging plugin by @DeltaMichael in #2801
- vdk-test-utils: make IngestIntoMemoryPlugin method configurable by @antoniivanov in #2783
New Contributors
- @shanmathik made their first contribution in #2753
Full Changelog: v1.3...v1.4