Versatile Data Kit 1.5
Major features include:
Control Service
Data Job Configuration Persistence feature improvements
Adding the next round of improvements over the pre-alpha version of the feature, including GraphQL reads from the DB, documentation improvements and better test coverage.
vdk-structlog: Log Plugin
Adding improvements to the vdk-structlog plugin in preparation for its final release.
vdk-datasources: Data sources POC
Adding an initial PoC version of Data Sources, which includes:
- Data Source APIs handling sources, streams and state
- A new Data Source is added by implementing IDataSource, IDataSourceConfiguration and IDataSourceStream
- Partial Data Source connection management
- Data Source Ingester that reads from data sources and writes to the existing IIngester
- An example data source AutoGeneratedDataSource
- An example job in the functional test suite
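To illustrate how sources, streams and state relate to each other, here is a minimal, self-contained sketch. It does not use the vdk-data-sources package: `Source`, `Stream`, `Payload`, `CounterSource`, `CounterStream` and `ingest` are hypothetical stand-ins defined only for this example, and the real IDataSource, IDataSourceStream and Data Source Ingester may differ in method names and signatures. Consult the plugin documentation for the actual API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterator, List, Optional


@dataclass
class Payload:
    """One record plus the state needed to resume right after it (hypothetical)."""
    data: Dict[str, Any]
    state: Dict[str, Any]


class Stream(ABC):
    """Hypothetical stand-in for a data source stream."""

    @abstractmethod
    def name(self) -> str: ...

    @abstractmethod
    def read(self, last_state: Optional[Dict[str, Any]]) -> Iterator[Payload]: ...


class Source(ABC):
    """Hypothetical stand-in for a data source that exposes streams."""

    @abstractmethod
    def connect(self) -> None: ...

    @abstractmethod
    def disconnect(self) -> None: ...

    @abstractmethod
    def streams(self) -> List[Stream]: ...


class CounterStream(Stream):
    """Generates records with increasing ids, resuming from the saved state."""

    def __init__(self, stream_name: str, num_records: int):
        self._name = stream_name
        self._num_records = num_records

    def name(self) -> str:
        return self._name

    def read(self, last_state: Optional[Dict[str, Any]]) -> Iterator[Payload]:
        # Start after the last id that was already delivered, if any.
        start = (last_state or {}).get("last_id", -1) + 1
        for i in range(start, self._num_records):
            yield Payload(data={"id": i, "stream": self._name}, state={"last_id": i})


class CounterSource(Source):
    """Auto-generated example source with a fixed number of streams."""

    def __init__(self, num_streams: int, num_records: int):
        self._streams: List[Stream] = [
            CounterStream(f"stream_{n}", num_records) for n in range(num_streams)
        ]
        self.connected = False

    def connect(self) -> None:
        self.connected = True

    def disconnect(self) -> None:
        self.connected = False

    def streams(self) -> List[Stream]:
        return self._streams


def ingest(
    source: Source,
    send: Callable[[str, Dict[str, Any]], None],
    state_store: Dict[str, Dict[str, Any]],
) -> None:
    """Read every stream, hand each record to send(), and record progress."""
    source.connect()
    try:
        for stream in source.streams():
            for payload in stream.read(state_store.get(stream.name())):
                send(stream.name(), payload.data)
                # Save state only after the record was handed over.
                state_store[stream.name()] = payload.state
    finally:
        source.disconnect()
```

Because the state is stored per stream, calling `ingest` a second time with the same `state_store` sends nothing new in this sketch: each stream resumes after the last id it already delivered.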
vdk-oracle: Create oracle plugin
Adding pre-alpha VDK support for connecting to and ingesting into an Oracle DB. For further usage details, consult the VDK Oracle Plugin readme.
vdk-jupyter: Add alpha support for Jupyter Notebooks
Adding full alpha support for VDK Jupyter integration.
How to get started?
We have prepared a few guides to help with your Jupyter journey: How to Create a Data Job With VDK Notebook, How To Develop a Data Job With VDK Notebook,
How to Convert a Data Job with VDK Notebook, and How to Deploy a Data Job with VDK Notebook.
What's Changed
- control-plane: remove needless step in docker build. by @murphp15 in #2947
- control-service: Add GraphQL read from DB by @doks5 in #2837
- control-service: add MeterRegistry counters for DataJobsSynchronizer by @mrMoZ1 in #2844
- control-service: add pod disruption budget by @dakodakov in #2882
- control-service: add resource constraints by @dakodakov in #2915
- control-service: add support for pymssql by @mivanov1988 in #2908
- control-service: deployment cannot be suspended by @mivanov1988 in #2941
- control-service: fix deployment resources by @mivanov1988 in #2955
- control-service: fix pod disruption budget template by @dakodakov in #2885
- control-service: force aws cred provider refresh by @mrMoZ1 in #2879
- control-service: ingress allow for multiple hosts by @mivanov1988 in #2911
- control-service: integration test for async job deploy by @mrMoZ1 in #2829
- control-service: make new release of job builder images by @murphp15 in #2950
- control-service: make timeout configurable by @murphp15 in #2951
- control-service: reduce logging by @mivanov1988 in #2857
- control-service: refactor service user doc by @mrMoZ1 in #2436
- control-service: unit tests for data job persistence classes by @mrMoZ1 in #2935
- control-service: update ingress by @mivanov1988 in #2853
- support: update the ci notification by @DeltaMichael in #2877
- vdk-core: add datetime and bytes to decimal json encoder by @DeltaMichael in #2924
- vdk-core: add logging plugin warning and check if the vdk-structlog plugin is used by @yonitoo in #2944
- vdk-core: create config option for logging execution result by @DeltaMichael in #2850
- vdk-core: ensure early logs are available by @antoniivanov in #2846
- vdk-core: fix bug in error classification by @DeltaMichael in #2840
- vdk-core: fix exception cause swallowing by @DeltaMichael in #2949
- vdk-core: handle fetchall errors for oracledb by @DeltaMichael in #2917
- vdk-core: implement config option for logging execution result by @DeltaMichael in #2831
- vdk-core: ingest logging formatting bug by @antoniivanov in #2836
- vdk-core: remove redundant logs by @DeltaMichael in #2841
- vdk-core: VdkBoundLogger by @gageorgiev in #2823
- vdk-data-source-git: data source for git POC by @antoniivanov in #2859
- vdk-data-sources: add sources command by @antoniivanov in #2864
- vdk-data-sources: address review comments by @antoniivanov in #2865
- vdk-datasources: data sources POC by @antoniivanov in #2805
- vdk-duckdb: fix ingestion by @antoniivanov in #2843
- vdk-events: add explore23 to events by @duyguHsnHsn in #2873
- vdk-events: Add ingest and anonymize workshop by @antoniivanov in #2833
- vdk-events: improve Productionizing Jupyter Notebooks README by @duyguHsnHsn in #2896
- vdk-events: update Ingest and Anonymize workshop by @antoniivanov in #2891
- vdk-huggingface: add new ingest plugin by @antoniivanov in #2858
- vdk-impala: enhance memory error handling by @dakodakov in #2938
- vdk-impala: uncomment tests that were not passing due to core change by @DeltaMichael in #2845
- vdk-ipython: add support for %%vdkingest by @antoniivanov in #2866
- vdk-jupyter: add retries by @murphp15 in #2957
- vdk-jupyter: fix bug for failed requests and improve error handling by @yonitoo in #2916
- vdk-jupyter: fix formatting issues by @yonitoo in #2890
- vdk-jupyter: fix skipped tests by @murphp15 in #2871
- vdk-jupyter: include test report by @murphp15 in #2876
- vdk-jupyter: introduce Task Runner, a polling mechanism that runs tasks in the background and tracks their status by @yonitoo in #2869
- vdk-jupyter: run tests in CI by @murphp15 in #2868
- vdk-jupyterlab-extensions: update dependencies by @murphp15 in #2863
- vdk-kerberos-auth: fix unit test failing from bad logging config by @murphp15 in #2923
- vdk-logging-format: Deprecate plugin by @gageorgiev in #2888
- vdk-notebook: Support for "%%vdkingest" cell type in Notebook Steps by @antoniivanov in #2867
- vdk-oracle: create oracle plugin by @DeltaMichael in #2927
- vdk-oracle: support type inference by @DeltaMichael in #2948
- vdk-plugins: Audit log statements by @gageorgiev in #2878
- vdk-singer: Singer.io plugin for data sources by @antoniivanov in #2821
- vdk-smarter: pin openai version to 0.28 by @yonitoo in #2886
- vdk-structlog: default formatter by @duyguHsnHsn in #2936
- vdk-structlog: fix filtering of metadata fields for json by @DeltaMichael in #2874
- vdk-structlog: LTSV formatting by @gageorgiev in #2887
- vdk-structlog: Test for compatibility for log_level_module propagation by @gageorgiev in #2922
- vdk-structlog: Tests by @gageorgiev in #2838
Full Changelog: v1.4...v1.5