This repository has been archived by the owner on Sep 23, 2024. It is now read-only.

Merging in my enhancements to this TAP - Is this feasible - do you have time? #210

Open
s7clarke10 opened this issue Mar 20, 2023 · 0 comments
Labels
help wanted Extra attention is needed

Comments

@s7clarke10

Hi,

I maintain a fork of this tap and have continued to enhance it with additional features that we need. We added an option to switch off data type discovery and emit every extracted field as a string. We have also added other features, such as support for files with a BOM, bypassing proxy servers for private S3 bucket access, and specifying the encoding of the file.

Given your recent changes to the tap, I'm not sure how feasible it is to merge some of these changes, so I wanted your thoughts. I have limited time to push them through, and I had challenges with my last pull request because of the CI/CD setup and testing against the buckets.

The recent changes in this fork (https://github.com/s7clarke10/pipelinewise-tap-s3-csv) are listed in the changelog below.

Some of these enhancements had to be made in conjunction with enhancements to singer-encodings (https://github.com/s7clarke10/singer-encodings).

Would appreciate your thoughts on this.

2.0.8 (2022-12-22)

Changes

  • Adds an optional set_empty_values_null setting. When set to true, the tap emits null (the JSON equivalent of None) instead of an empty string.
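
To illustrate the behaviour described for set_empty_values_null, here is a minimal Python sketch. The helper name nullify_empty_values is hypothetical and is not taken from the fork; it only shows the idea of swapping empty strings for None before a record is serialised.

```python
import json


def nullify_empty_values(row, set_empty_values_null=False):
    """Return a copy of `row`; empty strings become None when the flag is on.

    Hypothetical helper for illustration only, not the fork's actual code.
    """
    if not set_empty_values_null:
        return dict(row)
    return {key: (None if value == "" else value) for key, value in row.items()}


record = nullify_empty_values({"id": "1", "name": ""}, set_empty_values_null=True)
print(json.dumps(record))  # prints {"id": "1", "name": null}
```

With the flag left at false, the empty string is passed through unchanged.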

2.0.7 (2022-11-01)

Changes

  • Adds an optional s3_proxies dict setting to configure a proxy server. Set it to {} to avoid using a proxy server for S3 traffic.
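
As a rough sketch of how the s3_proxies setting could be interpreted, the Python helper below distinguishes a missing key from an explicit empty dict. The function name resolve_s3_proxies and the example proxy URL are hypothetical. The assumption that the result is then handed to the S3 client (for example through the proxies argument of botocore.config.Config) is mine and is not confirmed by the changelog.

```python
def resolve_s3_proxies(config):
    """Work out which proxies value to hand to the S3 client.

    Hypothetical helper, for illustration only:
      * key absent -> None (leave the client's default proxy handling alone)
      * {}         -> {}   (explicitly use no proxy for S3 traffic)
      * dict       -> copy of the dict (explicit proxy per protocol)
    """
    proxies = config.get("s3_proxies")
    if proxies is None:
        return None
    if not isinstance(proxies, dict):
        raise ValueError("s3_proxies must be a dict")
    return dict(proxies)


# Example tap config: bypass any proxy for a private S3 bucket.
no_proxy = resolve_s3_proxies({"bucket": "my-bucket", "s3_proxies": {}})

# Example tap config: route S3 traffic through a (made-up) proxy.
explicit = resolve_s3_proxies({"s3_proxies": {"https": "http://proxy.example:3128"}})
```

Keeping None and {} distinct is the important design point here: the first means "no opinion", the second means "definitely no proxy".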

2.0.6 (2022-10-05)

Changes

  • Bump boto3 from 1.23.10 to 1.24.26
  • Bump ujson from 5.2.0 to 5.4.0 because of vulnerabilities

2.0.5 (2022-10-04)

These tap-s3-csv enhancements deal with scenarios where CSV files do not load correctly because of data quality issues or assumptions about the data being read, e.g. data types.

Changes

  • Allows fields to be overridden to have a string data type regardless of what has been discovered
  • Supports reading UTF-8-BOM (Byte Order Mark) files, i.e. CSV files saved by Microsoft tools
  • Support a suffix being added to streams / tables to make them unique e.g. a date or provider_id
  • Provides an option to warn rather than error if no file is discovered for the search criteria
  • Support the ability to remove a character from the csv file being read e.g. strip out all double-quotes.
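
Two of the items above (BOM handling and character removal) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the fork's implementation: the function read_csv_rows and its parameter names are hypothetical. It relies on Python's standard utf-8-sig codec, which drops a leading byte order mark if one is present.

```python
import csv
import io


def read_csv_rows(raw_bytes, encoding="utf-8-sig", remove_character=None):
    """Decode CSV bytes and return the rows as dicts.

    Hypothetical helper for illustration only.
    `encoding="utf-8-sig"` strips a leading UTF-8 BOM if present.
    `remove_character`, when given, is deleted from the text before parsing.
    """
    text = raw_bytes.decode(encoding)
    if remove_character:
        text = text.replace(remove_character, "")
    return list(csv.DictReader(io.StringIO(text)))


# A file saved with a BOM: the first header is "id", not "\ufeffid".
bom_rows = read_csv_rows(b"\xef\xbb\xbfid,name\r\n1,Ada\r\n")

# A stray double-quote inside a value is stripped before parsing.
clean_rows = read_csv_rows(b'id,size\n1,6" nail\n', remove_character='"')
```

Decoding with plain utf-8 instead would leave the BOM glued to the first column name, which is the usual symptom with Microsoft-saved CSV files.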
@s7clarke10 added the "help wanted" (Extra attention is needed) label on Mar 20, 2023