Multiple output files - SFTP #67

Open
shbadawy opened this issue Mar 29, 2023 · 3 comments

Comments

@shbadawy

Hello,

I am trying to export data from Redshift / S3, using their input plugins, to an SFTP server as CSV with the SFTP output plugin. The output is always split into 4 sequentially numbered files.

For example, if my data is 4 MB, I get 4 files of 1 MB each (0_test.csv, 1_test.csv, 2_test.csv, 3_test.csv).

Is there a way to get them into one file?

Thanks

@shbadawy shbadawy changed the title Multiple output files Multiple output files - SFTP Mar 29, 2023

@morihaya

Hi @shbadawy, try the following exec settings. They may solve the problem.

exec:
  max_threads: 1
  min_output_tasks: 1

in:
  type: something
  ...

out:
  type: sftp
  ...

This document may also be helpful.
https://www.embulk.org/docs/built-in.html

The min_output_tasks option enables “page scattering”. The feature is enabled if number of input tasks is less than min_output_tasks. It uses multiple filter & output threads for each input task so that one input task can use multiple threads. Setting larger number here is useful if embulk doesn’t use multi-threading with enough concurrency due to too few number of input tasks. Setting 1 here disables page scattering completely.
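For reference, a fuller config along these lines might look like the sketch below. It is untested, and every host name, credential, table name, and path in it is a hypothetical placeholder. The option names are what I understand the redshift input and sftp output plugins to accept, so please check them against each plugin's README. Also note that if the input plugin itself creates several input tasks (for example, one per S3 object), you may still get one output file per input task even with page scattering disabled.

```yaml
exec:
  max_threads: 1        # run one task at a time
  min_output_tasks: 1   # 1 disables page scattering (see the quote above)

in:
  type: redshift
  host: redshift.example.com   # hypothetical placeholder
  user: example_user           # hypothetical placeholder
  password: example_password   # hypothetical placeholder
  database: example_db         # hypothetical placeholder
  table: example_table         # hypothetical placeholder

out:
  type: sftp
  host: sftp.example.com               # hypothetical placeholder
  port: 22
  user: example_user                   # hypothetical placeholder
  secret_key_file: /path/to/id_rsa     # hypothetical placeholder
  path_prefix: /upload/test            # hypothetical placeholder
  file_ext: csv
  formatter:
    type: csv
```

Running `embulk preview` on the config first is a cheap way to confirm that the input side parses before anything is uploaded.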

@shbadawy
Author

Thanks @morihaya for sharing
