
Add an option to display CSV content in stdout #8

Open
fbaligand opened this issue Oct 22, 2020 · 6 comments
Labels
enhancement New feature or request

Comments

@fbaligand

Like wget's `-O -` option, it would be nice to be able to write the CSV content to standard output (the console) rather than to a file.
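A minimal sketch of the wget-style convention, assuming a hypothetical `--output` argument where `-` means standard output. The flag name and the `export` helper are illustrative only; they are not existing elasticsearch-tocsv options or functions.

```python
import argparse
import csv
import sys


def parse_args(argv):
    parser = argparse.ArgumentParser()
    # Hypothetical flag; "-" follows the wget "-O -" convention for stdout
    parser.add_argument("--output", default="export.csv")
    return parser.parse_args(argv)


def open_output(path):
    """Return a writable text stream: stdout for "-", otherwise a new file."""
    if path == "-":
        return sys.stdout
    # newline="" lets the csv module control line endings itself
    return open(path, "w", newline="", encoding="utf-8")


def export(rows, fields, path):
    out = open_output(path)
    try:
        writer = csv.DictWriter(out, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
    finally:
        if out is sys.stdout:
            # Never close stdout; just push buffered data down the pipe
            out.flush()
        else:
            out.close()
```

With such a (hypothetical) flag, a command like `elasticsearch-tocsv ... --output - | gzip > export.csv.gz` would become possible, while the default keeps today's write-to-file behaviour.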

@fabiopipitone added the `enhancement` (New feature or request) label on Oct 22, 2020
@fabiopipitone
Owner

Didn't get this one. The tool is mainly designed to export massive amounts of data to a CSV file. To do that without causing OOM errors, it uses a fairly complex system that writes partial CSV files. It would be pointless to write the data to stdout once it has already been written to a CSV file. Also, I don't think it would be very useful to watch the data being printed to stdout as it scrolls by, since you can use Logstash for that. Did I understand correctly what you meant?

@fbaligand
Author

The idea is to chain the output into another command with a pipe, for example `elasticsearch-tocsv ... | gzip ...` or `elasticsearch-tocsv ... | nc $HOST $PORT`.

But I fully understand that, given the way this tool is designed, it is incompatible with such an option.

@fabiopipitone
Owner

I could do it, but I'd rather not make any assumptions about what the user has installed on their system. Maybe gzip or other tools aren't available, and that could make the script fail. After all, I would still need to generate the whole CSV file before compressing it or sending it to stdout, so the user might as well chain the script's output as you did in your example, using tools they know for sure are installed on their machine.

@fbaligand
Author

To make it clear: in my mind, it's not elasticsearch-tocsv's job to call gzip or nc.
It is the user who chains elasticsearch-tocsv and other commands with a pipe.
But this is only possible if the output is written to stdout.
A simple way to do it would be to have a single process that writes its output directly to stdout (and not to a file).

That said, once again, I would understand if this feature is incompatible with the way the tool is designed,
and so I would understand if you don't implement it for that reason.
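For the single-process, stdout-only approach described above, here is a sketch of streaming rows to stdout batch by batch, so memory stays bounded and a downstream `gzip` or `nc` starts consuming immediately. `fetch_batches` is a hypothetical stand-in for however the tool pages through Elasticsearch results. The `BrokenPipeError` handling follows the pattern suggested in the Python `signal` module documentation for when the reader exits early (for example `| head`).

```python
import csv
import os
import sys


def fetch_batches():
    """Hypothetical stand-in for paginated Elasticsearch results."""
    yield [{"host": "web-1", "status": 200}, {"host": "web-2", "status": 404}]
    yield [{"host": "web-3", "status": 500}]


def stream_csv(batches, fields, out=None):
    out = out if out is not None else sys.stdout
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    for batch in batches:
        # Each batch is written and then dropped, so memory use does not
        # grow with the total number of documents
        writer.writerows(batch)
    out.flush()


def main():
    try:
        stream_csv(fetch_batches(), ["host", "status"])
    except BrokenPipeError:
        # The downstream command closed the pipe early. Point stdout at
        # /dev/null so the interpreter's final flush does not raise again.
        devnull = os.open(os.devnull, os.O_WRONLY)
        os.dup2(devnull, sys.stdout.fileno())
        sys.exit(1)


if __name__ == "__main__":
    main()
```

Saved as a script, it could be chained the way the thread describes, e.g. `python stream_sketch.py | gzip > export.csv.gz`. The trade-off the owner raises still applies: a single producer is simple to pipe but gives up parallel extraction.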

@fabiopipitone
Owner

OK, now I understand better. Unfortunately, though, in that mode the user could not take advantage of multiprocessing, since several processes writing to stdout at the same time would completely mess up the output, possibly with duplicates.

Therefore, running as a single process and writing every single log to stdout, I'm afraid it wouldn't be much faster than a classic Logstash pipeline, and there would be no point in using the tool, since it wouldn't allow massive extractions in a reasonable amount of time. I'm afraid that developing such a feature only to end up with a surrogate for a simple Logstash pipeline wouldn't be worth the effort.

Anyway, I won't close the issue, and I may test something once all the other issues are closed.
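Regarding the interleaving concern, one possible middle ground (a sketch under assumptions, not how elasticsearch-tocsv currently splits its work) is to let each worker write a header-less partial file for a disjoint slice of the data, then have the parent copy those files to stdout in a fixed order once they are complete. The slicing, the `part-NNNNN.csv` naming and the helper names are hypothetical. Threads keep the sketch portable; a process pool would follow the same ordering logic.

```python
import csv
import os
import shutil
import sys
import tempfile
from concurrent.futures import ThreadPoolExecutor


def write_partial(index, rows, directory):
    """Worker: write one disjoint slice to its own header-less partial file."""
    path = os.path.join(directory, f"part-{index:05d}.csv")
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
    return path


def export_parallel(slices, fields, out=None):
    out = out if out is not None else sys.stdout
    with tempfile.TemporaryDirectory() as directory:
        with ThreadPoolExecutor() as pool:
            futures = [
                pool.submit(write_partial, i, rows, directory)
                for i, rows in enumerate(slices)
            ]
            # Collect paths in submission order, whichever worker finishes first
            paths = [future.result() for future in futures]
        # Header is written once, by the parent, before any partial content
        csv.writer(out).writerow(fields)
        for path in paths:
            with open(path, newline="", encoding="utf-8") as f:
                # Copy in chunks so no partial file is loaded fully into memory
                shutil.copyfileobj(f, out)
    out.flush()
```

Because only the parent writes to stdout, rows from different workers cannot interleave, and duplicates are avoided as long as the slices do not overlap. As the owner notes, output still cannot start until the partial files exist.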

@fbaligand
Author

Well, I really think Logstash and your tool are very different and don't address the same need.
The main value of your tool is generating a single CSV report with one command.

Logstash is an ETL tool and a server.
It is designed to run as a daemon and process input data as a stream.
So, for instance, you don't know when Logstash has finished generating a CSV file.

Having a tool that generates one CSV report and lets you chain other commands onto it is very useful.


2 participants