sortable split output files #62

Open
ggrothendieck opened this issue May 3, 2024 · 2 comments
Comments

@ggrothendieck
ggrothendieck commented May 3, 2024

When using gocsv's split, the output file names get a suffix such as -1.csv, -2.csv, etc., so if there are more than 9 files the names don't sort properly. It would be nice to be able to optionally zero-pad the number in the suffix with enough zeroes that the files sort properly.

@zacharysyoung
Contributor

split streams its output, switching to a new output CSV as soon as the current one reaches max-rows, so it cannot know ahead of time how many files will be created and therefore cannot calculate the padding.

That said, it could write the outputs to temp files, keep track of the file count, and then do a final rename with the correct padding. The downside is that if you were watching the output directory in a file explorer or with ls, you wouldn't see any incremental progress.
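
A minimal sketch of that temp-files-then-rename idea, not gocsv's actual code. The names paddedName and finalize are hypothetical, and the "prefix-N.csv" naming is assumed from the suffix described in the issue. The padding width is simply the number of digits in the final file count, which is only known once splitting is done:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// paddedName builds a zero-padded output name such as "out-003.csv".
// The width comes from total, the final number of files (hypothetical helper).
func paddedName(prefix string, index, total int) string {
	width := len(strconv.Itoa(total))
	return fmt.Sprintf("%s-%0*d.csv", prefix, width, index)
}

// finalize renames the temp files, given in output order, to their padded names.
// It assumes splitting has finished, so len(tempPaths) is the final count.
func finalize(tempPaths []string, prefix string) error {
	for i, p := range tempPaths {
		if err := os.Rename(p, paddedName(prefix, i+1, len(tempPaths))); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	// Print the names that 12 output files would end up with.
	total := 12
	for i := 1; i <= total; i++ {
		fmt.Println(paddedName("out", i, total))
	}
}
```

With 12 files the width is 2, so the names run from out-01.csv to out-12.csv and sort correctly as plain strings.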

@zacharysyoung
Contributor

zacharysyoung commented Jul 16, 2024

I made a little command-line tool that can fix this after running split, by giving it the prefix for the split CSVs: https://github.com/zacharysyoung/gocsv-padsplits.
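
For anyone who wants to see the general shape of an after-the-fact fix, here is a rough sketch. It is not the code from gocsv-padsplits; planRenames is a hypothetical helper, and it assumes the split outputs are named "prefix-N.csv". It only computes the old-to-new name mapping and leaves the actual renaming (e.g. with os.Rename) to the caller:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// planRenames maps each "<prefix>-N.csv" name to its zero-padded form.
// Names that don't match the pattern, or that already have the right
// width, are left out of the plan.
func planRenames(names []string, prefix string) map[string]string {
	nums := make(map[string]int)
	highest := 0
	for _, name := range names {
		if !strings.HasPrefix(name, prefix+"-") || !strings.HasSuffix(name, ".csv") {
			continue
		}
		digits := strings.TrimSuffix(strings.TrimPrefix(name, prefix+"-"), ".csv")
		n, err := strconv.Atoi(digits)
		if err != nil || n < 1 {
			continue
		}
		nums[name] = n
		if n > highest {
			highest = n
		}
	}

	// Pad to the number of digits in the largest suffix found.
	width := len(strconv.Itoa(highest))
	plan := make(map[string]string)
	for name, n := range nums {
		padded := fmt.Sprintf("%s-%0*d.csv", prefix, width, n)
		if padded != name {
			plan[name] = padded
		}
	}
	return plan
}

func main() {
	names := []string{"out-1.csv", "out-2.csv", "out-10.csv", "notes.txt"}
	for from, to := range planRenames(names, "out") {
		fmt.Println(from, "->", to)
	}
}
```

Computing the plan separately from the rename makes it easy to do a dry run before touching any files.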
