VS-1368 The tarball is too damn big #8829
OOC what is the squish factor?
It reduces the size of just the header lines in the interval list from 581689 bytes to 3976 bytes, so the headers end up at about 0.0068 of their original size (a reduction of roughly 99.3%).
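For anyone following along, here is a rough sketch of the kind of filtering being discussed. It is not the code in this PR (the PR uses grep, and its exact pattern is not shown in this thread). It assumes the usual Picard-style interval_list layout: SAM-style header lines starting with `@`, where each `@SQ` line carries an `SN:<contig>` field, followed by tab-separated interval rows whose first column is the contig. The function name `filter_interval_list` and the `keep_contigs` parameter are made up for illustration.

```python
def filter_interval_list(lines, keep_contigs):
    """Keep only the header and interval lines that refer to contigs in keep_contigs.

    Hypothetical helper: assumes Picard-style interval_list lines, i.e.
    '@'-prefixed SAM header lines followed by tab-separated interval rows.
    """
    kept = []
    for line in lines:
        if line.startswith("@SQ"):
            # Parse the TAG:VALUE fields of the sequence-dictionary line.
            fields = dict(
                field.split(":", 1)
                for field in line.rstrip("\n").split("\t")[1:]
                if ":" in field
            )
            if fields.get("SN") in keep_contigs:
                kept.append(line)
        elif line.startswith("@"):
            # Other header lines (@HD, @PG, ...) are passed through untouched.
            kept.append(line)
        else:
            # Interval row: the contig name is the first tab-separated column.
            contig = line.split("\t", 1)[0]
            if contig in keep_contigs:
                kept.append(line)
    return kept
```

Most of the byte savings quoted above would come from the `@SQ` branch, since a reference with many unplaced or alternate contigs contributes one header line per contig.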
I still feel that a whitelist approach, rather than a blacklist, would be cleaner, as it would let us avoid the grep regex filtering. We could keep what we want in a way that is genomically meaningful: specify the intervals we want to keep, intersect that whitelist with the intervals we are passed, and toss everything that falls outside it, instead of relying on string manipulation. But I'm willing to give this a thumbs-up as long as we make sure to go back and do it in a cleaner way later. We want to get something workable in place sooner rather than later, after all.
Also, I'd probably be compelled to give it a thumb anyway just for the name of the PR.
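To make the whitelist idea concrete, here is a minimal sketch of intersecting the intervals we are passed with a list of intervals we want to keep. It is only an illustration under stated assumptions, not a proposal for the actual implementation: intervals are modeled as `(contig, start, end)` tuples with 1-based inclusive coordinates (as in interval_list files), and the function name `intersect_with_whitelist` is hypothetical. In practice an existing interval tool would likely do this job.

```python
def intersect_with_whitelist(intervals, whitelist):
    """Return the parts of `intervals` that overlap any whitelist interval.

    Both arguments are lists of (contig, start, end) tuples with
    1-based inclusive coordinates. Hypothetical helper for illustration;
    quadratic in the input sizes, which is fine for a sketch.
    """
    result = []
    for contig, start, end in intervals:
        for w_contig, w_start, w_end in whitelist:
            if contig != w_contig:
                continue
            # The overlap of two closed intervals is [max of starts, min of ends].
            lo, hi = max(start, w_start), min(end, w_end)
            if lo <= hi:
                result.append((contig, lo, hi))
    return result
```

Anything on a contig that never appears in the whitelist drops out without any string matching on contig names.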
* Compress the tarball, which saves a bit.
* Remove unused contigs from interval_list files by grepping.

Co-authored-by: Miguel Covarrubias <[email protected]>
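This PR does two things to address the size of the tarball: it compresses the tarball, and it removes unused contigs from the interval_list files by grepping.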
Example run here.