Consider using gzip compressed json input #68

Open
dimatr opened this issue Apr 10, 2020 · 3 comments
Labels
question Further information is requested

Comments

@dimatr
Collaborator

dimatr commented Apr 10, 2020

For some big inputs, component_segmentation can produce an output folder of JSON files of huge total size: 10, 100, or more GB. The idea is to gzip-compress each file and serve the JavaScript client with .json.gz files. They can be unpacked on the client side either automagically by the browser itself or explicitly by a script, e.g. https://www.npmjs.com/package/decompress-response
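A minimal sketch of the compression step, assuming a flat output folder of `*.json` chunks (the folder path is an invented example):

```python
# Replace every .json chunk with a gzip-compressed .json.gz copy.
# The folder layout is an assumption for illustration.
import gzip
import shutil
from pathlib import Path

def compress_output_folder(folder: str) -> None:
    for json_path in Path(folder).glob("*.json"):
        gz_path = json_path.with_name(json_path.name + ".gz")
        with open(json_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        json_path.unlink()  # keep only the compressed copy

compress_output_folder("output/run1")
```

If the web server then serves these files with a `Content-Encoding: gzip` header, the browser decompresses them transparently; without that header, a client-side script has to unpack them explicitly.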

@6br
Contributor

6br commented Apr 11, 2020

That might be possible. Or, if we care more about performance or compression ratio, we could use MessagePack.
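For a rough sense of the trade-off, a sketch using the `msgpack` Python package; the record shape here is invented for illustration:

```python
import json
import msgpack  # pip install msgpack

records = [{"bin": i, "coverage": 1.0, "inversions": 0} for i in range(10_000)]

as_json = json.dumps(records).encode()
as_msgpack = msgpack.packb(records)
print(f"json: {len(as_json):,} B  msgpack: {len(as_msgpack):,} B")

# Round-trips losslessly; dict keys come back as str with default settings.
assert msgpack.unpackb(as_msgpack) == records
```

MessagePack is typically smaller than raw JSON, but unlike gzipped JSON it is not decoded natively by browsers, so the client would need a MessagePack library.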

@josiahseaman josiahseaman added the question Further information is requested label Apr 20, 2020
@josiahseaman
Member

I've been told you see another scalability challenge every time you increase your input by 2-4x. Chunking JSON files is intended to keep each individual JSON down to <4 MB. However, in very large genomes the number of files will be large, and thus the index of those files, bin2file.json, will also be massive. Chunking the index, or having an index of indices, seems slightly wrong.

For internet traffic, compressing the JSON is a great idea. The internals are very repetitive and I've seen 10x compression. However, a more direct route might be simply making the files sparse: graph-genome/component_segmentation#29
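To make the "sparse" idea concrete, a toy sketch (the values are invented):

```python
# One value per bin, including the empty ones ...
dense = [0, 0, 3, 0, 0, 0, 7, 0]
# ... versus only the occupied bins, keyed by bin index.
sparse = {i: v for i, v in enumerate(dense) if v}  # {2: 3, 6: 7}
```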

@dimatr
Collaborator Author

dimatr commented Apr 30, 2020

One can do both: sparse storage of the info and gzip compression of the .json files.

For a very big setup, I'd suggest moving the bin2file.json content to the backend and storing the index info in an SQLite DB. One could have indices for several "pyramid" zoom levels.
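A sketch of what that backend index could look like, with one row per chunk file per zoom level; the schema and names are assumptions, not the project's actual layout:

```python
import sqlite3

con = sqlite3.connect("bin_index.db")
con.execute(
    """CREATE TABLE IF NOT EXISTS chunk_index (
           zoom_level INTEGER NOT NULL,  -- "pyramid" level (bin width)
           first_bin  INTEGER NOT NULL,  -- first bin covered by the chunk
           last_bin   INTEGER NOT NULL,  -- last bin covered by the chunk
           file_name  TEXT NOT NULL      -- e.g. chunk0042.json.gz
       )"""
)
con.execute(
    "CREATE INDEX IF NOT EXISTS idx_zoom_bin"
    " ON chunk_index (zoom_level, first_bin)"
)

def chunk_for(zoom: int, bin_id: int):
    """Return the chunk file covering bin_id at the given zoom level."""
    row = con.execute(
        "SELECT file_name FROM chunk_index"
        " WHERE zoom_level = ? AND first_bin <= ? AND last_bin >= ?",
        (zoom, bin_id, bin_id),
    ).fetchone()
    return row[0] if row else None
```

The lookup the frontend does against bin2file.json becomes a single indexed query per pan/zoom request, so the index never has to be shipped to the client at all.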
