Consider using gzip compressed json input #68

Open
dimatr opened this issue Apr 10, 2020 · 3 comments
Labels
question Further information is requested

Comments

@dimatr
Collaborator

dimatr commented Apr 10, 2020

For some big inputs, component_segmentation can produce an output folder of JSON files of huge total size: 10, 100, or more GB. The idea is to gzip-compress each file and serve the JavaScript client with .json.gz files. They can be unpacked on the client side either automagically by the browser itself or explicitly by a script, e.g. https://www.npmjs.com/package/decompress-response
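A minimal sketch of the compression step, assuming a flat output folder of `*.json` chunks (the folder path is an invented example):

```python
# Replace every .json chunk with a gzip-compressed .json.gz copy.
# The folder layout is an assumption for illustration.
import gzip
import shutil
from pathlib import Path

def compress_output_folder(folder: str) -> None:
    for json_path in Path(folder).glob("*.json"):
        gz_path = json_path.with_name(json_path.name + ".gz")
        with open(json_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        json_path.unlink()  # keep only the compressed copy

compress_output_folder("output/run1")
```

If the web server then serves these files with a `Content-Encoding: gzip` header, the browser decompresses them transparently; without that header, a client-side script has to unpack them explicitly.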

@6br
Contributor

6br commented Apr 11, 2020

That might be possible. Or, if we care more about performance or compression ratio, we could use MessagePack.
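For a rough sense of the trade-off, a sketch using the `msgpack` Python package; the record shape here is invented for illustration:

```python
import json
import msgpack  # pip install msgpack

records = [{"bin": i, "coverage": 1.0, "inversions": 0} for i in range(10_000)]

as_json = json.dumps(records).encode()
as_msgpack = msgpack.packb(records)
print(f"json: {len(as_json):,} B  msgpack: {len(as_msgpack):,} B")

# Round-trips losslessly; dict keys come back as str with default settings.
assert msgpack.unpackb(as_msgpack) == records
```

MessagePack is typically smaller than raw JSON, but unlike gzipped JSON it is not decoded natively by browsers, so the client would need a MessagePack library.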

@josiahseaman josiahseaman added the question Further information is requested label Apr 20, 2020
@josiahseaman
Member

I've been told you see another scalability challenge every time you increase your input by 2-4x. Chunking JSON files is intended to keep each individual JSON down to <4 MB. However, in very large genomes the number of files will be large, and thus the index of those files, bin2file.json, will also be massive. Chunking the index, or having an index of indices, seems slightly wrong.

For internet traffic, compressing the JSON is a great idea. The internals are very repetitive and I've seen 10x compression. However, a more direct route might be simply making the files sparse: graph-genome/component_segmentation#29
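To make the "sparse" idea concrete, a toy sketch (the values are invented):

```python
# One value per bin, including the empty ones ...
dense = [0, 0, 3, 0, 0, 0, 7, 0]
# ... versus only the occupied bins, keyed by bin index.
sparse = {i: v for i, v in enumerate(dense) if v}  # {2: 3, 6: 7}
```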

@dimatr
Collaborator Author

dimatr commented Apr 30, 2020

One can do both: sparse storage of the info and gzip compression of the .json files.

For a very big setup, I'd suggest moving the bin2file.json content to the backend and storing the index info in an SQLite DB. One could have indices for several "pyramid" zoom levels.
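A sketch of what that backend index could look like, with one row per chunk file per zoom level; the schema and names are assumptions, not the project's actual layout:

```python
import sqlite3

con = sqlite3.connect("bin_index.db")
con.execute(
    """CREATE TABLE IF NOT EXISTS chunk_index (
           zoom_level INTEGER NOT NULL,  -- "pyramid" level (bin width)
           first_bin  INTEGER NOT NULL,  -- first bin covered by the chunk
           last_bin   INTEGER NOT NULL,  -- last bin covered by the chunk
           file_name  TEXT NOT NULL      -- e.g. chunk0042.json.gz
       )"""
)
con.execute(
    "CREATE INDEX IF NOT EXISTS idx_zoom_bin"
    " ON chunk_index (zoom_level, first_bin)"
)

def chunk_for(zoom: int, bin_id: int):
    """Return the chunk file covering bin_id at the given zoom level."""
    row = con.execute(
        "SELECT file_name FROM chunk_index"
        " WHERE zoom_level = ? AND first_bin <= ? AND last_bin >= ?",
        (zoom, bin_id, bin_id),
    ).fetchone()
    return row[0] if row else None
```

The lookup the frontend does against bin2file.json becomes a single indexed query per pan/zoom request, so the index never has to be shipped to the client at all.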
