
GZip.IOBuffer type? #23

Open
quinnj opened this issue Mar 30, 2015 · 10 comments

Comments

@quinnj
Member

quinnj commented Mar 30, 2015

So I'm not super familiar with how the package is currently structured and mean to dive in more, but I wonder if someone could help clarify whether this is a good idea or not.

I'm thinking of creating a GZip.IOBuffer type that would be a wrapper around an IOBuffer: you could write to it, and whatever you write gets gzipped into the buffer. You could then call takebuf_string to get the raw gzip data and send it in an HTTP request, for example.

Is this compatible with how gzip compression works, i.e., gzipping chunks at a time like this? I think the approach should be pretty simple to implement, but if there's anything I should watch out for or avoid, I'd love to hear it.
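On the chunking question: deflate, the compression inside gzip, is a streaming format, so compressing a chunk at a time works as long as one compressor state is kept across writes and the stream is finished exactly once at the end. Below is a minimal sketch of the proposed wrapper, written in Python only because its standard-library `zlib` module binds the same zlib C library that GZip.jl calls. The `GzipBuffer` class and its `write`/`getvalue` methods are hypothetical and are not GZip.jl's API; `getvalue` plays the role of `takebuf_string`.

```python
import gzip
import io
import zlib


class GzipBuffer:
    """Hypothetical in-memory gzip writer, analogous to the proposed GZip.IOBuffer."""

    def __init__(self, level: int = 6) -> None:
        self._buf = io.BytesIO()
        # wbits=31 (16 + 15) asks zlib for gzip framing (header + CRC32/size trailer)
        # around a deflate stream with a 32 KiB window.
        self._comp = zlib.compressobj(level, zlib.DEFLATED, 31)
        self._finished = False

    def write(self, data: bytes) -> int:
        if self._finished:
            raise ValueError("GzipBuffer already finished")
        # The same compressor state is reused across writes, so all chunks end up
        # in one gzip member. compress() may return b"" while zlib buffers input.
        self._buf.write(self._comp.compress(data))
        return len(data)

    def getvalue(self) -> bytes:
        """Finish the stream on the first call and return the complete gzip bytes."""
        if not self._finished:
            self._buf.write(self._comp.flush(zlib.Z_FINISH))
            self._finished = True
        return self._buf.getvalue()


buf = GzipBuffer()
for chunk in (b"hello, ", b"chunked ", b"gzip"):
    buf.write(chunk)
payload = buf.getvalue()  # e.g. an HTTP body sent with "Content-Encoding: gzip"
print(payload[:2] == b"\x1f\x8b")  # True: gzip magic number
print(gzip.decompress(payload))    # b'hello, chunked gzip'
```

The main thing to watch for is finishing the stream: until the final flush writes the last deflate block and the CRC32/length trailer, the buffer does not hold a valid gzip file. Gzipping each chunk independently would also produce a valid file, because RFC 1952 allows a gzip file to be a series of concatenated members, but every chunk would pay header overhead and restart compression from scratch, so one long-lived compressor usually compresses better.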

@kmsquire
Contributor

Hi Jacob, you might consider using Zlib.jl instead, as it already has a stream API and can compress (and I believe decompress) gzip-formatted files and streams. It might require a little bit of hacking to clean it up and add functionality; e.g., the stream Reader has a parameter to set the decompression buffer size, but the Writer doesn't seem to (or at least it's not documented).

@quinnj
Copy link
Member Author

quinnj commented Mar 30, 2015

Thanks @kmsquire, that does seem pretty close to what I'm looking for. Do you know the history behind having both packages? Should we try to merge them at some point while improving the APIs?

@garborg
Contributor

garborg commented Mar 30, 2015

cc @dcjones

@kmsquire
Contributor

I believe that Daniel originally wrote Zlib.jl as something which could quickly and easily compress or decompress a buffer. It gained additional functionality over time.

I wrote GZip.jl because I needed access to gzipped files, and either Zlib.jl didn't exist yet or I felt it didn't quite fit my needs (I think the original implementation had a really small, fixed-size buffer and was pretty slow). I also thought that simply calling the zlib C functions directly would be faster and more efficient.

I actually do think that the packages could be combined.

At one point, I had hoped that the magic number of gzip (and other compressed formats) would be recognized and such files decompressed automatically (with a raw mode available for reading the compressed data), but my need for something like that diminished and I never got back to it. It might be good to make this part of https://github.com/JuliaIO.

Cc: @SimonDanisch
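The auto-detection idea can be sketched in a few lines: every gzip member starts with the two magic bytes `0x1f 0x8b` (RFC 1952), so a reader can peek at them and choose between a decompressing reader and a plain one. This is a minimal Python sketch using only the standard library; `open_maybe_gzip` and its `raw` flag are hypothetical names, not part of GZip.jl or Zlib.jl.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member (RFC 1952)


def open_maybe_gzip(path: str, raw: bool = False):
    """Hypothetical helper: open `path` for binary reading, decompressing it
    transparently if it starts with the gzip magic number.

    raw=True skips detection and returns the bytes exactly as stored on disk.
    """
    with open(path, "rb") as f:
        magic = f.read(2)
    if not raw and magic == GZIP_MAGIC:
        return gzip.open(path, "rb")
    return open(path, "rb")


with tempfile.TemporaryDirectory() as d:
    plain = os.path.join(d, "data.txt")
    packed = os.path.join(d, "data.txt.gz")
    with open(plain, "wb") as f:
        f.write(b"same contents")
    with gzip.open(packed, "wb") as f:
        f.write(b"same contents")

    # Both files read back as the same uncompressed bytes.
    with open_maybe_gzip(plain) as a, open_maybe_gzip(packed) as b:
        same = a.read() == b.read()
    # Raw mode exposes the compressed bytes, starting with the magic number.
    with open_maybe_gzip(packed, raw=True) as r:
        raw_head = r.read(2)

print(same)                    # True
print(raw_head == GZIP_MAGIC)  # True
```

Detection depends only on content, not the `.gz` extension. A plain file that happens to begin with those two bytes would be misdetected, which is one reason a raw mode is worth keeping.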

@quinnj
Member Author

quinnj commented Mar 30, 2015

I wasn't aware of JuliaIO; when did that start?

It might be good to integrate some of the Zlib.jl functionality into this package, since this one is already owned by JuliaLang. Basic compression doesn't seem like a bad candidate for being closely related to Base or "blessed" by JuliaLang.

@kmsquire
Contributor

JuliaIO is only a couple of weeks old. ;-) @SimonDanisch started it. I don't know how well defined the goals are yet, but I think it's generally a good idea to get most IO in the same area. But it's only useful if there are enough interested parties involved. Interested?

Regarding integrating Zlib.jl into this: that would be fine, but I think the naming would probably be backwards, since the Zlib functionality arguably is a superset of GZip's functionality. Maybe @dcjones would be interested in transferring Zlib.jl to JuliaLang?

@dcjones

dcjones commented Mar 31, 2015

> Maybe @dcjones would be interested in transferring Zlib.jl to JuliaLang?

Sure, and I'm all for merging the two.

@SimonDanisch
Member

I created JuliaIO with one simple goal: being able to get the correct Julia object from an arbitrary path.

read("some/path/image.jpg") #-> Image
read("some/path/archive.rar") #->  compressed stream!?

So it's not really necessary to move every package to JuliaIO. What's important is that they all implement the same interface. Then JuliaIO could have a meta package, Compression.jl, which organizes the different lower-level compression libraries and basically passes them to FileIO.
But having all IO packages in one group, with motivated people who have push access, will definitely help achieve this ;)
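The "read any path, get the right object" goal, with a compression layer feeding a format layer, can be sketched as a registry keyed on leading magic bytes: the gzip entry decompresses and hands the result back to the same dispatcher. This is a Python sketch of the general architecture only; `register`, `load_bytes` and `load` are hypothetical names and do not reflect the API of FileIO or any JuliaIO package, and the JSON detection is a deliberately simplified toy.

```python
import gzip
import json
import os
import tempfile
from typing import Callable, Dict

# Hypothetical registry: leading bytes ("magic") -> loader that turns raw bytes into an object.
_LOADERS: Dict[bytes, Callable[[bytes], object]] = {}


def register(magic: bytes):
    """Decorator that registers a loader for data starting with `magic`."""
    def decorate(loader: Callable[[bytes], object]) -> Callable[[bytes], object]:
        _LOADERS[magic] = loader
        return loader
    return decorate


def load_bytes(data: bytes) -> object:
    # Longest magic first, so a more specific signature wins over a shorter prefix.
    for magic in sorted(_LOADERS, key=len, reverse=True):
        if data.startswith(magic):
            return _LOADERS[magic](data)
    raise ValueError("unrecognized format")


def load(path: str) -> object:
    """Hypothetical read(path): return an object, whatever the file format."""
    with open(path, "rb") as f:
        return load_bytes(f.read())


@register(b"\x1f\x8b")
def _gzip(data: bytes) -> object:
    # Compression layer: decompress, then hand the result back to the dispatcher,
    # so a compressed file of any registered format loads the same way.
    return load_bytes(gzip.decompress(data))


@register(b"{")
def _json_object(data: bytes) -> object:
    # Toy detection for illustration only: a JSON object starts with "{"
    # (real JSON may also begin with whitespace).
    return json.loads(data)


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "config.json.gz")
    with gzip.open(path, "wb") as f:
        f.write(json.dumps({"format": "json", "compressed": True}).encode())
    obj = load(path)

print(obj)  # {'format': 'json', 'compressed': True}
```

Keeping decompression as just another registered loader means format packages never need to know about compression, which matches the idea of a separate meta package handing decompressed streams on to FileIO.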

@dcjones

dcjones commented Aug 20, 2015

For those interested I wrote another set of zlib bindings that's generally faster than both Zlib.jl and GZip.jl (and @jiahao's code in #32) and mostly encompasses the features of both: Libz.jl.

The buffering used to make it fast is split into another package BufferedStreams.jl, so it can be reused if we want fast bindings to other compression libraries, etc. Actually, it even seems to make writing/reading plain data to/from IOStreams faster in some cases.

I hope we can standardize on some version of this, but let me know if there are critical missing features, or other stuff I'm missing.
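The benefit of splitting buffering into its own layer is that it is independent of the codec: many tiny writes are coalesced into large blocks before any compression call is made. Below is a minimal Python sketch of that general idea. `BlockBuffer`, its `block_size` parameter and the sink callback are hypothetical and illustrate the concept only, not how BufferedStreams.jl or Libz.jl are implemented.

```python
import zlib
from typing import Callable


class BlockBuffer:
    """Hypothetical codec-agnostic write buffer: accumulates small writes and
    passes them to `sink` in blocks of at least `block_size` bytes."""

    def __init__(self, sink: Callable[[bytes], None], block_size: int = 64 * 1024) -> None:
        self._sink = sink
        self._block_size = block_size
        self._pending = bytearray()

    def write(self, data: bytes) -> int:
        self._pending += data
        if len(self._pending) >= self._block_size:
            self.flush()
        return len(data)

    def flush(self) -> None:
        # Hand everything pending to the sink; the final block may be short.
        if self._pending:
            self._sink(bytes(self._pending))
            self._pending.clear()


# Example sink: a zlib compressor, recording the size of each block it receives.
comp = zlib.compressobj()
compressed = []
block_sizes = []


def compress_block(block: bytes) -> None:
    block_sizes.append(len(block))
    compressed.append(comp.compress(block))


buf = BlockBuffer(compress_block, block_size=4096)
for i in range(10_000):
    buf.write(b"row %d\n" % i)  # many tiny writes
buf.flush()
compressed.append(comp.flush())

data = zlib.decompress(b"".join(compressed))
print(len(block_sizes), "compressor calls for 10000 writes")
```

Because the sink is just a callable, the same buffer could sit in front of a different compression library, or in front of a plain file, without changes.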

@kmsquire
Contributor

@dcjones, great! I'll try to take a look this weekend.


5 participants