Based on #30

Summary

This suggests using the `.pkl.gz` extension rather than the present `.pckl.gzip` extension for dataset files.

Motivation
There are several advantages to doing so:

- `.gz` is the standard extension for Gzip files (see https://www.gnu.org/software/gzip/manual/gzip.html and https://en.wikipedia.org/wiki/Gzip).
- With the `.gz` extension, `df.to_pickle` and `pd.read_pickle` can automatically decide to use Gzip for compression/decompression without the `compression` argument being specified explicitly. The same holds for other recognized extensions such as `.bz2`.

It would also be nicer to use `.pkl` (or `.pickle`) rather than `pckl` because:

- pandas uses the `.pkl` extension in its examples (see `df.to_pickle` and `pd.read_pickle`).
- `.pkl.gz` is more common on the web in general than `.pckl.gz`, so people can more easily recognize that the file is in the pickle format.

Existing `pckl.gzip` files can simply be renamed.

Other changes
From an example in `docs/pacemaker/quickstart.md`, I removed the `protocol` argument from `pd.read_pickle` (this argument exists for `df.to_pickle` but not for `pd.read_pickle`).
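
The points above can be illustrated with a minimal sketch. It is not taken from the repository: the column names, the file names, and the `rename_legacy_datasets` helper are hypothetical and only show how the rename and the extension-based inference might look in practice.

```python
import tempfile
from pathlib import Path

import pandas as pd

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def rename_legacy_datasets(root: Path) -> list[Path]:
    """Rename every *.pckl.gzip file under root to *.pkl.gz (hypothetical helper)."""
    renamed = []
    # sorted() materializes the matches before any file is renamed
    for old in sorted(root.rglob("*.pckl.gzip")):
        new = old.with_name(old.name[: -len(".pckl.gzip")] + ".pkl.gz")
        old.rename(new)
        renamed.append(new)
    return renamed


def demo() -> pd.DataFrame:
    # Made-up data, only for illustration
    df = pd.DataFrame({"energy": [-1.0, -2.5], "n_atoms": [2, 4]})
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)

        # Old naming scheme: compression is passed explicitly here.
        # `protocol` is accepted by to_pickle (but not by read_pickle).
        df.to_pickle(root / "data.pckl.gzip", compression="gzip", protocol=4)

        # New naming scheme: the default compression="infer" picks gzip from ".gz".
        direct = root / "direct.pkl.gz"
        df.to_pickle(direct)
        assert direct.read_bytes()[:2] == GZIP_MAGIC

        # Already existing files only need to be renamed, not rewritten.
        (renamed,) = rename_legacy_datasets(root)
        assert renamed.name == "data.pkl.gz"

        # No compression and no protocol argument is needed when reading.
        restored = pd.read_pickle(renamed)
    return restored


if __name__ == "__main__":
    print(demo())
```

Renaming is enough because the file contents are already gzip-compressed pickles; only the name changes, and with it the ability of pandas to infer the compression.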