
decide on standard for saving/loading data sets that fit in memory #47

Open · dustinvtran opened this issue Nov 3, 2017 · 1 comment

@dustinvtran (Member) commented Nov 3, 2017

For data sets like multi-MNIST and small ImageNet, we preprocess the data and cache it by writing to disk, so that future calls can load the result directly into memory. More generally, we need a way to save and load data whenever a data set's loading function requires preprocessing and the preprocessed result fits in memory.

We should decide on a specific option such as pickle, np.savez, or hdf5.
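
For comparison, here is a minimal sketch of the np.savez route; `cache_path` and `preprocess_fn` are hypothetical names for illustration, with `preprocess_fn` assumed to return a dict of NumPy arrays:

```python
import os
import numpy as np

def load_cached(cache_path, preprocess_fn):
  """Load preprocessed arrays from disk, preprocessing on first call.

  cache_path should end in .npz, since np.savez appends that
  extension otherwise and the existence check would then miss it.
  """
  if os.path.exists(cache_path):
    with np.load(cache_path) as data:
      return {key: data[key] for key in data.files}
  arrays = preprocess_fn()
  np.savez(cache_path, **arrays)
  return arrays
```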

@Arvinds-ds (Contributor) commented

How about tfrecords for both in-memory and out-of-memory data sets? I feel caching of preprocessed data becomes more important for larger data sets. You can look at the segnet project in my repo for an example of integrating the Dataset API with Edward. Over multiple iterations, it eventually becomes faster to save the processed images as tfrecords (a sketch is below).
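
For concreteness, here is a minimal sketch of that approach against the TF 1.x API, assuming the images are float32 arrays; the function names and feature keys are illustrative, not taken from the segnet project:

```python
import numpy as np
import tensorflow as tf

def write_tfrecords(images, labels, path):
  """Cache preprocessed examples to a tfrecords file."""
  with tf.python_io.TFRecordWriter(path) as writer:
    for image, label in zip(images, labels):
      example = tf.train.Example(features=tf.train.Features(feature={
          'image': tf.train.Feature(
              bytes_list=tf.train.BytesList(value=[image.tobytes()])),
          'label': tf.train.Feature(
              int64_list=tf.train.Int64List(value=[int(label)])),
      }))
      writer.write(example.SerializeToString())

def read_tfrecords(path, image_shape):
  """Build a tf.data pipeline that parses the cached examples."""
  def parse(serialized):
    features = tf.parse_single_example(serialized, features={
        'image': tf.FixedLenFeature([], tf.string),
        'label': tf.FixedLenFeature([], tf.int64),
    })
    # Raw bytes back to a float32 tensor with the original shape.
    image = tf.reshape(tf.decode_raw(features['image'], tf.float32),
                       image_shape)
    return image, features['label']
  return tf.data.TFRecordDataset(path).map(parse)
```

The same reader then feeds training directly via the Dataset API (e.g. `read_tfrecords(path, [28, 28]).batch(32)`), so the cache format and the input pipeline share one code path for both small and large data sets.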
