
decide on standard for saving/loading data sets that fit in memory #47

Open · dustinvtran opened this issue Nov 3, 2017 · 1 comment

@dustinvtran (Member) commented Nov 3, 2017

For data sets like multi-MNIST and small ImageNet, we preprocess the data and cache it by writing to disk, so that future calls can load the result directly into memory. More generally, we need a way to save and load data whenever a data set's loading function requires preprocessing and the preprocessed result fits in memory.

We should decide on a specific option such as pickle, np.savez, or hdf5.
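
For comparison, here is a minimal sketch of the np.savez route; `cache_path` and `preprocess_fn` are hypothetical names for illustration, with `preprocess_fn` assumed to return a dict of NumPy arrays:

```python
import os
import numpy as np

def load_cached(cache_path, preprocess_fn):
  """Load preprocessed arrays from disk, preprocessing on first call.

  cache_path should end in .npz, since np.savez appends that
  extension otherwise and the existence check would then miss it.
  """
  if os.path.exists(cache_path):
    with np.load(cache_path) as data:
      return {key: data[key] for key in data.files}
  arrays = preprocess_fn()
  np.savez(cache_path, **arrays)
  return arrays
```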

@Arvinds-ds (Contributor) commented

How about tfrecords for both in-memory and out-of-memory data sets? I feel caching of preprocessed data becomes more important for larger data sets. You can look at the segnet project in my repo for an example of integrating the Dataset API with Edward. Over multiple iterations, it eventually becomes faster to save the processed images as tfrecords (a sketch is below).
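
For concreteness, here is a minimal sketch of that approach against the TF 1.x API, assuming the images are float32 arrays; the function names and feature keys are illustrative, not taken from the segnet project:

```python
import numpy as np
import tensorflow as tf

def write_tfrecords(images, labels, path):
  """Cache preprocessed examples to a tfrecords file."""
  with tf.python_io.TFRecordWriter(path) as writer:
    for image, label in zip(images, labels):
      example = tf.train.Example(features=tf.train.Features(feature={
          'image': tf.train.Feature(
              bytes_list=tf.train.BytesList(value=[image.tobytes()])),
          'label': tf.train.Feature(
              int64_list=tf.train.Int64List(value=[int(label)])),
      }))
      writer.write(example.SerializeToString())

def read_tfrecords(path, image_shape):
  """Build a tf.data pipeline that parses the cached examples."""
  def parse(serialized):
    features = tf.parse_single_example(serialized, features={
        'image': tf.FixedLenFeature([], tf.string),
        'label': tf.FixedLenFeature([], tf.int64),
    })
    # Raw bytes back to a float32 tensor with the original shape.
    image = tf.reshape(tf.decode_raw(features['image'], tf.float32),
                       image_shape)
    return image, features['label']
  return tf.data.TFRecordDataset(path).map(parse)
```

The same reader then feeds training directly via the Dataset API (e.g. `read_tfrecords(path, [28, 28]).batch(32)`), so the cache format and the input pipeline share one code path for both small and large data sets.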
