
tensorflow.contrib.data in observations #22

Open
Arvinds-ds opened this issue Sep 23, 2017 · 9 comments

Comments

@Arvinds-ds
Contributor

I am heavily using the tf.contrib.data Dataset API for image-based tasks. Since the observations functions for image datasets (LSUN, CelebA, etc.) are no more than downloaders, would it be worthwhile to return a TensorFlow dataset, something along the lines of:

lsun_bedroom_x_train = lsun('~/data', category='bedroom', set='training',
                            batch_size=32, shuffle=True)
training_data = lsun_bedroom_x_train.make_one_shot_iterator()
x_batch = training_data.get_next()  # build the op once; it yields a new batch each run
.....
for i in range(inference.n_iter):
    inference.update(....{x_ph: x_batch})
@dustinvtran
Member

dustinvtran commented Sep 26, 2017

Observations is agnostic to the user's choice of workflow, which is a deliberate design choice.

That said, it could be useful to see just how much we can push generic data loading functions that adopt a specific framework. For example, the generator functions in the README assume your workflow can store data in memory and feed numpy arrays during training. Maybe we can do the same with a generic tf.contrib.data datasets function, which can help load in some of these large data sets while still leaving some of the data-specific preprocessing to the user?
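One framework-agnostic way to help with large data sets while leaving decoding and preprocessing to the user is to return plain file paths rather than tensors. A minimal sketch of that idea (the `image_paths` helper is hypothetical, not part of observations):

```python
import os

def image_paths(directory, extensions=(".jpg", ".jpeg", ".png")):
    """Hypothetical helper: return a sorted list of image file paths
    under `directory`. The caller decides how to decode and batch them,
    e.g. via tf.contrib.data.Dataset.from_tensor_slices(paths).map(decode_fn),
    a numpy generator, or any other framework's input pipeline.
    """
    return sorted(
        os.path.join(root, name)
        for root, _, names in os.walk(directory)
        for name in names
        if name.lower().endswith(extensions)
    )
```

This keeps the library's contract small: it only promises filenames on disk, and the user's workflow supplies the framework-specific loading step.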

@Arvinds-ds
Contributor Author

Thanks. I understand the need for being agnostic. But is the expectation to be independent of TensorFlow (and consequently Edward)? If so, we should not introduce TF-specific code.

If a TF dependency is fine, then regarding the user's choice of workflow: the beauty of the Dataset API is exactly that. The dataset object can be further customized, transformed, and shuffled later by the user. Edward can handle the dataset-specific tasks of reading image names from a corresponding text file, loading images and labels as batches, etc.
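To illustrate why returning a dataset object stays workflow-agnostic: each transformation returns a new dataset, so the user keeps full control over shuffling, mapping, and batching. A toy pure-Python stand-in for that chaining style (hypothetical; not the real tf.contrib.data API):

```python
import random

class SimpleDataset:
    """Toy illustration of the chainable dataset idea: every
    transformation returns a new SimpleDataset, so whatever a loader
    like lsun(...) returns can still be customized downstream."""

    def __init__(self, items):
        self._items = list(items)

    def map(self, fn):
        # Apply fn to every element, like Dataset.map.
        return SimpleDataset(fn(x) for x in self._items)

    def shuffle(self, seed=None):
        # Return a reshuffled copy, like Dataset.shuffle.
        items = list(self._items)
        random.Random(seed).shuffle(items)
        return SimpleDataset(items)

    def batch(self, size):
        # Group consecutive elements, like Dataset.batch.
        items = self._items
        return SimpleDataset(items[i:i + size]
                             for i in range(0, len(items), size))

    def __iter__(self):
        return iter(self._items)

batches = list(SimpleDataset(range(6)).map(lambda x: x * 2).batch(3))
# -> [[0, 2, 4], [6, 8, 10]]
```

The real API adds prefetching, parallel decoding, and graph integration, but the compositional contract is the point here.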

@dustinvtran
Member

dustinvtran commented Sep 26, 2017

Agnostic to choice of workflow as in including the framework too. I think of Observations as having a longer life span than Edward or TensorFlow in that it's more likely to still be developed 5-10 years from now; it's more uncertain for computational graph and PP frameworks.

But maybe it's not possible to implement a generic tf.contrib.data utility that helps with loading across all large data sets (it likely isn't possible; I haven't thought carefully about it). If so, I can see an argument for having a TF dependency for large data sets. In the same way we use libraries like networkx for loading network data, we can rely on TF to load large data sets and possibly change it in the future.

@Arvinds-ds
Contributor Author

@dustinvtran Let me know once you crystallize your thoughts on a TF dependency for large image datasets; I can contribute code for that. I have written loading code for CelebA and CamVid, and they seem to be basic variations of similar functionality that can be abstracted to return a contrib.data dataset. I am closing this issue.

@dustinvtran
Member

I thought about it and agree with you. I think it makes sense to have celeba/lsun/etc. functions load and return objects for contrib.data.dataset.

@dustinvtran dustinvtran reopened this Sep 27, 2017
@Arvinds-ds
Contributor Author

There are common patterns for loading data, e.g. loading images listed in a text file of image names, loading images from a folder, loading labels from a text file, etc., which I am abstracting into a tf_dataset_utils.py. Where do you suggest the file should reside?
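As a rough sketch of one such pattern, parsing an index file of image names and labels is framework-independent, and only the last step touches TF. The helper name and signature below are hypothetical, not the actual tf_dataset_utils.py:

```python
def read_index_file(path, delimiter=" "):
    """Hypothetical tf_dataset_utils-style helper: parse a text file
    whose lines look like "<image_filename> <integer_label>" into
    parallel lists. The lists can then be handed to
    tf.contrib.data.Dataset.from_tensor_slices((filenames, labels))
    and decoded with a .map(...) step.
    """
    filenames, labels = [], []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            name, label = line.split(delimiter, 1)
            filenames.append(name)
            labels.append(int(label))
    return filenames, labels
```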

@dustinvtran
Member

Either as part of util.py or a new util_tf.py if you think it's substantial and dependency-ridden enough to have separately.

@dustinvtran
Member

A currently private but soon-to-be-open probabilistic programming library built on PyTorch also uses this library. We should make sure to enable other data loaders and not just TF's.

@Arvinds-ds
Contributor Author

Cool. I do love the PyTorch Dataset and DataLoader interface too.
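For context, what makes that interface easy to target is how small the contract is: a PyTorch map-style Dataset only needs `__getitem__` and `__len__`, and `torch.utils.data.DataLoader` handles batching and shuffling on top. A minimal illustration with no torch dependency (`ListDataset` is a made-up name):

```python
class ListDataset:
    """Minimal object satisfying PyTorch's map-style Dataset contract.
    Anything with __len__ and __getitem__ can be wrapped by
    torch.utils.data.DataLoader, so an observations loader could return
    (or be trivially adapted into) such an object."""

    def __init__(self, examples):
        self.examples = examples

    def __len__(self):
        # Number of examples; used by samplers to draw indices.
        return len(self.examples)

    def __getitem__(self, idx):
        # Fetch one example by index; DataLoader collates these into batches.
        return self.examples[idx]
```

Supporting both backends could then come down to wrapping the same underlying (filenames, labels) lists in either a tf dataset or a Dataset-like object.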
