Error when deserializing TFRecords in TF 2.x: Only integers, slices (:), ellipsis (...), tf.newaxis (None) and scalar tf.int32/tf.int64 tensors are valid indices
#178
The reading and the writing do not match: after persisting to AWS S3, I see that the serde is somehow mismatched; perhaps the connector is doing something TF 1.x-compatible rather than TF 2.x?
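For context, the write side is roughly the following (a minimal sketch, assuming PySpark and the Example record type; the S3 paths and column names are placeholders, not taken from the attached program):

# Hypothetical sketch of the write side (PySpark + spark-tensorflow-connector).
# Paths and column names are placeholders; the attached tester does its own
# CSV loading and joining.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("tfrecord-write").getOrCreate()

# assumes movies.csv has already been joined onto ratings.csv to get movie_title
ratings = spark.read.option("header", "true").csv("s3://<bucket>/ratings_joined.csv")

(ratings
    .select("movie_title", "user_id")
    .write
    .format("tfrecords")              # data source provided by spark-tensorflow-connector_2.12-1.15.0.jar
    .option("recordType", "Example")  # rows are serialized as tf.train.Example
    .mode("overwrite")
    .save("s3://<bucket>/ratings-tfrecords"))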
To reproduce:
1. Execute in Spark with --jars s3://<your jars location>/spark-tensorflow-connector_2.12-1.15.0.jar
2. Use a bootstrap action in EMR to install boto3 on the cluster; this worked for me:
   #!/bin/bash
   pip3 install --user boto3
3. Run the attached Python tester program against the small MovieLens dataset (see the attached movies.csv and ratings.csv).
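For reference, a minimal sketch of the read path in TF 2.x (the feature names, dtypes, and S3 path here are assumptions, not necessarily what the attached program uses):

# Hypothetical sketch of the read side in TF 2.x. The records coming off disk
# are serialized tf.train.Example protos, so they must be parsed into a dict
# of tensors before string-keyed indexing works.
import tensorflow as tf

feature_spec = {
    "movie_title": tf.io.FixedLenFeature([], tf.string),
    "user_id": tf.io.FixedLenFeature([], tf.string),
}

def parse(serialized):
    return tf.io.parse_single_example(serialized, feature_spec)

files = tf.data.Dataset.list_files("s3://<bucket>/ratings-tfrecords/part-*")
ratings = tf.data.TFRecordDataset(files).map(parse)

# On the raw (unparsed) dataset, each element x is a scalar string tensor,
# so x["movie_title"] raises the TypeError shown below.
ratings = ratings.map(lambda x: {"movie_title": x["movie_title"], "user_id": x["user_id"]})

The attached tester program fails at exactly that map call: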
Traceback (most recent call last):
File "/mnt/tmp/spark-8758df58-16d1-4ec9-a669-5cdf60285850/recsys_tfrs_proto.py", line 300, in
main(sys.argv)
File "/usr/local/lib64/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 302, in wrapper
return func(*args, **kwargs)
File "/mnt/tmp/spark-8758df58-16d1-4ec9-a669-5cdf60285850/recsys_tfrs_proto.py", line 50, in main
movies, test, train, unique_movie_titles, unique_user_ids = prepare_data(movies, ratings)
File "/usr/local/lib64/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 302, in wrapper
return func(*args, **kwargs)
File "/mnt/tmp/spark-8758df58-16d1-4ec9-a669-5cdf60285850/recsys_tfrs_proto.py", line 155, in prepare_data
ratings = ratings.map(lambda x: {"movie_title": x["movie_title"], "user_id": x["user_id"]})
File "/usr/local/lib64/python3.7/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 1695, in map
return MapDataset(self, map_func, preserve_cardinality=True)
File "/usr/local/lib64/python3.7/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 4045, in init
use_legacy_function=use_legacy_function)
File "/usr/local/lib64/python3.7/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 3371, in init
self._function = wrapper_fn.get_concrete_function()
File "/usr/local/lib64/python3.7/site-packages/tensorflow/python/eager/function.py", line 2939, in get_concrete_function
*args, **kwargs)
File "/usr/local/lib64/python3.7/site-packages/tensorflow/python/eager/function.py", line 2906, in _get_concrete_function_garbage_collected
graph_function, args, kwargs = self._maybe_define_function(args, kwargs)
File "/usr/local/lib64/python3.7/site-packages/tensorflow/python/eager/function.py", line 3213, in _maybe_define_function
graph_function = self._create_graph_function(args, kwargs)
File "/usr/local/lib64/python3.7/site-packages/tensorflow/python/eager/function.py", line 3075, in _create_graph_function
capture_by_value=self._capture_by_value),
File "/usr/local/lib64/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 986, in func_graph_from_py_func
func_outputs = python_func(*func_args, **func_kwargs)
File "/usr/local/lib64/python3.7/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 3364, in wrapper_fn
ret = _wrapper_helper(*args)
File "/usr/local/lib64/python3.7/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 3299, in _wrapper_helper
ret = autograph.tf_convert(func, ag_ctx)(*nested_args)
File "/usr/local/lib64/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 302, in wrapper
return func(*args, **kwargs)
File "/mnt/tmp/spark-8758df58-16d1-4ec9-a669-5cdf60285850/recsys_tfrs_proto.py", line 155, in
ratings = ratings.map(lambda x: {"movie_title": x["movie_title"], "user_id": x["user_id"]})
File "/usr/local/lib64/python3.7/site-packages/tensorflow/python/util/dispatch.py", line 201, in wrapper
return target(*args, **kwargs)
File "/usr/local/lib64/python3.7/site-packages/tensorflow/python/ops/array_ops.py", line 986, in _slice_helper
_check_index(s)
File "/usr/local/lib64/python3.7/site-packages/tensorflow/python/ops/array_ops.py", line 865, in _check_index
raise TypeError(_SLICE_TYPE_ERROR + ", got {!r}".format(idx))
TypeError: Only integers, slices (:), ellipsis (...), tf.newaxis (None) and scalar tf.int32/tf.int64 tensors are valid indices, got 'movie_title'
The library doesn't seem to work in the context of TensorFlow 2.x.
My environment:
Output: the traceback above is what I get when running in the cluster.
spark-tf-connector-serde-issue.zip