Training stage error: Keras validation_steps=None #62

lukemckinstry · 2019-06-06T14:38:10Z

I encountered this error while trying to run the SpaceNet Rio example in Google Colab:

Ensuring input files exist [####################################] 100%
Checking for existing output [####################################] 100%
Saving command configuration to data/examples/spacenet/rio/remote-output/train/spacenet-rio-chip-classification-test/command-config-0.json...
Saving command configuration to data/examples/spacenet/rio/remote-output/bundle/spacenet-rio-chip-classification-test/command-config-0.json...
Saving command configuration to data/examples/spacenet/rio/remote-output/predict/spacenet-rio-chip-classification-test/command-config-0.json...
Saving command configuration to data/examples/spacenet/rio/remote-output/eval/spacenet-rio-chip-classification-test/command-config-0.json...
python -m rastervision run_command data/examples/spacenet/rio/remote-output/train/spacenet-rio-chip-classification-test/command-config-0.json
Training model...
/usr/local/lib/python3.6/dist-packages/pluginbase.py:439: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
fromlist, level)
Using TensorFlow backend.
2019-06-06 14:27:00:rastervision.utils.files: INFO - Downloading https://github.com/fchollet/deep-learning-models/releases/download/v0.2/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5 to /tmp/tmp7n3lkztl/tmpd83op6nh/http/github.com/fchollet/deep-learning-models/releases/download/v0.2/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
2019-06-06 14:27:03.794498: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2200000000 Hz
2019-06-06 14:27:03.794813: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x14cf4a0 executing computations on platform Host. Devices:
2019-06-06 14:27:03.794849: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): ,
2019-06-06 14:27:04.065789: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-06 14:27:04.066325: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x14cf080 executing computations on platform CUDA. Devices:
2019-06-06 14:27:04.066371: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): Tesla T4, Compute Capability 7.5
2019-06-06 14:27:04.066753: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:00:04.0
totalMemory: 14.73GiB freeMemory: 14.60GiB
2019-06-06 14:27:04.066777: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-06-06 14:27:05.517909: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-06 14:27:05.517977: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-06-06 14:27:05.517990: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-06-06 14:27:05.518287: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:42] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2019-06-06 14:27:05.518385: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14115 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5)
Found 0 images belonging to 2 classes.
Found 0 images belonging to 2 classes.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
TensorBoard 1.13.1 at http://fea6b3f9897c:6006 (Press CTRL+C to quit)
Traceback (most recent call last):
File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.6/dist-packages/rastervision/__main__.py", line 17, in <module>
rv.main()
File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1066, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/rastervision/cli/main.py", line 292, in run_command
rv.runner.CommandRunner.run(command_config_uri)
File "/usr/local/lib/python3.6/dist-packages/rastervision/runner/command_runner.py", line 11, in run
CommandRunner.run_from_proto(msg)
File "/usr/local/lib/python3.6/dist-packages/rastervision/runner/command_runner.py", line 17, in run_from_proto
command.run()
File "/usr/local/lib/python3.6/dist-packages/rastervision/command/train_command.py", line 21, in run
task.train(tmp_dir)
File "/usr/local/lib/python3.6/dist-packages/rastervision/task/task.py", line 138, in train
self.backend.train(tmp_dir)
File "/usr/local/lib/python3.6/dist-packages/rastervision/backend/keras_classification/backend.py", line 263, in train
_train(backend_config_path, pretrained_model_path, do_monitoring)
File "/usr/local/lib/python3.6/dist-packages/rastervision/backend/keras_classification/commands/train.py", line 15, in _train
trainer.train(do_monitoring)
File "/usr/local/lib/python3.6/dist-packages/rastervision/backend/keras_classification/core/trainer.py", line 150, in train
callbacks=callbacks)
File "/usr/local/lib/python3.6/dist-packages/keras/legacy/interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 1418, in fit_generator
initial_epoch=initial_epoch)
File "/usr/local/lib/python3.6/dist-packages/keras/engine/training_generator.py", line 68, in fit_generator
raise ValueError('validation_steps=None is only valid for a'
ValueError: validation_steps=None is only valid for a generator based on the keras.utils.Sequence class. Please specify validation_steps or use the keras.utils.Sequence class.
TensorBoard caught SIGTERM; exiting...
/tmp/tmpkna9hjv6/tmpfdeg5814/Makefile:6: recipe for target '2' failed
make: *** [2] Error 1
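
For anyone reading the log above: the two "Found 0 images belonging to 2 classes" lines suggest that both the training and validation image directories were empty, so any step count derived from them would come out as zero or unset before Keras raised the ValueError. The following is only a minimal sketch of computing a step count explicitly and failing early on an empty dataset. It is not Raster Vision's code, and the function name `steps_per_pass` is hypothetical.

```python
import math


def steps_per_pass(num_samples: int, batch_size: int) -> int:
    """Return the number of batches needed to cover num_samples once.

    Hypothetical helper for illustration; not part of Raster Vision or Keras.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if num_samples <= 0:
        # Corresponds to the "Found 0 images" symptom in the log:
        # stop with a clear message instead of passing an unusable step count on.
        raise ValueError("no samples found; check that the image directories are populated")
    return math.ceil(num_samples / batch_size)
```

A value computed this way could be passed as `validation_steps` to `fit_generator` when the validation generator is not a `keras.utils.Sequence`, which is the case the error message describes.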


lewfish · 2019-06-11T15:56:40Z

Which command did you run and how did you install and run it in Colab? I suspect that you're not using the Docker image, which means you're probably using an incompatible version of TF and/or Keras.
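
One way to answer the version question is to print what is actually installed in the notebook environment. This is a rough sketch that assumes Python 3.8 or newer, since it relies on `importlib.metadata`; on an older runtime such as the Python 3.6 one in the traceback, `pip freeze` gives the same information. The helper name `installed_versions` is made up for illustration.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Package names relevant to this issue.
    for name, version in installed_versions(["rastervision", "tensorflow", "Keras"]).items():
        print(f"{name}=={version}" if version else f"{name} is not installed")
```

Comparing this output with the versions pinned in the Docker image would show whether the pip-installed environment has drifted from the tested one.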

lukemckinstry · 2019-06-18T18:28:20Z

Installed with: pip install rastervision==0.9.0rc1

Ran with:
rastervision run local -e rvexamples.examples.spacenet.rio.chip_classification -a raw_uri {RAW_URI} -a processed_uri {PROCESSED_URI} -a root_uri {ROOT_URI} -a test True --splits 2

tf and keras versions:
Keras==2.2.4 tensorflow==1.14.0rc1
