Training stage error: Keras validation_steps=None #62

Open
lukemckinstry opened this issue Jun 6, 2019 · 2 comments

Comments

@lukemckinstry

I encountered this error while trying to run the Rio SpaceNet example in Google Colab:

Ensuring input files exist [####################################] 100%
Checking for existing output [####################################] 100%
Saving command configuration to data/examples/spacenet/rio/remote-output/train/spacenet-rio-chip-classification-test/command-config-0.json...
Saving command configuration to data/examples/spacenet/rio/remote-output/bundle/spacenet-rio-chip-classification-test/command-config-0.json...
Saving command configuration to data/examples/spacenet/rio/remote-output/predict/spacenet-rio-chip-classification-test/command-config-0.json...
Saving command configuration to data/examples/spacenet/rio/remote-output/eval/spacenet-rio-chip-classification-test/command-config-0.json...
python -m rastervision run_command data/examples/spacenet/rio/remote-output/train/spacenet-rio-chip-classification-test/command-config-0.json
Training model...
/usr/local/lib/python3.6/dist-packages/pluginbase.py:439: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
fromlist, level)
Using TensorFlow backend.
2019-06-06 14:27:00:rastervision.utils.files: INFO - Downloading https://github.com/fchollet/deep-learning-models/releases/download/v0.2/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5 to /tmp/tmp7n3lkztl/tmpd83op6nh/http/github.com/fchollet/deep-learning-models/releases/download/v0.2/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
2019-06-06 14:27:03.794498: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2200000000 Hz
2019-06-06 14:27:03.794813: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x14cf4a0 executing computations on platform Host. Devices:
2019-06-06 14:27:03.794849: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined>
2019-06-06 14:27:04.065789: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-06 14:27:04.066325: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x14cf080 executing computations on platform CUDA. Devices:
2019-06-06 14:27:04.066371: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): Tesla T4, Compute Capability 7.5
2019-06-06 14:27:04.066753: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:00:04.0
totalMemory: 14.73GiB freeMemory: 14.60GiB
2019-06-06 14:27:04.066777: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-06-06 14:27:05.517909: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-06 14:27:05.517977: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-06-06 14:27:05.517990: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-06-06 14:27:05.518287: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:42] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2019-06-06 14:27:05.518385: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14115 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5)
Found 0 images belonging to 2 classes.
Found 0 images belonging to 2 classes.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
TensorBoard 1.13.1 at http://fea6b3f9897c:6006 (Press CTRL+C to quit)
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.6/dist-packages/rastervision/__main__.py", line 17, in <module>
    rv.main()
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/rastervision/cli/main.py", line 292, in run_command
    rv.runner.CommandRunner.run(command_config_uri)
  File "/usr/local/lib/python3.6/dist-packages/rastervision/runner/command_runner.py", line 11, in run
    CommandRunner.run_from_proto(msg)
  File "/usr/local/lib/python3.6/dist-packages/rastervision/runner/command_runner.py", line 17, in run_from_proto
    command.run()
  File "/usr/local/lib/python3.6/dist-packages/rastervision/command/train_command.py", line 21, in run
    task.train(tmp_dir)
  File "/usr/local/lib/python3.6/dist-packages/rastervision/task/task.py", line 138, in train
    self.backend.train(tmp_dir)
  File "/usr/local/lib/python3.6/dist-packages/rastervision/backend/keras_classification/backend.py", line 263, in train
    _train(backend_config_path, pretrained_model_path, do_monitoring)
  File "/usr/local/lib/python3.6/dist-packages/rastervision/backend/keras_classification/commands/train.py", line 15, in _train
    trainer.train(do_monitoring)
  File "/usr/local/lib/python3.6/dist-packages/rastervision/backend/keras_classification/core/trainer.py", line 150, in train
    callbacks=callbacks)
  File "/usr/local/lib/python3.6/dist-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 1418, in fit_generator
    initial_epoch=initial_epoch)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training_generator.py", line 68, in fit_generator
    raise ValueError('validation_steps=None is only valid for a'
ValueError: validation_steps=None is only valid for a generator based on the keras.utils.Sequence class. Please specify validation_steps or use the keras.utils.Sequence class.
TensorBoard caught SIGTERM; exiting...
/tmp/tmpkna9hjv6/tmpfdeg5814/Makefile:6: recipe for target '2' failed
make: *** [2] Error 1
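
One detail in the log above may be relevant: both data generators report "Found 0 images belonging to 2 classes." If the step count passed to fit_generator is derived from the number of images, an empty chip directory could yield a step count of 0, and Keras may treat that like a missing value; this is a guess, not something confirmed in this thread. The sketch below is one way to check for that situation before training. The function names, the extension list, and the assumed layout (one subdirectory per class) are hypothetical and are not part of Raster Vision or Keras.

```python
import math
import os

# Assumed set of image extensions; adjust to whatever the chips actually use.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff")


def count_images(root_dir):
    """Count image files under each immediate subdirectory of root_dir.

    Assumes one subdirectory per class, which is a hypothetical layout here.
    Returns a dict mapping subdirectory name to the number of images found.
    """
    counts = {}
    for entry in sorted(os.listdir(root_dir)):
        class_dir = os.path.join(root_dir, entry)
        if not os.path.isdir(class_dir):
            continue
        total = 0
        for _, _, files in os.walk(class_dir):
            total += sum(1 for name in files
                         if name.lower().endswith(IMAGE_EXTENSIONS))
        counts[entry] = total
    return counts


def steps_for(num_samples, batch_size):
    """Return the number of batches needed to cover num_samples.

    Raises ValueError with an explicit message when there is nothing to
    iterate over, instead of silently producing a step count of 0.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if num_samples <= 0:
        raise ValueError("no samples found; check that the image "
                         "directories are not empty")
    return math.ceil(num_samples / batch_size)
```

Calling count_images on the training and validation chip directories and passing the totals through steps_for would surface an empty dataset as a readable error before Keras is invoked.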

@lewfish
Contributor

lewfish commented Jun 11, 2019

Which command did you run and how did you install and run it in Colab? I suspect that you're not using the Docker image, which means you're probably using an incompatible version of TF and/or Keras.
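
To answer the version question quickly from inside a notebook, something like the following sketch could be used. It only reports what is installed; it does not know which versions the Docker image pins, so the comparison still has to be done by hand. The helper name installed_versions is hypothetical, and importlib.metadata requires Python 3.8 or newer (on older interpreters, pip freeze gives the same information).

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Package names taken from this thread.
    report = installed_versions(["rastervision", "tensorflow", "Keras"])
    for name, version in report.items():
        if version is None:
            print(f"{name}: not installed")
        else:
            print(f"{name}=={version}")
```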

@lukemckinstry
Author

I installed it with:
pip install rastervision==0.9.0rc1

I ran it with:
rastervision run local -e rvexamples.examples.spacenet.rio.chip_classification -a raw_uri {RAW_URI} -a processed_uri {PROCESSED_URI} -a root_uri {ROOT_URI} -a test True --splits 2

TF and Keras versions:
Keras==2.2.4 tensorflow==1.14.0rc1
