When training with any of the example configurations from the documentation, I get the error:
"RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED"
Reproducing
For example, running:
python main.py --network_type rnn --dataset wikitext
My system configuration
CUDA 10.1
Python 3.7.3
PyTorch 1.1.0
Arch Linux
GPU: RTX 2070
Other PyTorch applications work just fine.
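For anyone comparing setups, the versions listed above can be gathered with a short script. This is only a sketch: the helper name `collect_environment` is made up for illustration, and the PyTorch part is skipped if `torch` cannot be imported.

```python
import platform


def collect_environment():
    """Collect the version details that matter for a cuDNN bug report."""
    info = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this interpreter; report that and stop.
        info["torch"] = None
        return info

    info["torch"] = torch.__version__
    # CUDA version PyTorch was built against (None for CPU-only builds).
    info["cuda"] = torch.version.cuda
    # cuDNN version as an integer, or None if cuDNN is unavailable.
    info["cudnn"] = torch.backends.cudnn.version()
    info["gpu"] = (
        torch.cuda.get_device_name(0) if torch.cuda.is_available() else None
    )
    return info


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```

Note that `torch.version.cuda` reports the CUDA version PyTorch was built with, which can differ from the system-wide CUDA installation.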
Full output (from pipenv environment):
% python main.py --network_type rnn --dataset wikitext
2019-06-14 16:30:31,585:INFO::[*] Make directories : logs/wikitext_2019-06-14_16-30-31
2019-06-14 16:30:49,909:INFO::regularizing:
2019-06-14 16:30:54,743:INFO::# of parameters: 169,315,278
2019-06-14 16:30:54,834:INFO::[*] MODEL dir: logs/wikitext_2019-06-14_16-30-31
2019-06-14 16:30:54,834:INFO::[*] PARAM path: logs/wikitext_2019-06-14_16-30-31/params.json
Traceback (most recent call last):
  File "main.py", line 54, in <module>
    main(args)
  File "main.py", line 34, in main
    trnr.train()
  File "/home/oliver/code/ENAS-pytorch/trainer.py", line 222, in train
    self.train_shared(dag=dag)
  File "/home/oliver/code/ENAS-pytorch/trainer.py", line 305, in train_shared
    dags)
  File "/home/oliver/code/ENAS-pytorch/trainer.py", line 251, in get_loss
    output, hidden, extra_out = self.shared(inputs, dag, hidden=hidden)
  File "/home/oliver/.local/share/virtualenvs/ENAS-pytorch-kjHs_kjH/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/oliver/code/ENAS-pytorch/models/shared_rnn.py", line 235, in forward
    logit, hidden = self.cell(x_t, hidden, dag)
  File "/home/oliver/code/ENAS-pytorch/models/shared_rnn.py", line 354, in cell
    output = self.batch_norm(output)
  File "/home/oliver/.local/share/virtualenvs/ENAS-pytorch-kjHs_kjH/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/oliver/.local/share/virtualenvs/ENAS-pytorch-kjHs_kjH/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 83, in forward
    exponential_average_factor, self.eps)
  File "/home/oliver/.local/share/virtualenvs/ENAS-pytorch-kjHs_kjH/lib/python3.7/site-packages/torch/nn/functional.py", line 1697, in batch_norm
    training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
Debugging
Inspecting the arguments passed to batch_norm, I found that input, weight, bias, running_mean and running_var are all on the CUDA device, which is what I would expect.
The remaining arguments look reasonable as well.
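The device check described above can be sketched roughly as follows. The helper names `devices_of` and `not_on_cuda` are made up for illustration and are not part of PyTorch or this repository. The code only assumes that each value exposes a `.device` attribute, as `torch.Tensor` does, so it also runs with plain stand-in objects.

```python
from types import SimpleNamespace


def devices_of(named_tensors):
    """Map each name to its tensor's device as a string (None if the tensor is missing)."""
    return {
        name: (None if tensor is None else str(tensor.device))
        for name, tensor in named_tensors.items()
    }


def not_on_cuda(named_tensors):
    """Return the names whose tensor is missing or does not live on a CUDA device."""
    return [
        name
        for name, device in devices_of(named_tensors).items()
        if device is None or not device.startswith("cuda")
    ]


if __name__ == "__main__":
    # Stand-ins for real tensors; only the .device attribute is needed here.
    example = {
        "input": SimpleNamespace(device="cuda:0"),
        "weight": SimpleNamespace(device="cuda:0"),
        "running_mean": SimpleNamespace(device="cpu"),
    }
    print(devices_of(example))
    print("not on CUDA:", not_on_cuda(example))
```

With real tensors, the dictionary could be built from the batch norm input together with the layer's `weight`, `bias`, `running_mean` and `running_var` attributes. An empty list from `not_on_cuda` would match what I observed.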