CUDA Runtime Error during Spacy Transformers NER Model Training #13129
Unanswered
iamhimanshu0 asked this question in Help: Coding & Implementations
Replies: 1 comment
-
Please see the links under "I'm getting Out of Memory errors" here: #8226. What does your data look like? What exactly have you tried (with the exact details from your config)? It would probably also help to use a newer version of Python so that you can use newer versions of PyTorch, which may include performance improvements.
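As a starting point, here is a minimal sketch of training with lower batching settings via config overrides. The specific keys assume a default transformer NER config generated by `spacy init config`, and the values are only examples, not recommendations; adjust everything to match your own config.cfg.

```python
# Minimal sketch: train with config overrides that lower transformer memory use.
# Assumes a default transformer NER config (spacy init config) -- adjust the
# override keys to match your own config.cfg.
from spacy.cli.train import train

train(
    "config.cfg",   # hypothetical path to your training config
    "output",       # hypothetical output directory
    use_gpu=0,
    overrides={
        # Fewer padded tokens per batch (the default batch_by_padded size is 2000)
        "training.batcher.size": 500,
        # Shorter strided spans so long texts are split into smaller windows
        "components.transformer.model.get_spans.window": 64,
        "components.transformer.model.get_spans.stride": 48,
    },
)
```

The same overrides can also be passed on the command line with dot notation, e.g. `--training.batcher.size 500`.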
-
Hello everyone,
I am currently training a custom NER model on about 90k records using spaCy Transformers (en_core_web_trf), and I'm running into an issue where training takes an unusually long time and eventually gets killed with a CUDA runtime error.
Here's a brief overview of my setup:
The error message I'm receiving is:
"RuntimeError: CUDA out of memory. Tried to allocate 114.00 MiB (GPU 0; 14.76 GiB total capacity; 11.19 GiB already allocated; 78.75 MiB free; 12.40 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF"
I've tried a few troubleshooting steps, such as checking GPU memory usage, killing processes that might be holding GPU memory, and adjusting batch sizes, but the issue persists.
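For reference, this is roughly how I understand the max_split_size_mb hint from the error message could be applied; the 128 MiB value is just a placeholder I haven't tuned:

```python
# Sketch: PYTORCH_CUDA_ALLOC_CONF is read when the CUDA allocator initialises,
# so it has to be set before torch (or spaCy with a GPU) is imported.
import os

os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"  # placeholder value

import torch

# Quick look at how much memory PyTorch is currently holding on GPU 0
print(f"allocated: {torch.cuda.memory_allocated(0) / 1024**2:.0f} MiB")
print(f"reserved:  {torch.cuda.memory_reserved(0) / 1024**2:.0f} MiB")
```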
I would appreciate any insights or suggestions on how to resolve this issue. Has anyone else encountered this problem and found a solution?
Thank you in advance for your help!