CUDA Runtime Error during Spacy Transformers NER Model Training #13129
Unanswered
iamhimanshu0 asked this question in Help: Coding & Implementations
Replies: 1 comment
-
Please see the links under "I'm getting Out of Memory errors" here: #8226. What does your data look like? What exactly have you tried (with the exact details from your config)? It would probably also help to use a newer version of Python so that you can use newer versions of PyTorch, which may include performance improvements.
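As a starting point, here is a minimal sketch of training with lower batching settings via config overrides. The specific keys assume a default transformer NER config generated by `spacy init config`, and the values are only examples, not recommendations; adjust everything to match your own config.cfg.

```python
# Minimal sketch: train with config overrides that lower transformer memory use.
# Assumes a default transformer NER config (spacy init config) -- adjust the
# override keys to match your own config.cfg.
from spacy.cli.train import train

train(
    "config.cfg",   # hypothetical path to your training config
    "output",       # hypothetical output directory
    use_gpu=0,
    overrides={
        # Fewer padded tokens per batch (the default batch_by_padded size is 2000)
        "training.batcher.size": 500,
        # Shorter strided spans so long texts are split into smaller windows
        "components.transformer.model.get_spans.window": 64,
        "components.transformer.model.get_spans.stride": 48,
    },
)
```

The same overrides can also be passed on the command line with dot notation, e.g. `--training.batcher.size 500`.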
-
Hello everyone,
I am currently training a custom NER model on about 90k records using spaCy Transformers (en_core_web_trf), and I'm running into an issue where training takes an unusually long time and eventually gets killed with a CUDA runtime error.
Here's a brief overview of my setup:
The error message I'm receiving is:
"RuntimeError: CUDA out of memory. Tried to allocate 114.00 MiB (GPU 0; 14.76 GiB total capacity; 11.19 GiB already allocated; 78.75 MiB free; 12.40 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF"
I've tried a few troubleshooting steps, such as checking GPU memory usage, killing processes that might be holding GPU memory, and adjusting batch sizes, but the issue persists.
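For reference, this is roughly how I understand the max_split_size_mb hint from the error message could be applied; the 128 MiB value is just a placeholder I haven't tuned:

```python
# Sketch: PYTORCH_CUDA_ALLOC_CONF is read when the CUDA allocator initialises,
# so it has to be set before torch (or spaCy with a GPU) is imported.
import os

os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"  # placeholder value

import torch

# Quick look at how much memory PyTorch is currently holding on GPU 0
print(f"allocated: {torch.cuda.memory_allocated(0) / 1024**2:.0f} MiB")
print(f"reserved:  {torch.cuda.memory_reserved(0) / 1024**2:.0f} MiB")
```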
I would appreciate any insights or suggestions on how to resolve this issue. Has anyone else encountered this problem and found a solution?
Thank you in advance for your help!