custom text classification using spaCy v3.1.3 #9839
-
Hi, Your Environment
Replies: 10 comments
-
If no scores are being printed, what's probably happening is that there's some issue initializing the model. spaCy has no limit on the number of labels, but adding more labels does require more memory, and it's possible you're out of memory. It sounds like the command exits on its own, is that correct? If so, it may have been killed due to OOM errors. On Linux you should be able to check the dmesg log for evidence of that. Some other things to check:
If this is a memory issue, you should look at using streaming corpora.
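As a sketch of the streaming idea: in spaCy v3 you would register a generator like this as a custom corpus reader (via `@spacy.registry.readers`) that yields `Example` objects. The spaCy-specific wrapping is omitted here so the snippet stays self-contained, and the JSONL record format is an assumption:

```python
import io
import json

def stream_records(file_obj):
    # Yield one (text, cats) record at a time instead of loading the
    # whole training set into memory at once.
    for line in file_obj:
        record = json.loads(line)
        yield record["text"], record["cats"]

# Stand-in for a large JSONL file on disk (hypothetical format):
data = io.StringIO(
    '{"text": "great product", "cats": {"POS": 1.0, "NEG": 0.0}}\n'
    '{"text": "awful service", "cats": {"POS": 0.0, "NEG": 1.0}}\n'
)
records = list(stream_records(data))
print(records[0])  # ('great product', {'POS': 1.0, 'NEG': 0.0})
```

Because the generator only holds one record at a time, peak memory stays flat no matter how large the file is.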
-
RAM: 20 GB
-
I think the link is our only official documentation on the subject. Was there something in particular that you wanted to know that wasn't there? Also, were you able to check dmesg to see if it was an OOM error or not? If it didn't show up there, it'd be something else, and we'd have to do some more debugging. Another thing you can try is training on a smaller subset of the data and seeing if that works; if it does, that would also suggest a memory issue.
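For reference, checking for the OOM killer is a one-liner. Since real dmesg output needs the affected machine (and often root), this sketch runs the same grep pattern against a sample kernel log line:

```shell
# On the real machine you would run:
#   dmesg | grep -i -E "out of memory|killed process"
# Here the same pattern is demonstrated against a sample OOM-killer line:
sample="Out of memory: Killed process 12345 (python) total-vm:20480000kB"
echo "$sample" | grep -i -E "out of memory|killed process"
```

If the training process's PID shows up in a "Killed process" line, the kernel terminated it for memory pressure.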
-
I am able to train models with small datasets and also with large datasets. The problem is that when there are more labels in the large datasets, the size of the .spacy file is very large.
-
An OOM error is an "out of memory" error. You can check if your process is being killed due to running out of memory by checking the output of dmesg on Linux. If you can't fit all your training data in memory, you need to use a streaming corpus reader. The linked part of the docs explains how to do that. In particular, since you have a lot of labels, keep in mind you'll want to explicitly provide them. (How to do this is explained in the docs.)
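As a sketch of providing labels explicitly, the `[initialize]` block of the training config can point at a label file via the documented `spacy.read_labels.v1` reader; the path below is an example of what `python -m spacy init labels` typically writes out:

```ini
[initialize.components.textcat]

[initialize.components.textcat.labels]
@readers = "spacy.read_labels.v1"
path = "corpus/labels/textcat.json"
```

With the labels supplied up front, spaCy doesn't have to stream through the corpus just to discover the label set during initialization.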
Do you mean that setting
-
didn't change anything
-
Just to be sure, were you able to confirm this is an OOM error using dmesg or other methods? If using a streaming corpus didn't fix the issue, then I guess an individual batch is also causing OOM errors. In that case you'll need to reduce the batch size. You should try reducing
-
start = 100
-
You should reduce the start and stop values, which actually control the batch size. Since it seems like this is not a bug but a configuration issue, I'm moving it to Discussions.
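For reference, with spaCy's default batch-by-words batcher those values live in this section of the config (the values shown are the defaults in generated configs; lowering both start and stop shrinks the batches, and the right numbers depend on your memory budget):

```ini
[training.batcher]
@batchers = "spacy.batch_by_words.v1"
discard_oversize = false
tolerance = 0.2

[training.batcher.size]
@schedules = "compounding.v1"
start = 100
stop = 1000
compound = 1.001
```

The size schedule compounds from start toward stop over training, so stop is the ceiling that matters most for peak memory.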