Trouble with labels.json Format for Dependency Parser Training in spaCy v3 #13065
-
I'm trying to train a dependency parser using spaCy v3 and I've been facing issues with the setup of my labels.json file.
However, when I reference this file in my config.cfg and start training, I get an error related to the initialization of these labels. Running `spacy debug config config.cfg` reports no problems, which suggests that the issue lies in the way I've formatted my labels. Question: could anyone clarify the exact expected format of labels.json for dependency parser training in spaCy v3?
Replies: 2 comments 17 replies
-
The labels files aren't intended to be generated by hand, and each pipeline component has its own internal format. You can generate the labels files by using `spacy init labels`. But be aware that in most cases, the labels file isn't required.
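As a rough sketch of how generated labels are wired into training: `python -m spacy init labels config.cfg ./labels --paths.train ./train.spacy --paths.dev ./dev.spacy` writes one JSON file per component into the output directory, and the config can then point the parser at its file through the `spacy.read_labels.v1` reader. The directory `./labels` and the corpus file names are placeholders assumed for illustration, not paths from this thread.

```ini
# Fragment of config.cfg (paths are hypothetical placeholders).
# Assumes `spacy init labels` was run with ./labels as the output directory,
# which produces one file per component, e.g. labels/parser.json.
[initialize.components.parser]

[initialize.components.parser.labels]
@readers = "spacy.read_labels.v1"
path = "labels/parser.json"
```

If this block is left out, `spacy train` derives the labels from the training corpus during initialization instead.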
-
Thanks, I have tried this too. While I can train the tagger on its own, I am unable to train the parser (with the same dataset, by the way). After initialization is done, it takes ages for the actual training process to start (in fact, it never started). My dataset is around 5000 sentences, and the tagger usually finishes training in 7-8 minutes. What could the issue be?

    =========================== Initializing pipeline ===========================
    [2023-10-16 14:43:10,290] [INFO] Pipeline: ['tok2vec', 'parser']
    [2023-10-16 14:43:10,297] [INFO] Created vocabulary
    [2023-10-16 14:43:10,301] [INFO] Finished initializing nlp object
The labels files aren't intended to be generated by hand, and each pipeline component has its own internal format. You can generate the labels files by using `spacy init labels`. If you have a very large or streamed/infinite corpus, you can generate the labels based on a representative subset of your training data instead of the whole training corpus.

But be aware that in most cases, the labels file isn't required. `spacy train` can determine the labels from the provided training corpus in the initialization step. If you have a large training corpus and you're repeatedly training on the same data, the main benefit is that you save some time at the beginning of `spacy train` by providing the lab…