Training an NER from scratch while reusing the parser, tagger, and other components from the en_core_web_md model #9213
-
I have been going around in circles on this one for quite some time, and I guess I am probably missing something. I want to train an NER component from scratch and then reuse it together with the existing components of en_core_web_md.
The NER trains correctly. Now I want to use the trained NER in a more complete pipeline with a lemmatizer, etc. I defined a config with all the elements, but of course the tagger, parser, etc. need to reference the components from en_core_web_md, while the NER comes from the newly trained model. I have tried many different approaches without success. What is the proper way to do this?
Replies: 2 comments 1 reply
-
I have made a bit of progress, essentially by using the `source` attribute in the config.
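For reference, sourcing components from an installed pipeline in the config looks roughly like this. This is a minimal, hypothetical `config.cfg` excerpt; the path to the trained NER model is an assumption, not taken from the thread:

```ini
# Components reused from the pretrained pipeline
[components.tagger]
source = "en_core_web_md"

[components.parser]
source = "en_core_web_md"

# NER sourced from the separately trained model (hypothetical path)
[components.ner]
source = "./output/model-best"
```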
When I load the resulting pipeline, the final model detects the entities properly. However, if I re-enable the parser, a sentence of 12 words is split into 8 sentences. The same sentence remains a single sentence when using the en_core_web_md model directly. The cfg, model, and moves files in the parser folder of the trained model seem identical to the ones in the en_core_web_md folder.
-
I believe I found the solution. The key was to freeze absolutely everything and to set the tok2vec source to en_core_web_md.
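A sketch of what that solution could look like in the config, assuming the sourced components listen to a shared tok2vec: the tok2vec is sourced from the same pretrained pipeline and frozen along with everything else, so the frozen listeners (tagger, parser) keep receiving the features they were originally trained on. The exact component list is an assumption based on the standard en_core_web_md pipeline:

```ini
# Source the shared tok2vec from the same pipeline as the frozen listeners
[components.tok2vec]
source = "en_core_web_md"

[components.tagger]
source = "en_core_web_md"

[components.parser]
source = "en_core_web_md"

[training]
# Freeze everything so no component (especially the tok2vec) gets updated
frozen_components = ["tok2vec", "tagger", "parser", "attribute_ruler", "lemmatizer", "ner"]
```

If the tok2vec were left unfrozen or initialized from scratch, its output would drift away from what the frozen parser expects, which would explain the spurious sentence splits described above.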