Pre-defined named configurations for model packages #12028
-
Thanks for the suggestion, I don't think that's a need we've had before. Let me describe the situation to make sure I understand this correctly.

You have packages you use with different configs, but it doesn't make sense to make different versions of the configs because the resulting packages would take up too much space or something? So from that perspective you want to change the config rather than making multiple packages. To do this you pass a lot of overrides on the command line. The issue is that the scripts where you use the pipelines are not centralized in one fashion, so the configuration of the overrides requires modification in multiple places.

One thing confuses me a bit about this - while the modifications would be simpler and smaller, if each pipeline had multiple configs, wouldn't you still have to modify the scripts with the config names? Is there a reason you can't modify your scripts to have uniform access to the pipelines, perhaps through specific pipeline loading scripts?

It might help my understanding to have some concrete examples of how you're using this. For example, maybe all your pipelines have a "fast" vs a "full" configuration, where some pipelines might be disabled. Or is it more like each pipeline has idiosyncratic modes?
-
The spacy.load function has various options to enable, disable, and configure the components of a language model package on load. I have a complex package that is designed to run in a few different ways depending on the input data. Right now, the calling scripts pass in the right combination of arguments to configure the pipeline for the data. But this makes adjustments and updates to the package really hard, because every script that uses it has to be modified.
It would be nice if there was a way to include multiple named configurations with a package, and if there was an option in `spacy.load` to take the name of a configuration to load instead of the default one. Each configuration would specify a particular set of pipeline components and their configs. That way, scripts can just name a configuration that's already defined in the package.
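For illustration, here is a minimal sketch of how this could be approximated today with a small shared loading helper. To be clear, this is not a spaCy feature: `PRESETS`, `load_named`, and the preset names are all hypothetical. The only real API assumed is `spacy.load` with its `disable`, `exclude`, and `config` keyword arguments, and the `nlp.batch_size` config setting.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical named presets. Each one maps to keyword arguments that
# spacy.load already accepts (disable, exclude, config overrides).
# The preset names and component choices are illustrative only.
PRESETS: Dict[str, Dict[str, Any]] = {
    "full": {},
    "fast": {"disable": ["parser", "ner"]},
    "long_docs": {
        "exclude": ["ner"],
        "config": {"nlp.batch_size": 64},
    },
}


def load_named(
    package: str,
    preset: str = "full",
    loader: Optional[Callable[..., Any]] = None,
    **overrides: Any,
) -> Any:
    """Load `package` using the keyword arguments stored under `preset`.

    `loader` defaults to spacy.load. It is a parameter only so the helper
    can be exercised without spaCy installed. Extra keyword arguments
    override the preset's values.
    """
    if preset not in PRESETS:
        known = ", ".join(sorted(PRESETS))
        raise ValueError(f"Unknown preset {preset!r}; expected one of: {known}")
    kwargs = {**PRESETS[preset], **overrides}
    if loader is None:
        import spacy  # imported lazily, only when a real pipeline is loaded

        loader = spacy.load
    return loader(package, **kwargs)
```

A calling script would then only need something like `nlp = load_named("my_package", "fast")`, where `my_package` is a placeholder name, and changes to a preset would happen in one place. What I'm asking for is essentially the same thing, but with the presets shipped inside the package rather than in a separate helper module.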