Hello,
Thanks for the great project!
I have a bit of an unusual question: I have a fully trained model, but I now need a model that works only for a few phrases and words and forgets everything else. The original model was trained on a lot of data and handles many voices, accents, and background noises.
Contextual biasing or an external language model could achieve that, but I want to restrict the actual model itself so that it cannot be used for other sentences (for commercial reasons).
If I just fine-tune on those few phrases, I am worried that, because of the very limited data in the target training set, the model will forget the variety of voices and accents.
What if I take the original dataset and model (trained with BPE) and substitute all unwanted words with `<unk>`; would that work? Would it work better if I retrained the original model with a word-based tokens file, and then fine-tuned on the same data with all unwanted words mapped to `<unk>`?
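For concreteness, here is a rough sketch of the substitution step I have in mind, just masking transcripts before fine-tuning. The phrase list and file names are made up for illustration:

```python
# Hypothetical sketch: rewrite training transcripts so every word outside
# the small target vocabulary becomes <unk> before fine-tuning.
# `keep_phrases` and the file paths are placeholders, not real data.

keep_phrases = ["turn on the light", "turn off the light", "stop"]

# The set of words the fine-tuned model is still allowed to emit.
allowed = {w for phrase in keep_phrases for w in phrase.split()}

def mask_transcript(text: str) -> str:
    """Replace every word not in the allowed set with <unk>."""
    return " ".join(w if w.lower() in allowed else "<unk>" for w in text.split())

with open("train_transcripts.txt") as fin, open("train_masked.txt", "w") as fout:
    for line in fin:
        fout.write(mask_transcript(line.strip()) + "\n")
```

The idea is that the acoustic variety (voices, accents, noise) in the original dataset is preserved, while the targets only ever contain the kept phrases plus `<unk>`.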
Does anybody have any better or more creative ideas?