Describe the feature
Expanding SLU to new languages requires a large amount of manual data annotation. To significantly reduce this effort, LLMs can be used to machine-translate slot-annotated data, e.g.
"play me <a> Dune <a> on <b> Youtube <b>" => "Spiele mir <a> Dune <a> auf <b> Youtube <b>"
Such a feature is especially useful for expanding on-device SLU to new languages, since high-quality multilingual transformers/LLMs cannot be used as the core SLU model in that setting.
Expected behavior
The MT-LLM pipeline expects English sentences annotated in a generic <> tags format (for example: "play me <a> Dune <a> on <b> Youtube <b>") and outputs the translated sentence in the same format ("Spiele mir <a> Dune <a> auf <b> Youtube <b>"). This data format can easily be converted to BIO annotation and to other popular NLU formats.
Additional context
https://paperswithcode.com/paper/large-language-models-for-expansion-of-spoken
In our recent work, we fine-tuned an MT-LLM called BigTranslate for MT of slot-annotated NLU data, using the parallel Amazon MASSIVE dataset for fine-tuning. Fine-tuning yields a significant performance improvement over zero-shot LLM-based machine translation on the multiATIS++ benchmark.
Here you can find the fine-tuned BigTranslate model: https://huggingface.co/Samsung/BigTranslateSlotTranslator
Here you can find the code for fine-tuning, plus the code for NLU training: https://github.com/samsung/mt-llm-nlu
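Assuming the checkpoint loads through the standard Hugging Face `transformers` causal-LM API (an assumption on our part; the exact prompt template used during fine-tuning lives in the repo above and may differ), usage could look roughly like this sketch. The `build_prompt` instruction format is hypothetical, not the confirmed template.

```python
def build_prompt(sentence, tgt_lang="German"):
    # Hypothetical instruction format -- NOT the confirmed fine-tuning template.
    return (f"Translate the following slot-annotated sentence to {tgt_lang}, "
            f"keeping the <> tags in place:\n{sentence}\n")

def translate(sentence, model_name="Samsung/BigTranslateSlotTranslator"):
    # Heavy dependency loaded lazily so the prompt helper stays importable.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
    inputs = tokenizer(build_prompt(sentence), return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=128)
    # Decode only the newly generated tokens, skipping the echoed prompt.
    return tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)
```

Given the running example, `translate("play me <a> Dune <a> on <b> Youtube <b>")` would be expected to return the German sentence with the tags preserved.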
In summary, we are wondering how we could merge our work into this project, and which parts of our work might be useful here (e.g., scripts for converting between BIO and the tags format).