Describe the feature
Expanding SLU to new languages requires a large amount of manual data annotation. To significantly reduce this effort, LLMs can be used to machine-translate slot-annotated data, e.g.
"play me <a> Dune <a> on <b> Youtube <b>" => "Spiele mir <a> Dune <a> auf <b> Youtube <b>"
Such a feature is especially useful for expanding on-device SLU to new languages, since high-quality multilingual transformers/LLMs cannot be used as the core SLU model in that setting.
Expected behavior
The MT-LLM pipeline expects English sentences annotated in a generic <> tags format (for example: "play me <a> Dune <a> on <b> Youtube <b>") and outputs the translated sentence in the same format ("Spiele mir <a> Dune <a> auf <b> Youtube <b>"). This data format can easily be converted to BIO annotation and to other popular NLU formats.
Additional context
https://paperswithcode.com/paper/large-language-models-for-expansion-of-spoken
In our recent work, we fine-tuned an MT-LLM called BigTranslate for MT of slot-annotated NLU data, using the parallel Amazon MASSIVE dataset for fine-tuning. Fine-tuning yields a significant performance improvement over zero-shot LLM-based machine translation on the multiATIS++ benchmark.
Here you can find the fine-tuned BigTranslate model: https://huggingface.co/Samsung/BigTranslateSlotTranslator
Here you can find the code for fine-tuning, plus the code for NLU training: https://github.com/samsung/mt-llm-nlu
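Assuming the checkpoint loads through the standard Hugging Face `transformers` causal-LM API (an assumption on our part; the exact prompt template used during fine-tuning lives in the repo above and may differ), usage could look roughly like this sketch. The `build_prompt` instruction format is hypothetical, not the confirmed template.

```python
def build_prompt(sentence, tgt_lang="German"):
    # Hypothetical instruction format -- NOT the confirmed fine-tuning template.
    return (f"Translate the following slot-annotated sentence to {tgt_lang}, "
            f"keeping the <> tags in place:\n{sentence}\n")

def translate(sentence, model_name="Samsung/BigTranslateSlotTranslator"):
    # Heavy dependency loaded lazily so the prompt helper stays importable.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
    inputs = tokenizer(build_prompt(sentence), return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=128)
    # Decode only the newly generated tokens, skipping the echoed prompt.
    return tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)
```

Given the running example, `translate("play me <a> Dune <a> on <b> Youtube <b>")` would be expected to return the German sentence with the tags preserved.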
In summary, we are wondering how we could merge our work into this project, and which parts of our work might be useful here (e.g., scripts for converting between BIO and the tags format).