Pretrain decoder with large amount of text data. #1753
Replies: 3 comments
- We have not tried that. However, people from Google have tried it. Please see the following paper.
- I think finding a way to pretrain or jointly train the encoder would be more useful.
- Thank you for these answers.
- Hello, we have been working with K2 for almost a year now and have built systems for around fifteen languages with it. The performance is consistently better than our old TDNN-F systems trained with Kaldi. The only problem is that K2's RNN-T models are very bad at spelling proper nouns in particular, which is not surprising since these models only use the transcriptions to train the decoder. Having found that LODR rescoring is not very effective, my question is: have you ever tried to pretrain the decoder on large amounts of text from newspapers or other sources, and then freeze or fine-tune it while training the models? A rough sketch of this idea is included below.
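  For illustration, here is a minimal PyTorch sketch of the pretrain-then-freeze idea, assuming a simple stateless prediction network. The class and function names (`Decoder`, `pretrain_decoder_as_lm`, `freeze_decoder`) are hypothetical and are not part of the k2/icefall API; a real implementation would reuse the project's own decoder module and data pipeline.

  ```python
  # Hypothetical sketch: pretrain an RNN-T prediction network ("decoder")
  # as a small language model on external text, then freeze it before
  # training the full transducer. Not the k2/icefall API.
  import torch
  import torch.nn as nn


  class Decoder(nn.Module):
      """Stateless prediction network: token embedding plus a projection."""

      def __init__(self, vocab_size: int, embed_dim: int):
          super().__init__()
          self.embedding = nn.Embedding(vocab_size, embed_dim)
          self.proj = nn.Linear(embed_dim, embed_dim)

      def forward(self, tokens: torch.Tensor) -> torch.Tensor:
          # tokens: (B, U) token ids -> (B, U, embed_dim) decoder states
          return self.proj(self.embedding(tokens))


  def pretrain_decoder_as_lm(decoder: Decoder, text_batches, vocab_size: int,
                             epochs: int = 1) -> None:
      """Pretrain the decoder with a next-token prediction head on raw text.

      text_batches yields (B, U) tensors of token ids from external text
      (e.g. newspapers), tokenized with the same BPE model as the ASR data.
      """
      lm_head = nn.Linear(decoder.proj.out_features, vocab_size)
      optim = torch.optim.Adam(
          list(decoder.parameters()) + list(lm_head.parameters()), lr=1e-3)
      loss_fn = nn.CrossEntropyLoss()
      for _ in range(epochs):
          for batch in text_batches:
              logits = lm_head(decoder(batch[:, :-1]))       # predict next token
              loss = loss_fn(logits.reshape(-1, vocab_size),
                             batch[:, 1:].reshape(-1))
              optim.zero_grad()
              loss.backward()
              optim.step()


  def freeze_decoder(decoder: Decoder) -> None:
      """Freeze the pretrained decoder before transducer training.

      The transducer optimizer should then be built only over parameters
      with requires_grad=True (encoder, joiner, and optionally the decoder
      later, if fine-tuning it is desired).
      """
      for p in decoder.parameters():
          p.requires_grad = False
  ```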