Mixed-language output when translating from base NLLB #656

Open
bhartmoore opened this issue Feb 13, 2025 · 4 comments
Assignees
Labels
bug Something isn't working

Comments

@bhartmoore
Collaborator

In the past, we've successfully translated from the NLLB base model. I believe the only required arguments for translate.py were:

  • src-iso
  • trg-iso
  • src-project
  • clearml-queue
  • experiment

A minimal config file in the experiment folder was also needed to identify the model to use. With this setup, I now consistently see mixed-language output. It seems the language codes may not be passed through correctly. Has something changed, or have I forgotten a crucial step?
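For context on why a lost language code would show up as mixed-language output: in Hugging Face transformers, NLLB takes the source language from the tokenizer (`src_lang`) and the target language from the first forced decoder token (`forced_bos_token_id`). The sketch below is not the silnlp implementation; `nllb_generation_kwargs` is a hypothetical helper, and the `facebook/nllb-200-3.3B` model name in the comments is an assumption based on the experiment names. It only illustrates where the two codes would normally have to end up.

```python
def nllb_generation_kwargs(tokenizer, src_lang, trg_lang):
    """Set the source language on the tokenizer and return generate() kwargs
    that force the first decoded token to be the target language code.

    Hypothetical helper for illustration; works with any tokenizer object
    exposing unk_token_id, src_lang and convert_tokens_to_ids().
    """
    unk_id = tokenizer.unk_token_id
    for code in (src_lang, trg_lang):
        # An unknown code maps to the <unk> id, so catch typos early.
        if tokenizer.convert_tokens_to_ids(code) == unk_id:
            raise ValueError(f"Language code not in tokenizer vocabulary: {code}")
    tokenizer.src_lang = src_lang
    return {"forced_bos_token_id": tokenizer.convert_tokens_to_ids(trg_lang)}


# Intended use with the real model (not run here; the download is large):
#   from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
#   tokenizer = AutoTokenizer.from_pretrained("facebook/nllb-200-3.3B")
#   model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-3.3B")
#   kwargs = nllb_generation_kwargs(tokenizer, "eng_Latn", "ind_Latn")
#   inputs = tokenizer("In the beginning was the Word.", return_tensors="pt")
#   output_ids = model.generate(**inputs, **kwargs)
#   print(tokenizer.batch_decode(output_ids, skip_special_tokens=True))
```

If the target code never reaches `forced_bos_token_id`, the model is free to pick its own output language, which would be consistent with the symptom described here, though that is only a guess at the cause.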

A couple of recent examples (see screenshot of mixed-language output below):
"S:\MT\experiments\Indonesia\Behoa\NLLB.3.3B.en-TBBe15.id-TBBe_2025_FEB" with command -m silnlp.nmt.translate --checkpoint base --src-project TBBe15_2024_11_06 --books 1CO 2CO REV --trg-iso ind_Latn --src-iso eng_Latn --clearml-queue production Indonesia\Behoa\NLLB.3.3B.en-TBBe15.id-TBBe_2025_FEB and ClearML process (I have tried this with and without the "--checkpoint" flag)

[Screenshot: mixed-language output]

"S:\MT\experiments\Demo_Bethany\NLLB_base" with command -m silnlp.nmt.translate --checkpoint base --src-project NIV11R --books 1JN --trg-iso npi_Deva --src-iso eng_Latn --clearml-queue production Demo_Bethany\NLLB_base for ClearML process

An older example that worked correctly:
"S:\MT\experiments\FT-MalayCentral\NLLB.3.3B.en_NIrV-ind_Latn" with command -m silnlp.nmt.translate --checkpoint best --src-project NIrV --books LUK --trg-iso ind_Latn --src-iso eng_Latn --clearml-queue jobs_urgent FT-MalayCentral\NLLB.3.3B.en_NIrV-ind_Latn and ClearML process

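For a target that does not use Latin script, such as the npi_Deva example, a rough way to flag affected verses is to count letters by Unicode script. This is a minimal sketch using only the standard library; `script_counts`, `is_mixed`, and the 0.9 threshold are illustrative assumptions, and the check cannot distinguish English from Indonesian because both are written in Latin script.

```python
import unicodedata


def script_counts(text):
    """Count letters per script, using the first word of each Unicode name
    (e.g. "DEVANAGARI LETTER KA" -> "DEVANAGARI")."""
    counts = {}
    for ch in text:
        if ch.isalpha():  # skips digits, punctuation, and combining marks
            script = unicodedata.name(ch, "UNKNOWN").split(" ")[0]
            counts[script] = counts.get(script, 0) + 1
    return counts


def is_mixed(text, expected_script, threshold=0.9):
    """Return True if fewer than `threshold` of the letters belong to
    `expected_script`. Text without any letters is not flagged."""
    counts = script_counts(text)
    total = sum(counts.values())
    if total == 0:
        return False
    return counts.get(expected_script, 0) / total < threshold
```

Running `is_mixed(line, "DEVANAGARI")` over each line of a draft would give a quick count of how many verses contain a noticeable share of non-Devanagari letters.
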
@bhartmoore bhartmoore added the bug Something isn't working label Feb 13, 2025
@ddaspit ddaspit moved this from 🆕 New to 🔖 Ready in SIL-NLP Research Feb 19, 2025
@ddaspit ddaspit removed their assignment Feb 20, 2025
@ddaspit
Collaborator

ddaspit commented Feb 20, 2025

@bhartmoore Is this blocking experiments? How important is this issue?

@bhartmoore
Collaborator Author

> @bhartmoore Is this blocking experiments? How important is this issue?

I'd say it's important but not urgent. No project work is currently blocked. There were a couple of teams recently for whom I wanted to generate drafts this way, but they have found workarounds. (For example, one team wanted to convert an English back translation to Indonesian, but will fine-tune and translate vernacular-to-Indonesian instead.) We do anticipate needing this in the future for project work.

@ddaspit
Collaborator

ddaspit commented Feb 20, 2025

@TaperChipmunk32 Can you look at this when you get a chance? It isn't urgent.

@TaperChipmunk32
Collaborator

> @TaperChipmunk32 Can you look at this when you get a chance? It isn't urgent.

Yes, I'll look into this when I can.

Projects
Status: 🔖 Ready
3 participants