Guided decoding with "enhanced" keyterms #652

Open
johnml1135 opened this issue Feb 11, 2025 · 1 comment

@johnml1135
Collaborator

By combining alignments, LLMs, and Levenshtein distance, we should be able to determine the "proper form" of key terms in the target sentence and then tell NLLB-200 to include that specific surface form of the word. This could be done as follows:

  1. For each verse, determine which proper names in the source match which words in the target.
  2. Give this information to an LLM as examples of the form "when this word is used here, in this context, its surface form looks like this". Include many such examples in the context window.
  3. Ask the LLM: "For this new name in this context, what should the surface form be?"
  4. Feed those surface forms to NLLB-200 as constraints to "guide" its decoding.
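The first three steps above could be prototyped roughly as in the sketch below. It uses only the Python standard library and is an illustration under assumptions, not part of this project: `find_surface_form`, `build_prompt`, the 0.4 distance threshold, and the toy tokens are all hypothetical. A real implementation would take candidate word pairs from the alignments instead of scanning every token in the target verse.

```python
from typing import List, Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the DP row short
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def find_surface_form(rendering: str, target_tokens: List[str],
                      max_ratio: float = 0.4) -> Optional[str]:
    """Step 1 (hypothetical helper): return the target token closest to a
    known rendering of a name, or None if nothing is close enough.

    Distance is normalised by the longer string, so a rendering plus a
    short affix can still match."""
    if not rendering or not target_tokens:
        return None
    scored = [
        (levenshtein(rendering.lower(), tok.lower()) / max(len(rendering), len(tok)), tok)
        for tok in target_tokens
    ]
    ratio, token = min(scored, key=lambda pair: pair[0])  # first best wins ties
    return token if ratio <= max_ratio else None


def build_prompt(examples: List[Tuple[str, str, str]], name: str, context: str) -> str:
    """Steps 2-3 (hypothetical helper): a few-shot prompt of
    (name, context, surface form) examples, ending with the new case
    for the LLM to complete."""
    parts = ["For each name and context, give the surface form used in the target text.", ""]
    for ex_name, ex_context, ex_form in examples:
        parts += [f"Name: {ex_name}", f"Context: {ex_context}", f"Surface form: {ex_form}", ""]
    parts += [f"Name: {name}", f"Context: {context}", "Surface form:"]
    return "\n".join(parts)


# Toy usage with made-up tokens:
print(find_surface_form("Yesu", ["Yesuni", "alisema", "kwao"]))  # Yesuni
```

For step 4, one possible mechanism is the `force_words_ids` argument of Hugging Face transformers' `generate`, which performs constrained beam search so that the given token sequences appear in the output. That choice is an assumption; the issue does not specify how the guiding would be done.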

Implementing this depends on #178 being implemented first.

@benjaminking
Collaborator

benjaminking commented Feb 11, 2025

This paper describes a potentially relevant strategy for choosing the correct form of a word to insert:

https://arxiv.org/abs/2107.00334
