Guided decoding with "enhanced" keyterms #652

johnml1135 · 2025-02-11T16:39:24Z

By combining alignments, LLMs, and Levenshtein distance, we should be able to determine the "proper form" (surface form) of keyterms in the target sentence and tell NLLB-200 to include that specific surface form of the word. One way to do this:

1. Match the proper names in each verse from source to target.
2. Give this information to an LLM, with many examples in the context window, to show it that "when this word is used here in this context, the surface form looks like this."
3. Ask the LLM: "For this new name in this context, what should the surface form be?"
4. Feed those surface forms to NLLB-200 to "guide" its decoding, so that the output contains them.
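Here is a minimal Python sketch of steps 1 to 3. It rests on several assumptions the issue does not state. The alignment step is assumed to have already paired each source name with a canonical target rendering (its base form). Target verses are split on whitespace. A normalized Levenshtein distance under an arbitrary 0.5 threshold picks out the inflected surface form. The function names, the prompt template, and the Finnish example (Pietari / Pietarille, "to Peter") are hypothetical illustrations, and no particular LLM API is assumed.

```python
# Hypothetical helpers sketching steps 1-3; names and thresholds are illustrative.
import string
from dataclasses import dataclass
from typing import List, Optional


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when chars match)
            ))
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Case-insensitive edit distance scaled to [0, 1] by the longer length."""
    if not a and not b:
        return 0.0
    return levenshtein(a.lower(), b.lower()) / max(len(a), len(b))


def tokenize(sentence: str) -> List[str]:
    """Naive whitespace tokenizer that strips surrounding punctuation."""
    stripped = (tok.strip(string.punctuation) for tok in sentence.split())
    return [tok for tok in stripped if tok]


def find_surface_form(base_form: str, target_tokens: List[str],
                      max_dist: float = 0.5) -> Optional[str]:
    """Step 1: return the target token closest to the keyterm's base form,
    or None if no token is within the (assumed) max_dist threshold."""
    best = min(target_tokens, key=lambda t: normalized_distance(base_form, t),
               default=None)
    if best is None or normalized_distance(base_form, best) > max_dist:
        return None
    return best


@dataclass
class SurfaceFormExample:
    context: str       # e.g. the source verse in which the name occurs
    base_form: str     # canonical target rendering of the keyterm
    surface_form: str  # form actually observed in the target verse


def build_prompt(examples: List[SurfaceFormExample], base_form: str,
                 context: str) -> str:
    """Steps 2-3: few-shot prompt asking an LLM for the surface form
    of a name in a new context (hypothetical template)."""
    parts = ["When a name is used in a given context, its surface form looks like this:\n"]
    for ex in examples:
        parts.append(f"Context: {ex.context}\nName: {ex.base_form}\n"
                     f"Surface form: {ex.surface_form}\n")
    parts.append(f"Context: {context}\nName: {base_form}\nSurface form:")
    return "\n".join(parts)


if __name__ == "__main__":
    form = find_surface_form("Pietari", tokenize("Jeesus sanoi Pietarille."))
    print(form)  # Pietarille
    example = SurfaceFormExample("Jesus said to Peter.", "Pietari", form)
    print(build_prompt([example], "Pietari", "Peter answered him."))
```

For step 4, one option (again an assumption, since the issue does not name a mechanism) is Hugging Face transformers' constrained beam search, which takes the tokenized surface forms through the `force_words_ids` argument of `generate` and requires `num_beams > 1`.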

Implementing this relies upon the successful implementation of #178.

benjaminking · 2025-02-11T18:46:28Z (edited by johnml1135)

This paper has a potentially relevant strategy for picking the correct form of a word to insert:

https://arxiv.org/abs/2107.00334

johnml1135 mentioned this issue on Feb 11, 2025 in "Best usage of Keyterms (Proper Names)" #653 (open)
