Best usage of Keyterms (Proper Names) #653

johnml1135 · 2025-02-11T16:41:06Z

This issue is to research how best to incorporate Proper Names (and keyterms more broadly) into the AI drafting flow. Currently, all Proper Names (type PN) are included in training with the same weight, regardless of how much or how little other training data is being used. A few points of research could be:

A better Keyterm metric

We need to be able to easily determine the effectiveness of different means of adding keyterms and how well they are used. These metrics should account for:

That the proper keyterms are used in the proper places
That the Bleu score has not been degraded
If possible, different surface forms of the word

This metric could be reported alongside Bleu and called Keyterm Accuracy. It would be defined as:

KeytermAccuracy = (number of keyterm instances that occur in the correct verse the correct number of times) / (total number of keyterm instances)
Extra occurrences of a keyterm, beyond the number expected in the verse, should penalize the metric.
If the keyterm does not appear exactly, a Levenshtein distance should be applied; if a word of "sufficient closeness" appears, it should be considered a proper match.
If there are multiple translations of the same term, any one of the proper translations occurring in the verse should be counted as a proper match.
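One possible reading of the rules above can be sketched in Python. This is a minimal sketch, not an existing implementation: the names `keyterm_accuracy`, `ExpectedTerm`, `is_close` and the input layout (a dict of verse reference to draft text, and a dict of verse reference to expected terms) are all hypothetical. It assumes each expected occurrence counts as one "instance", that every extra occurrence subtracts one correct match, that renderings are single words, and that "sufficient closeness" means an edit distance of at most 20% of the rendering's length. None of these choices is settled in this issue.

```python
import re
from dataclasses import dataclass

TOKEN_RE = re.compile(r"\w+")


def levenshtein(a: str, b: str) -> int:
    """Edit distance where insertions, deletions and substitutions each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def is_close(token: str, rendering: str, max_ratio: float = 0.2) -> bool:
    # ASSUMPTION: "sufficient closeness" = at most 20% of the rendering's
    # length in edits (minimum 1 edit). The issue leaves this threshold open.
    allowed = max(1, int(len(rendering) * max_ratio))
    return levenshtein(token, rendering) <= allowed


@dataclass
class ExpectedTerm:  # hypothetical container for one keyterm in one verse
    renderings: list[str]  # any one approved rendering counts as a match
    count: int = 1         # how many times the term should appear in the verse


def count_occurrences(tokens: list[str], renderings: list[str]) -> int:
    wanted = {r.lower() for r in renderings}
    exact = sum(1 for t in tokens if t in wanted)
    if exact:
        return exact
    # Fall back to fuzzy matching only when no exact rendering appears.
    return sum(1 for t in tokens if any(is_close(t, r) for r in wanted))


def keyterm_accuracy(drafts: dict[str, str],
                     expected: dict[str, list[ExpectedTerm]]) -> float:
    total = matched = extra = 0
    for verse_ref, terms in expected.items():
        tokens = [t.lower() for t in TOKEN_RE.findall(drafts.get(verse_ref, ""))]
        for term in terms:
            found = count_occurrences(tokens, term.renderings)
            total += term.count
            matched += min(found, term.count)    # correct occurrences
            extra += max(0, found - term.count)  # unwanted multiples
    if total == 0:
        return 0.0
    return max(0, matched - extra) / total


# Made-up drafts: "Abrm" is a near miss, and "Sarai" appears twice where once is expected.
drafts = {
    "GEN 12:1": "The LORD said to Abram, Go from your country.",
    "GEN 12:4": "So Abrm went, and Lot went with him.",
    "GEN 12:5": "Abram took Sarai, Sarai his wife.",
}
expected = {
    "GEN 12:1": [ExpectedTerm(["Abram"])],
    "GEN 12:4": [ExpectedTerm(["Abram"]), ExpectedTerm(["Lot"])],
    "GEN 12:5": [ExpectedTerm(["Abram"]), ExpectedTerm(["Sarai", "Sarah"])],
}
print(keyterm_accuracy(drafts, expected))  # 0.8
```

Here all five expected instances are found ("Abrm" via the Levenshtein fallback), but the duplicated "Sarai" costs one point, giving 4/5. A stricter variant could instead mark the whole verse-term pair as wrong whenever the count differs.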

Potential improvements:

Ways to improve the usage of Keyterms without degrading the Bleu and Chrf++ score include:

Guided decoding and similar methods: Guided Decoding #178
"Enhanced" guided decoding: Guided decoding with "enhanced" keyterms #652
Only including the PNs for the books we are inferencing off of
Only including the PNs for the books we are inferencing off of that are not already in the training data
Including more than just PNs: can we include other keyterms, and does that improve the results?
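The two PN filtering options above could be prototyped roughly as follows. This is a hypothetical sketch: `select_proper_names`, `book_of`, and the `occurrences` mapping (PN to the set of verse references where it occurs) are illustrative names, not the drafting pipeline's actual data structures. It also assumes references look like "GEN 12:1" and approximates "already in the training data" at the book level.

```python
def book_of(verse_ref: str) -> str:
    # ASSUMPTION: references look like "GEN 12:1", so the book code is the first token.
    return verse_ref.split()[0]


def select_proper_names(occurrences: dict[str, set[str]],
                        draft_books: set[str],
                        training_books: set[str],
                        exclude_trained: bool = False) -> list[str]:
    """Return the PNs that occur in at least one book being drafted.

    With exclude_trained=True, also drop PNs that already occur in a book
    whose verses are part of the training data.
    """
    selected = []
    for term, refs in occurrences.items():
        books = {book_of(ref) for ref in refs}
        if not books & draft_books:
            continue  # never appears in the books we are drafting
        if exclude_trained and books & training_books:
            continue  # the model already sees this name in training verses
        selected.append(term)
    return sorted(selected)


occurrences = {
    "Abraham": {"GEN 17:5", "ROM 4:1"},
    "Barnabas": {"ACT 4:36"},
    "Timothy": {"ACT 16:1", "1TI 1:2"},
}
print(select_proper_names(occurrences, {"ACT"}, {"GEN", "1TI"}))
# ['Barnabas', 'Timothy']
print(select_proper_names(occurrences, {"ACT"}, {"GEN", "1TI"}, exclude_trained=True))
# ['Barnabas']
```

A verse-level check (is the PN present in any actual training verse?) would be more precise than the book-level approximation used here, at the cost of needing the training verse list.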
