This issue is to research how best to incorporate Proper Names (and keyterms more broadly) into the AI drafting flow. Currently, all Proper Names (type PN) are included in training with the same weight, regardless of how much or how little other training data is used. A few points of research could be:
A better Keyterm metric
We need an easy way to measure how effective the different methods of adding keyterms are and how well the keyterms end up being used. These metrics should account for:
That the correct keyterms are used in the correct places
That the BLEU score has not been degraded
If possible, different surface forms of the word
This metric could be assessed alongside BLEU and called Keyterm Accuracy. It would be:
KeytermAccuracy = (number of keyterm instances that occur in the correct verse the correct number of times) / (total number of keyterm instances)
Extra occurrences of a keyterm (more than the verse should contain) should penalize the metric.
If the keyterm does not appear exactly, a Levenshtein distance should be applied; if a word of "sufficient closeness" appears, it should be counted as a proper match.
If a term has multiple valid translations, any one of them occurring in the verse should be counted as a proper match.
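The metric described above could be sketched roughly as follows. This is a minimal illustration under assumptions, not a settled design: the `KeytermInstance` record, the verse-to-draft dict, the word tokenization, and the 0.8 similarity threshold standing in for "sufficient closeness" are all hypothetical. Each (keyterm, verse) pair is treated as one instance and counts as correct only when the term appears exactly the expected number of times, so both missing and extra occurrences lower the score. Any listed rendering is accepted, and near matches are judged by normalized Levenshtein similarity.

```python
import re
from dataclasses import dataclass


@dataclass
class KeytermInstance:
    """One keyterm expected in one verse (hypothetical input format)."""
    verse: str               # verse reference, e.g. "ACT 13:9"
    renderings: list[str]    # every accepted translation of the term
    expected_count: int = 1  # how many times the term should appear in the verse


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert/delete/substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def is_close(a: str, b: str, min_similarity: float) -> bool:
    """Assumed "sufficient closeness": 1 - distance / longer length >= threshold."""
    if a == b:
        return True
    longest = max(len(a), len(b))
    return 1 - levenshtein(a, b) / longest >= min_similarity


def tokenize(text: str) -> list[str]:
    """Lowercased word tokens; \\w is Unicode-aware, so non-Latin scripts work too."""
    return re.findall(r"\w+", text.lower())


def count_occurrences(draft: str, renderings: list[str], min_similarity: float) -> int:
    """Count token positions where any accepted rendering (or a near match) starts."""
    tokens = tokenize(draft)
    targets = [t for t in (tokenize(r) for r in renderings) if t]
    count = 0
    for i in range(len(tokens)):
        for target in targets:
            window = tokens[i:i + len(target)]  # multi-word terms compare as phrases
            if len(window) == len(target) and is_close(
                    " ".join(window), " ".join(target), min_similarity):
                count += 1
                break  # at most one match per position, even if several renderings fit
    return count


def keyterm_accuracy(instances: list[KeytermInstance], drafts: dict[str, str],
                     min_similarity: float = 0.8) -> float:
    """Share of keyterm instances found in their verse exactly the expected number of times."""
    if not instances:
        return 0.0
    correct = 0
    for inst in instances:
        found = count_occurrences(drafts.get(inst.verse, ""), inst.renderings, min_similarity)
        # Missing terms and extra occurrences are both treated as misses.
        if found == inst.expected_count:
            correct += 1
    return correct / len(instances)
```

For example, with the draft `"Paul and Paul"` for a verse that should contain "Paul" once, the instance is scored as a miss. The similarity threshold is the main knob: too low and unrelated words start to match (at 0.75, "Saul" would count as "Paul"), too high and inflected forms are missed, so it would likely need tuning per language next to BLEU.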
Potential improvements:
Ways to improve keyterm usage without degrading the BLEU and chrF++ scores include: