Spearman Correlations for Table-4 #22
On the SummEval dataset, for FLU, COH and INFO, we also used BARTScore(s->h).
So what was the reason for using a single score (s->h)? Does BARTScore holistically measure the quality of generated text? For example, can you report the s->h variant of BARTScore and conclude from that score alone that the summaries generated by Model A are better overall than those generated by Model B? Also, how do you decide which BARTScore variant to use for a particular dataset to measure COH, FLU, INFO and FAC? Please let me know.
Here are some rules we have followed when deciding which BARTScore variant to use.
However, we agree that designing a metric with multiple interpretable dimensions would be promising future work.
In Table-4 of the paper, for the SummEval dataset, you measured COH, FAC, FLU and INFO. I wanted to know which variants of BARTScore you used.
From my understanding of the paper,
for factuality (FAC) you must have used BARTScore(s->h), i.e. source -> hypothesis.
But I am not clear about FLU, COH and INFO.
If you could please elaborate, that would be really helpful.