Spearman Correlations for Table-4 #22

Open
Atharva-Phatak opened this issue Jul 9, 2022 · 3 comments

Comments

@Atharva-Phatak

In Table 4 of the paper, you report COH, FAC, FLU and INFO for the SummEval dataset. I wanted to know which variants of BARTScore you used.

From my understanding of the paper,
for factuality (FAC) you must have used BARTScore(s->h), i.e. source -> hypothesis.

But I am not clear about FLU, COH and INFO.

If you could please elaborate, that would be really helpful.


@yyy-Apple
Collaborator

yyy-Apple commented Jul 10, 2022

On the SummEval dataset, for FLU, COH and INFO, we also used BARTScore(s->h).
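
To illustrate what this means for the Table-4 numbers, here is a minimal sketch (not the authors' evaluation script) of correlating one list of BARTScore(s->h) values with human ratings for each of the four dimensions. All scores and ratings below are made-up placeholders, the helper names `average_ranks` and `spearman` are hypothetical, and the aggregation level used in the paper is not specified in this thread.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    # Undefined (division by zero) if either input is constant.
    return cov / (var_x * var_y) ** 0.5


# Placeholder BARTScore(s->h) values for five summaries
# (log-likelihoods, hence negative).
bartscore_s_to_h = [-2.1, -3.4, -1.8, -2.9, -2.5]

# Placeholder human ratings for the same five summaries, one list per dimension.
human = {
    "COH": [4.0, 2.0, 4.5, 3.0, 3.5],
    "FAC": [5.0, 3.0, 5.0, 4.0, 4.0],
    "FLU": [4.5, 4.0, 5.0, 4.0, 4.5],
    "INFO": [3.5, 2.5, 4.0, 2.0, 3.0],
}

# The same score list is compared against every dimension.
correlations = {
    dim: spearman(bartscore_s_to_h, ratings) for dim, ratings in human.items()
}
for dim, rho in correlations.items():
    print(f"{dim}: {rho:.3f}")
```

The point of the sketch is only that a single score per summary yields four different correlations, one per set of human ratings. In practice `scipy.stats.spearmanr` would be the usual choice instead of the hand-written helper.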

@Atharva-Phatak
Author

Atharva-Phatak commented Jul 10, 2022

So what was the reason for using a single score (s->h)? Does BARTScore holistically measure the quality of generated text?

For example, can you report the s->h variant of BARTScore and conclude, on the basis of that score alone, that the summaries generated by Model A are better overall than those generated by Model B?

Also, how do you decide which BARTScore variant to use for a particular dataset to measure COH, FLU, INFO and FAC?

Please let me know.

@yyy-Apple
Collaborator

yyy-Apple commented Jul 14, 2022

Here are the rules we followed when deciding which BARTScore variant to use:

  • The definition of the evaluation perspective (for example, factuality must rely on the source document).
  • The modalities/languages supported by the PLM (for example, for data-to-text we can only use h<->r, because the source and the hypothesis are in different modalities).

However, we agree that designing a metric with multiple interpretable dimensions would be promising future work.
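
As a rough illustration, the two rules above could be written as a small lookup. This is only a sketch of what is stated in this thread, not code from the BARTScore repository: the function `choose_variant`, its parameters, and the perspective strings are hypothetical.

```python
from typing import Optional


def choose_variant(perspective: str, source_is_text: bool) -> Optional[str]:
    """Hypothetical helper encoding only the two rules stated in this thread."""
    # Modality rule: if the source is not text the PLM can read
    # (e.g. data-to-text), the source cannot be used, so only the
    # hypothesis<->reference direction remains.
    if not source_is_text:
        return "h<->r"
    # Definition rule: factuality is defined relative to the source document.
    if perspective == "factuality":
        return "s->h"
    # The thread states no general rule for other perspectives.
    return None


print(choose_variant("factuality", source_is_text=True))
print(choose_variant("fluency", source_is_text=False))
```

A `None` result means the two rules alone do not settle the choice. On SummEval, for instance, s->h was also used for COH, FLU and INFO, as noted earlier in this thread.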
