Logic behind score() in bart_score.py #47

Open
zwxu064 opened this issue Dec 9, 2024 · 2 comments

zwxu064 commented Dec 9, 2024

Hi, I am wondering how well this model handles logical reasoning. Suppose my source and target are:
source: ["I have one sister and one brother"]
target: [["I have no siblings", "I have 2 siblings", "I have three siblings", "my parents have 3 kids"]]

The BARTScores are: [[-3.561114549636841], [-3.4354922771453857], [-3.130859136581421], [-4.85980749130249]]

This corresponds to the quality ranking "I have three siblings" > "I have 2 siblings" > "I have no siblings" > "my parents have 3 kids".
Clearly, this is wrong: the source implies exactly two siblings.
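To make the ordering explicit, here is a minimal sketch that pairs each target with the score reported above and sorts from best to worst. The variable names are only illustrative; the numbers are copied from the output above.

```python
# Targets and the BARTScores reported above (higher means "better" to the model).
candidates = [
    "I have no siblings",
    "I have 2 siblings",
    "I have three siblings",
    "my parents have 3 kids",
]
scores = [
    -3.561114549636841,
    -3.4354922771453857,
    -3.130859136581421,
    -4.85980749130249,
]

# Sort (target, score) pairs by score, highest first.
ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)

for text, score in ranked:
    print(f"{score:.3f}  {text}")
```

The two statements that agree with the source ("I have 2 siblings" and "my parents have 3 kids") end up second and last.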

Thanks.

@yyy-Apple
Collaborator

I did reproduce this with the model trained on parabank2, although the numbers are slightly different from yours. When I change "I have 2 siblings" to "I have two siblings", the model scores this one much higher than the others. This suggests that the model hasn't seen many numerical values during training, making it unfamiliar with them.
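One possible workaround, sketched here under assumptions, is to spell out small integers before scoring so that targets use the same surface form as the source. The helper `spell_out_small_numbers` and the `NUMBER_WORDS` table are hypothetical and not part of bart_score.py; the sketch only covers standalone integers from 0 to 10 and does not handle decimals or ordinals.

```python
import re

# Hypothetical lookup table (not part of bart_score.py); extend as needed.
NUMBER_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
    "10": "ten",
}

def spell_out_small_numbers(text: str) -> str:
    """Replace standalone integers 0-10 with English words; leave others as-is."""
    def replace(match: re.Match) -> str:
        digits = match.group(0)
        return NUMBER_WORDS.get(digits, digits)
    return re.sub(r"\b\d+\b", replace, text)

targets = ["I have 2 siblings", "my parents have 3 kids"]
normalized = [spell_out_small_numbers(t) for t in targets]
print(normalized)
```

The normalized strings would then be passed to the scorer in place of the originals. This only addresses the digit-versus-word mismatch and does nothing for the reasoning gap.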

Regarding the example "my parents have 3 kids": this requires a higher level of reasoning and cannot be considered a direct paraphrase of "I have one sister and one brother", even though both refer to the same fact. The model may therefore struggle with it.

Author

zwxu064 commented Dec 14, 2024

I agree. That means BARTScore is not intrinsically good at reasoning: even when the source context is given, the score for a target sentence sometimes diverges from its real semantic meaning. For example, "2" and "two" mean the same thing, but only "two" is recognised, because no Arabic numerals are used in the source context.
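A toy sketch of why surface form can matter so much, assuming the score is a length-normalized log-likelihood of the target tokens given the source (this is my understanding of score(), not something confirmed in this thread). The per-token probabilities below are made up for illustration and are not real model output.

```python
import math

def mean_log_likelihood(token_probs):
    """Average log-probability of the target tokens (always <= 0, higher is better)."""
    return sum(math.log(p) for p in token_probs) / len(token_probs)

# Made-up per-token probabilities, NOT real model output.
familiar = [0.5, 0.5, 0.5, 0.5]    # every token moderately likely
one_rare = [0.5, 0.5, 0.01, 0.5]   # same sentence shape, one unlikely token (e.g. a digit)

print(mean_log_likelihood(familiar))
print(mean_log_likelihood(one_rare))
```

Under this assumption, a single token the model finds unlikely lowers the whole sentence score, regardless of whether the sentence is logically consistent with the source.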
