
Question about results on Egoschema #57

Open
pPetrichor opened this issue Apr 30, 2024 · 4 comments

Comments

@pPetrichor

Hi, thanks for your great work! I have read your MovieChat+ paper and noticed that the zero-shot QA evaluation result of MovieChat on EgoSchema is 53.5, while the result reported in this CVPR paper (Koala: Key frame-conditioned long video-LLM, https://arxiv.org/pdf/2404.04346) is much lower. I suspect the discrepancy comes from differences in the LLM used and in the evaluation protocol, so I would like to confirm which LLM you used for the EgoSchema result (Koala used Llama 2) and the specific implementation of the LangChain evaluation. Thank you very much!

@Espere-1119-Song
Collaborator

For a fair comparison, we use LLaMA.

EgoSchema is a multiple-choice VQA dataset, and it has been shown that when the model is given the choices, their order affects the answer. We find that with the question only (we do not use any other prompt), the answer is more relevant to the question and leads to a higher score. Once we get the answer produced by MovieChat, we ask LangChain to compute its similarity with the multiple-choice options and select the most similar one as our prediction.
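
A minimal sketch of that matching step, assuming LangChain's OpenAIEmbeddings and plain cosine similarity (the function and variable names are illustrative, not the authors' released code):

```python
import numpy as np
from langchain.embeddings import OpenAIEmbeddings

def pick_choice(answer: str, choices: list[str]) -> int:
    """Embed the free-form answer and each multiple-choice option,
    and return the index of the option most similar to the answer."""
    emb = OpenAIEmbeddings()
    a = np.array(emb.embed_query(answer))
    cs = np.array(emb.embed_documents(choices))
    # Cosine similarity between the answer and every choice.
    sims = cs @ a / (np.linalg.norm(cs, axis=1) * np.linalg.norm(a))
    return int(np.argmax(sims))
```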

@pPetrichor
Author

Thanks for your kind response. Could you please provide the inference code that "asks LangChain to calculate the similarity with the multiple choices", so we can better align our evaluation with yours? Thanks a lot!

@Espere-1119-Song
Collaborator

Unfortunately, we can't provide you with the code directly. For the evaluation code with LangChain, you can refer to https://python.langchain.com.cn/docs/modules/model_io/prompts/example_selectors/similarity; we just take the answers as the "Input". Hope this is helpful to you!
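
For readers following that link: the page documents SemanticSimilarityExampleSelector, so the evaluation plausibly looks something like the sketch below. This is a reconstruction under that assumption; the field name "option", the k=1 setting, and the Chroma vector store are guesses, not the released code.

```python
from langchain.embeddings import OpenAIEmbeddings
from langchain.prompts.example_selector import SemanticSimilarityExampleSelector
from langchain.vectorstores import Chroma

# The five EgoSchema options for one question, stored as "examples".
choices = [{"option": c} for c in [
    "Option 0 ...", "Option 1 ...", "Option 2 ...", "Option 3 ...", "Option 4 ...",
]]

selector = SemanticSimilarityExampleSelector.from_examples(
    choices,             # candidates to select from
    OpenAIEmbeddings(),  # embedding model used for the similarity search
    Chroma,              # vector store backing the selector
    k=1,                 # keep only the single most similar option
)

# MovieChat's free-form answer is used as the "Input"; the selector
# returns the option whose embedding is closest to it.
answer = "The person seems to be assembling a piece of furniture."
prediction = selector.select_examples({"option": answer})[0]["option"]
```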

@msra-jqxu

msra-jqxu commented Aug 6, 2024

Hi @Espere-1119-Song, OpenAI keys are now prohibited for us, so I replaced 'OpenAIEmbeddings()' with Ollama. Could you tell me which embedding models you used in your evaluation code?
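
For reference, a swap along these lines works with the community package; the model name below is only an example, not the model used in the paper's evaluation:

```python
from langchain_community.embeddings import OllamaEmbeddings

# Any embedding model pulled into a local Ollama server works here;
# "nomic-embed-text" is an illustrative choice, not the authors' model.
emb = OllamaEmbeddings(model="nomic-embed-text")
vec = emb.embed_query("sanity check")  # returns a list of floats
```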
