Question about results on EgoSchema #57
For a fair comparison, we use LLaMA. EgoSchema is a multiple-choice VQA dataset, and it has been shown that when the model is given the choices, their order affects the answer. We find that with the question only (we do not use any other prompt), the answer is more relevant to the question and leads to a higher score. Once we get the answer produced by MovieChat, we use LangChain to calculate its similarity with the multiple choices and select the most similar one as our prediction.
Thanks for your kind response. Could you please provide the inference code that "uses LangChain to calculate the similarity with the multiple choices" so we can better align our evaluation? Thanks a lot!
Unfortunately, we can't provide you with the code directly. For the evaluation code with LangChain, you can refer to https://python.langchain.com.cn/docs/modules/model_io/prompts/example_selectors/similarity; we simply take the answers as the "Input". Hope this helps!
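For anyone trying to reproduce this step, below is a minimal sketch based on the SemanticSimilarityExampleSelector from the linked (classic) LangChain docs. The helper name select_choice is hypothetical, and the embedding backend is an assumption: the thread does not confirm which embedding model the authors used (OpenAIEmbeddings() is simply the default in the docs).

```python
# Minimal sketch of the described selection step, assuming the classic
# LangChain SemanticSimilarityExampleSelector API from the linked docs.
from langchain.embeddings import OpenAIEmbeddings  # assumption: embedding model not confirmed in thread
from langchain.prompts.example_selector import SemanticSimilarityExampleSelector
from langchain.vectorstores import Chroma

def select_choice(moviechat_answer: str, choices: list[str]) -> str:
    """Map a free-form MovieChat answer to the most similar multiple-choice option."""
    # Wrap each option as an "example" so the selector can embed and index it.
    examples = [{"choice": c} for c in choices]
    selector = SemanticSimilarityExampleSelector.from_examples(
        examples,
        OpenAIEmbeddings(),  # swap in another embedding backend if needed
        Chroma,              # vector store used for the nearest-neighbor search
        k=1,                 # keep only the single most similar option
    )
    # The free-form answer plays the role of the "Input"; the selector
    # returns the example whose embedding is closest to it.
    best = selector.select_examples({"choice": moviechat_answer})
    return best[0]["choice"]
```

With the five EgoSchema options passed as choices, the returned string would then be mapped back to its option index for scoring.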
Hi @Espere-1119-Song, OpenAI API keys are no longer usable for me, so I have replaced 'OpenAIEmbeddings()' with Ollama. Could you tell me which embedding model you used in your evaluation code?
Hi, thanks for your great work! I have read your MovieChat+ paper and noticed that the zero-shot QA evaluation result of MovieChat on EgoSchema is 53.5, while the result reported in the CVPR paper Koala (Key frame-conditioned long video-LLM, https://arxiv.org/pdf/2404.04346) is much lower. I guess the possible reason is that the LLM used and the evaluation method differ, so I would like to confirm which LLM you used for the EgoSchema result (Koala used Llama 2) and the specific implementation of the LangChain evaluation. Thank you very much!