Question about results on EgoSchema #57
For a fair comparison, we use LLaMA. EgoSchema is a multiple-choice VQA dataset, and it has been shown that when the model is given the choices, their order affects the answer. We find that with the question only (we do not use any other prompt), the answer is more relevant to the question and leads to a higher score. Once we get the answer produced by MovieChat, we use LangChain to calculate its similarity with the multiple choices and select the most similar one as our prediction.
Thanks for your kind response. Could you please provide the inference code that "uses LangChain to calculate the similarity with the multiple choices" so we can better align our evaluation? Thanks a lot!
Unfortunately, we can't provide you with the code directly. For the evaluation code with LangChain, you can refer to https://python.langchain.com.cn/docs/modules/model_io/prompts/example_selectors/similarity; we simply take the answers as the "Input". Hope this helps!
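For anyone trying to reproduce this step, below is a minimal sketch based on the SemanticSimilarityExampleSelector from the linked (classic) LangChain docs. The helper name select_choice is hypothetical, and the embedding backend is an assumption: the thread does not confirm which embedding model the authors used (OpenAIEmbeddings() is simply the default in the docs).

```python
# Minimal sketch of the described selection step, assuming the classic
# LangChain SemanticSimilarityExampleSelector API from the linked docs.
from langchain.embeddings import OpenAIEmbeddings  # assumption: embedding model not confirmed in thread
from langchain.prompts.example_selector import SemanticSimilarityExampleSelector
from langchain.vectorstores import Chroma

def select_choice(moviechat_answer: str, choices: list[str]) -> str:
    """Map a free-form MovieChat answer to the most similar multiple-choice option."""
    # Wrap each option as an "example" so the selector can embed and index it.
    examples = [{"choice": c} for c in choices]
    selector = SemanticSimilarityExampleSelector.from_examples(
        examples,
        OpenAIEmbeddings(),  # swap in another embedding backend if needed
        Chroma,              # vector store used for the nearest-neighbor search
        k=1,                 # keep only the single most similar option
    )
    # The free-form answer plays the role of the "Input"; the selector
    # returns the example whose embedding is closest to it.
    best = selector.select_examples({"choice": moviechat_answer})
    return best[0]["choice"]
```

With the five EgoSchema options passed as choices, the returned string would then be mapped back to its option index for scoring.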
Hi @Espere-1119-Song, OpenAI API keys are no longer usable for me, so I have replaced 'OpenAIEmbeddings()' with Ollama. Could you tell me which embedding model you used in your evaluation code?
Hi, thanks for your great work! I have read your MovieChat+ paper and noticed that the zero-shot QA evaluation result of MovieChat on EgoSchema is 53.5, while the result reported in the CVPR paper Koala (Key frame-conditioned long video-LLM, https://arxiv.org/pdf/2404.04346) is much lower. I guess the possible reason is that the LLM used and the evaluation method differ, so I would like to confirm which LLM you used for the EgoSchema result (Koala used Llama 2) and the specific implementation of the LangChain evaluation. Thank you very much!