https://github.com/thunlp/LLaVA-UHD
This method appears to be on par with or better than LLaVA 1.6 (LLaVA-NeXT); notably, the authors have open-sourced the training code for reproduction.
LLM analysis from Gemini 1.5 Pro:

| Feature | LLaVA-UHD-13B | LLaVA-NeXT-7B | LLaVA-NeXT-13B | LLaVA-NeXT-34B | LLaVA-1.5-13B |
| --- | --- | --- | --- | --- | --- |
| VQAv2 | 81.7 | 81.8 (Vicuna) / 82.2 (Mistral) | 82.8 | 83.7 | 80.0 |
| GQA | 65.2 | 64.2 (Vicuna) / 64.8 (Mistral) | 65.4 | 67.1 | 63.3 |
| TextVQA | 67.7 | 64.9 (Vicuna) / 65.7 (Mistral) | 67.1 | 69.5 | 61.3 |
| ScienceQA | 72.0 | 70.1 (Vicuna) / 72.8 (Mistral) | 73.6 | 81.8 | 71.6 |
| VizWiz | 56.1 | 57.6 (Vicuna) / 60.0 (Mistral) | 60.5 | 63.8 | 53.6 |
| MMMU (val) | 36.4 | 35.8 (Vicuna) / 35.3 (Mistral) | 36.2 | 51.1 | 36.4 |
| MMMU (test) | 33.6 | - | - | 44.7 | 33.6 |
| MME | 1535 | 1519 (Vicuna) / 1498 (Mistral) | 1575 | 1631 | 1531 |
| POPE | 89.1 | 86.5 (Vicuna) / 86.7 (Mistral) | 86.2 | 87.7 | 85.9 |

Observations:

- LLaVA-UHD generally performs better than LLaVA-1.5 across all metrics.
- The LLaVA-NeXT series shows comparable performance to LLaVA-UHD on most tasks, with slight variations depending on the base LLM (Vicuna or Mistral).
- LLaVA-NeXT-34B stands out with significantly higher scores on ScienceQA and MMMU.
Originally posted by @choyakawa in thunlp/LLaVA-UHD#1 (comment)
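To make these observations concrete, here is a minimal Python sketch (not from either repository; the numbers are simply copied from the table above) that prints each 13B-scale model's delta against LLaVA-1.5-13B. Note that MME is a raw score rather than a percentage, so its deltas are absolute:

```python
# Re-tabulate the 13B-scale columns from the comparison above and print
# each model's delta against the LLaVA-1.5-13B baseline.
scores = {
    # benchmark:   (LLaVA-UHD-13B, LLaVA-NeXT-13B, LLaVA-1.5-13B)
    "VQAv2":      (81.7, 82.8, 80.0),
    "GQA":        (65.2, 65.4, 63.3),
    "TextVQA":    (67.7, 67.1, 61.3),
    "ScienceQA":  (72.0, 73.6, 71.6),
    "VizWiz":     (56.1, 60.5, 53.6),
    "MMMU (val)": (36.4, 36.2, 36.4),
    "MME":        (1535, 1575, 1531),
    "POPE":       (89.1, 86.2, 85.9),
}

for bench, (uhd, next13b, llava15) in scores.items():
    print(f"{bench:11}  UHD vs 1.5: {uhd - llava15:+7.1f}   "
          f"NeXT-13B vs 1.5: {next13b - llava15:+7.1f}")
```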
Moreover, the model can be trained efficiently in academic settings, within 23 hours on 8 A100 GPUs (vs. 26 hours for LLaVA-1.5).
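For context, a quick back-of-the-envelope check of that cost claim (assuming the LLaVA-1.5 figure refers to the same 8x A100 setup, which the sentence implies but does not state):

```python
# Training cost figures quoted above, converted to GPU-hours.
uhd_gpu_hours = 23 * 8   # LLaVA-UHD: 23 h on 8 A100s -> 184 GPU-hours
l15_gpu_hours = 26 * 8   # LLaVA-1.5: 26 h on 8 A100s -> 208 GPU-hours
saving = 1 - uhd_gpu_hours / l15_gpu_hours
print(f"{uhd_gpu_hours} vs {l15_gpu_hours} GPU-hours ({saving:.0%} less)")
# -> 184 vs 208 GPU-hours (12% less)
```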