https://github.com/thunlp/LLaVA-UHD
This method appears to be on par with or better than LLaVA 1.6 (LLaVA-NeXT); notably, the authors have open-sourced the training code for reproduction.
LLM analysis from Gemini 1.5 Pro:

| Feature | LLaVA-UHD-13B | LLaVA-NeXT-7B | LLaVA-NeXT-13B | LLaVA-NeXT-34B | LLaVA-1.5-13B |
| --- | --- | --- | --- | --- | --- |
| VQAv2 | 81.7 | 81.8 (Vicuna) / 82.2 (Mistral) | 82.8 | 83.7 | 80.0 |
| GQA | 65.2 | 64.2 (Vicuna) / 64.8 (Mistral) | 65.4 | 67.1 | 63.3 |
| TextVQA | 67.7 | 64.9 (Vicuna) / 65.7 (Mistral) | 67.1 | 69.5 | 61.3 |
| ScienceQA | 72.0 | 70.1 (Vicuna) / 72.8 (Mistral) | 73.6 | 81.8 | 71.6 |
| VizWiz | 56.1 | 57.6 (Vicuna) / 60.0 (Mistral) | 60.5 | 63.8 | 53.6 |
| MMMU (val) | 36.4 | 35.8 (Vicuna) / 35.3 (Mistral) | 36.2 | 51.1 | 36.4 |
| MMMU (test) | 33.6 | - | - | 44.7 | 33.6 |
| MME | 1535 | 1519 (Vicuna) / 1498 (Mistral) | 1575 | 1631 | 1531 |
| POPE | 89.1 | 86.5 (Vicuna) / 86.7 (Mistral) | 86.2 | 87.7 | 85.9 |

Observations:

- LLaVA-UHD generally performs better than LLaVA-1.5 across all metrics.
- The LLaVA-NeXT series shows comparable performance to LLaVA-UHD on most tasks, with slight variations depending on the base LLM (Vicuna or Mistral).
- LLaVA-NeXT-34B stands out with significantly higher scores on ScienceQA and MMMU.
Originally posted by @choyakawa in thunlp/LLaVA-UHD#1 (comment)
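To make these observations concrete, here is a minimal Python sketch (not from either repository; the numbers are simply copied from the table above) that prints each 13B-scale model's delta against LLaVA-1.5-13B. Note that MME is a raw score rather than a percentage, so its deltas are absolute:

```python
# Re-tabulate the 13B-scale columns from the comparison above and print
# each model's delta against the LLaVA-1.5-13B baseline.
scores = {
    # benchmark:   (LLaVA-UHD-13B, LLaVA-NeXT-13B, LLaVA-1.5-13B)
    "VQAv2":      (81.7, 82.8, 80.0),
    "GQA":        (65.2, 65.4, 63.3),
    "TextVQA":    (67.7, 67.1, 61.3),
    "ScienceQA":  (72.0, 73.6, 71.6),
    "VizWiz":     (56.1, 60.5, 53.6),
    "MMMU (val)": (36.4, 36.2, 36.4),
    "MME":        (1535, 1575, 1531),
    "POPE":       (89.1, 86.2, 85.9),
}

for bench, (uhd, next13b, llava15) in scores.items():
    print(f"{bench:11}  UHD vs 1.5: {uhd - llava15:+7.1f}   "
          f"NeXT-13B vs 1.5: {next13b - llava15:+7.1f}")
```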
Moreover, the model can be trained efficiently in academic settings, within 23 hours on 8 A100 GPUs (vs. 26 hours for LLaVA-1.5).
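For context, a quick back-of-the-envelope check of that cost claim (assuming the LLaVA-1.5 figure refers to the same 8x A100 setup, which the sentence implies but does not state):

```python
# Training cost figures quoted above, converted to GPU-hours.
uhd_gpu_hours = 23 * 8   # LLaVA-UHD: 23 h on 8 A100s -> 184 GPU-hours
l15_gpu_hours = 26 * 8   # LLaVA-1.5: 26 h on 8 A100s -> 208 GPU-hours
saving = 1 - uhd_gpu_hours / l15_gpu_hours
print(f"{uhd_gpu_hours} vs {l15_gpu_hours} GPU-hours ({saving:.0%} less)")
# -> 184 vs 208 GPU-hours (12% less)
```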