Commit

fix text
HaoZhang534 committed Jun 20, 2024
1 parent f963a6f commit 3c19bd3
Showing 1 changed file with 24 additions and 22 deletions.
46 changes: 24 additions & 22 deletions 2024-06-16-llava-next-interleave/index.html
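The "46 changes" figure is the sum of the 24 added and 22 removed lines. The sketch below shows one way to derive such counts from single-file unified-diff text. The diff is generated with Python's standard `difflib.unified_diff`, and `count_changes` is a hypothetical helper, not part of git or GitHub.

```python
import difflib
from typing import Iterable, Tuple


def count_changes(diff_lines: Iterable[str]) -> Tuple[int, int]:
    """Count added and removed lines in unified-diff text (hypothetical helper).

    File headers ('--- a/...', '+++ b/...') are skipped by only counting
    lines that appear after a hunk header ('@@ ... @@').
    """
    additions = deletions = 0
    in_hunk = False
    for line in diff_lines:
        if line.startswith("diff "):
            in_hunk = False  # a new file section begins; its headers follow
        elif line.startswith("@@"):
            in_hunk = True
        elif not in_hunk:
            continue  # still inside the file header
        elif line.startswith("+"):
            additions += 1
        elif line.startswith("-"):
            deletions += 1
    return additions, deletions


diff = list(difflib.unified_diff(["a<br>b", "</br>"], ["ab", "</br>", "</br>"], lineterm=""))
print(count_changes(diff))  # (2, 1)
```

Tracking hunk state is safer than skipping every line that starts with `---` or `+++`, because a removed line whose own content begins with `--` would otherwise be miscounted.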
@@ -390,11 +390,11 @@ <h3 id="section-12">M4-Instruct: Training Data</h1>
overflow:hidden;padding:10px 5px;word-break:normal;}
.tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;}
-.tg .tg-klud{border-color:inherit;color:#3F3F3F;font-style:italic;font-weight:bold;text-align:center;vertical-align:middle}
+.tg .tg-klud{border-color:black;color:#3F3F3F;font-style:italic;font-weight:bold;text-align:center;vertical-align:middle}
.tg .tg-z0de{color:#3F3F3F;font-style:italic;font-weight:bold;text-align:center;vertical-align:middle}
-.tg .tg-r3m5{background-color:#FAF1D1;border-color:inherit;color:#3F3F3F;text-align:center;vertical-align:middle}
-.tg .tg-yal5{background-color:#F2F3F5;border-color:inherit;color:#3F3F3F;font-weight:bold;text-align:center;vertical-align:middle}
-.tg .tg-pvhc{background-color:#F8F9FA;border-color:inherit;color:#3F3F3F;text-align:center;vertical-align:middle}
+.tg .tg-r3m5{background-color:#FAF1D1;border-color:black;color:#3F3F3F;text-align:center;vertical-align:middle}
+.tg .tg-yal5{background-color:#F2F3F5;border-color:black;color:#3F3F3F;font-weight:bold;text-align:center;vertical-align:middle}
+.tg .tg-pvhc{background-color:#F8F9FA;border-color:black;color:#3F3F3F;text-align:center;vertical-align:middle}
.tg .tg-ofrq{background-color:#FAF1D1;color:#3F3F3F;text-align:center;vertical-align:middle}
.tg .tg-mmbt{background-color:#F8F9FA;color:#3F3F3F;text-align:center;vertical-align:middle}
</style>
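The first hunk replaces `border-color:inherit` with `border-color:black` in four per-class rules. A selector such as `.tg .tg-klud` (two classes) is more specific than `.tg th` (one class plus one element), so the per-class `inherit` overrode the black border set on the base rule. The sketch below applies the same edit as a regex pass. It assumes the stylesheet is held in a string. `pin_border_black` is a hypothetical helper, and regexes are only adequate for flat rules like these; a real CSS parser is safer for nested or commented stylesheets.

```python
import re
from typing import Iterable

# Rules touched by the first hunk of this commit.
CHANGED_CLASSES = ("tg-klud", "tg-r3m5", "tg-yal5", "tg-pvhc")


def pin_border_black(css: str, classes: Iterable[str] = CHANGED_CLASSES) -> str:
    """Rewrite border-color:inherit to border-color:black inside the given .tg rules."""
    for cls in classes:
        # Capture the selector and any declarations before border-color,
        # refusing to cross '}' so the match stays inside one rule body.
        pattern = re.compile(r"(\.tg \." + re.escape(cls) + r"\{[^}]*?)border-color:inherit")
        css = pattern.sub(r"\g<1>border-color:black", css)
    return css


print(pin_border_black(".tg .tg-r3m5{background-color:#FAF1D1;border-color:inherit;color:#3F3F3F}"))
# .tg .tg-r3m5{background-color:#FAF1D1;border-color:black;color:#3F3F3F}
```

Rules outside the listed classes, such as `.tg .tg-z0de`, are left untouched.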
@@ -412,7 +412,7 @@ <h3 id="section-12">M4-Instruct: Training Data</h1>
<td class="tg-klud" colspan="4"><span style="font-weight:bold;font-style:italic;color:#3F3F3F">Multi-image Scenarios</span></td>
</tr>
<tr style="line-height: 5px;">
-<td class="tg-pvhc" rowspan="6"><span style="color:#3F3F3F;background-color:#F8F9FA">Spot the Difference</span><br><span style="color:#3F3F3F;background-color:#F8F9FA">(63.9K)</span></td>
+<td class="tg-pvhc" rowspan="6"><span style="color:#3F3F3F;background-color:#F8F9FA">Spot the Difference</span><span style="color:#3F3F3F;background-color:#F8F9FA">(63.9K)</span></td>
<td class="tg-r3m5"><span style="color:#3F3F3F;background-color:#FAF1D1">Real-world Difference</span></td>
<td class="tg-r3m5"><span style="color:#3F3F3F;background-color:#FAF1D1">Realistic</span></td>
<td class="tg-r3m5"><span style="color:#3F3F3F;background-color:#FAF1D1">7.5K</span></td>
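This hunk and the ones after it delete the `<br>` between a category name and its sample count. `<span>` is an inline element and no whitespace separates the two spans, so after the change the name and count no longer get a forced line break, and no space separates them. The sketch below makes the difference visible using Python's standard `html.parser`. `CellTextParser` and `cell_texts` are hypothetical helpers, and mapping `<br>` to `\n` simplifies how a browser actually lays out the cell.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class CellTextParser(HTMLParser):
    """Collect (rowspan, text) for every <td>; a hypothetical helper for illustration."""

    def __init__(self) -> None:
        super().__init__()
        self.cells: List[Tuple[int, str]] = []
        self._in_td = False
        self._rowspan = 1
        self._parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            # A missing rowspan attribute means the cell spans one row.
            self._rowspan = int(dict(attrs).get("rowspan") or 1)
            self._parts = []
        elif tag == "br" and self._in_td:
            self._parts.append("\n")  # simplification: <br> becomes a newline

    def handle_endtag(self, tag):
        if tag == "td" and self._in_td:
            self.cells.append((self._rowspan, "".join(self._parts)))
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self._parts.append(data)


def cell_texts(html: str) -> List[Tuple[int, str]]:
    parser = CellTextParser()
    parser.feed(html)
    parser.close()
    return parser.cells


before = '<td rowspan="6"><span>Spot the Difference</span><br><span>(63.9K)</span></td>'
after = before.replace("</span><br><span", "</span><span")
print(cell_texts(before))  # [(6, 'Spot the Difference\n(63.9K)')]
print(cell_texts(after))   # [(6, 'Spot the Difference(63.9K)')]
```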
@@ -443,7 +443,7 @@ <h3 id="section-12">M4-Instruct: Training Data</h1>
<td class="tg-mmbt"><span style="color:#3F3F3F;background-color:#F8F9FA">3.9K</span></td>
</tr>
<tr style="line-height: 5px;">
-<td class="tg-mmbt" rowspan="3"><span style="color:#3F3F3F;background-color:#F8F9FA">Image Edit Instruction</span><br><span style="color:#3F3F3F;background-color:#F8F9FA">(67.7K)</span></td>
+<td class="tg-mmbt" rowspan="3"><span style="color:#3F3F3F;background-color:#F8F9FA">Image Edit Instruction</span><span style="color:#3F3F3F;background-color:#F8F9FA">(67.7K)</span></td>
<td class="tg-mmbt"><span style="color:#3F3F3F;background-color:#F8F9FA">HQ-Edit</span></td>
<td class="tg-mmbt"><span style="color:#3F3F3F;background-color:#F8F9FA">Synthetic</span></td>
<td class="tg-mmbt"><span style="color:#3F3F3F;background-color:#F8F9FA">50K</span></td>
@@ -459,7 +459,7 @@ <h3 id="section-12">M4-Instruct: Training Data</h1>
<td class="tg-mmbt"><span style="color:#3F3F3F;background-color:#F8F9FA">3.5K</span></td>
</tr>
<tr style="line-height: 5px;">
-<td class="tg-mmbt" rowspan="4"><span style="color:#3F3F3F;background-color:#F8F9FA">Visual Story Telling</span><br><span style="color:#3F3F3F;background-color:#F8F9FA">(66.9K)</span></td>
+<td class="tg-mmbt" rowspan="4"><span style="color:#3F3F3F;background-color:#F8F9FA">Visual Story Telling</span><span style="color:#3F3F3F;background-color:#F8F9FA">(66.9K)</span></td>
<td class="tg-mmbt"><span style="color:#3F3F3F;background-color:#F8F9FA">AESOP</span></td>
<td class="tg-mmbt"><span style="color:#3F3F3F;background-color:#F8F9FA">Cartoon</span></td>
<td class="tg-mmbt"><span style="color:#3F3F3F;background-color:#F8F9FA">6.9K</span></td>
@@ -480,7 +480,7 @@ <h3 id="section-12">M4-Instruct: Training Data</h1>
<td class="tg-mmbt"><span style="color:#3F3F3F;background-color:#F8F9FA">26K</span></td>
</tr>
<tr style="line-height: 5px;">
-<td class="tg-mmbt" rowspan="2"><span style="color:#3F3F3F;background-color:#F8F9FA">Visual Cloze</span><br><span style="color:#3F3F3F;background-color:#F8F9FA">(14.6K)</span></td>
+<td class="tg-mmbt" rowspan="2"><span style="color:#3F3F3F;background-color:#F8F9FA">Visual Cloze</span><span style="color:#3F3F3F;background-color:#F8F9FA">(14.6K)</span></td>
<td class="tg-mmbt"><span style="color:#3F3F3F;background-color:#F8F9FA">COMICS_Dialogue</span></td>
<td class="tg-mmbt"><span style="color:#3F3F3F;background-color:#F8F9FA">Cartoon</span></td>
<td class="tg-mmbt"><span style="color:#3F3F3F;background-color:#F8F9FA">5.9K</span></td>
@@ -491,7 +491,7 @@ <h3 id="section-12">M4-Instruct: Training Data</h1>
<td class="tg-mmbt"><span style="color:#3F3F3F;background-color:#F8F9FA">8.7K</span></td>
</tr>
<tr style="line-height: 5px;">
-<td class="tg-mmbt" rowspan="4"><span style="color:#3F3F3F;background-color:#F8F9FA">Text-rich VQA</span><br><span style="color:#3F3F3F;background-color:#F8F9FA">(21.4K)</span></td>
+<td class="tg-mmbt" rowspan="4"><span style="color:#3F3F3F;background-color:#F8F9FA">Text-rich VQA</span><span style="color:#3F3F3F;background-color:#F8F9FA">(21.4K)</span></td>
<td class="tg-mmbt"><span style="color:#3F3F3F;background-color:#F8F9FA">WebQA</span></td>
<td class="tg-mmbt"><span style="color:#3F3F3F;background-color:#F8F9FA">Webpage</span></td>
<td class="tg-mmbt"><span style="color:#3F3F3F;background-color:#F8F9FA">9.3K</span></td>
@@ -512,7 +512,7 @@ <h3 id="section-12">M4-Instruct: Training Data</h1>
<td class="tg-mmbt"><span style="color:#3F3F3F;background-color:#F8F9FA">1.9K</span></td>
</tr>
<tr style="line-height: 5px;">
-<td class="tg-mmbt" rowspan="7"><span style="color:#3F3F3F;background-color:#F8F9FA">Multi-image VQA</span><br><span style="color:#3F3F3F;background-color:#F8F9FA">(174.5K)</span></td>
+<td class="tg-mmbt" rowspan="7"><span style="color:#3F3F3F;background-color:#F8F9FA">Multi-image VQA</span><span style="color:#3F3F3F;background-color:#F8F9FA">(174.5K)</span></td>
<td class="tg-mmbt"><span style="color:#3F3F3F;background-color:#F8F9FA">NLVR2</span></td>
<td class="tg-mmbt"><span style="color:#3F3F3F;background-color:#F8F9FA">Realistic</span></td>
<td class="tg-mmbt"><span style="color:#3F3F3F;background-color:#F8F9FA">83K</span></td>
@@ -548,7 +548,7 @@ <h3 id="section-12">M4-Instruct: Training Data</h1>
<td class="tg-mmbt"><span style="color:#3F3F3F;background-color:#F8F9FA">64K</span></td>
</tr>
<tr style="line-height: 5px;">
-<td class="tg-mmbt" rowspan="2"><span style="color:#3F3F3F;background-color:#F8F9FA">Low-level Comparison</span><br><span style="color:#3F3F3F;background-color:#F8F9FA">(66K)</span></td>
+<td class="tg-mmbt" rowspan="2"><span style="color:#3F3F3F;background-color:#F8F9FA">Low-level Comparison</span><span style="color:#3F3F3F;background-color:#F8F9FA">(66K)</span></td>
<td class="tg-mmbt"><span style="color:#3F3F3F;background-color:#F8F9FA">Coinstruct</span></td>
<td class="tg-mmbt"><span style="color:#3F3F3F;background-color:#F8F9FA">Low-level</span></td>
<td class="tg-mmbt"><span style="color:#3F3F3F;background-color:#F8F9FA">50K</span></td>
@@ -595,7 +595,7 @@ <h3 id="section-12">M4-Instruct: Training Data</h1>
<td class="tg-z0de" colspan="4"><span style="font-weight:bold;font-style:italic;color:#3F3F3F">Multi-frame (Video) Scenarios</span></td>
</tr>
<tr style="line-height: 5px;">
-<td class="tg-mmbt" rowspan="3"><span style="color:#3F3F3F;background-color:#F8F9FA">Video VQA</span><br><span style="color:#3F3F3F;background-color:#F8F9FA">(247K)</span></td>
+<td class="tg-mmbt" rowspan="3"><span style="color:#3F3F3F;background-color:#F8F9FA">Video VQA</span><span style="color:#3F3F3F;background-color:#F8F9FA">(247K)</span></td>
<td class="tg-mmbt"><span style="color:#3F3F3F;background-color:#F8F9FA">NExT-QA</span></td>
<td class="tg-mmbt"><span style="color:#3F3F3F;background-color:#F8F9FA">General</span></td>
<td class="tg-mmbt"><span style="color:#3F3F3F;background-color:#F8F9FA">4K</span></td>
@@ -620,7 +620,7 @@ <h3 id="section-12">M4-Instruct: Training Data</h1>
<td class="tg-z0de" colspan="4"><span style="font-weight:bold;font-style:italic;color:#3F3F3F">Multi-view (3D) Scenarios</span></td>
</tr>
<tr style="line-height: 5px;">
-<td class="tg-mmbt" rowspan="3"><span style="color:#3F3F3F;background-color:#F8F9FA">Scene VQA</span><br><span style="color:#3F3F3F;background-color:#F8F9FA">(51K)</span></td>
+<td class="tg-mmbt" rowspan="3"><span style="color:#3F3F3F;background-color:#F8F9FA">Scene VQA</span><span style="color:#3F3F3F;background-color:#F8F9FA">(51K)</span></td>
<td class="tg-mmbt"><span style="color:#3F3F3F;background-color:#F8F9FA">Nuscenes</span></td>
<td class="tg-mmbt"><span style="color:#3F3F3F;background-color:#F8F9FA">Outdoor</span></td>
<td class="tg-mmbt"><span style="color:#3F3F3F;background-color:#F8F9FA">10K</span></td>
@@ -636,7 +636,7 @@ <h3 id="section-12">M4-Instruct: Training Data</h1>
<td class="tg-mmbt"><span style="color:#3F3F3F;background-color:#F8F9FA">16K</span></td>
</tr>
<tr style="line-height: 5px;">
-<td class="tg-mmbt" rowspan="3"><span style="color:#3F3F3F;background-color:#F8F9FA">Embodied VQA</span><br><span style="color:#3F3F3F;background-color:#F8F9FA">(48.5K)</span></td>
+<td class="tg-mmbt" rowspan="3"><span style="color:#3F3F3F;background-color:#F8F9FA">Embodied VQA</span><span style="color:#3F3F3F;background-color:#F8F9FA">(48.5K)</span></td>
<td class="tg-mmbt"><span style="color:#3F3F3F;background-color:#F8F9FA">ALFRED</span></td>
<td class="tg-mmbt"><span style="color:#3F3F3F;background-color:#F8F9FA">Indoor Synthetic</span></td>
<td class="tg-mmbt"><span style="color:#3F3F3F;background-color:#F8F9FA">23K</span></td>
@@ -655,7 +655,7 @@ <h3 id="section-12">M4-Instruct: Training Data</h1>
<td class="tg-z0de" colspan="4"><span style="font-weight:bold;font-style:italic;color:#3F3F3F">Single-image Scenarios</span></td>
</tr>
<tr style="line-height: 5px;">
-<td class="tg-mmbt"><span style="color:#3F3F3F;background-color:#F8F9FA">Single-image Tasks</span><br><span style="color:#3F3F3F;background-color:#F8F9FA">(307K)</span></td>
+<td class="tg-mmbt"><span style="color:#3F3F3F;background-color:#F8F9FA">Single-image Tasks</span><span style="color:#3F3F3F;background-color:#F8F9FA">(307K)</span></td>
<td class="tg-mmbt"><span style="color:#3F3F3F;background-color:#F8F9FA">Randomly sampled 40% of LLaVA-NeXT SFT data</span></td>
<td class="tg-mmbt"><span style="color:#3F3F3F;background-color:#F8F9FA">General</span></td>
<td class="tg-mmbt"><span style="color:#3F3F3F;background-color:#F8F9FA">307K</span></td>
@@ -953,7 +953,7 @@ <h3 id="section-14">LLaVA-Interleave Bench</h3>
<td class="tg-z0de" colspan="4"><span style="font-weight:bold;font-style:italic;color:#3F3F3F">In-domain Evaluation - Newly Curated Benchmarks</span></td>
</tr>
<tr style="line-height: 5px;">
-<td class="tg-sr0x" rowspan="3"><span style="font-style:italic;color:#3F3F3F;background-color:#F0FBEF">Spot the Difference</span><br><span style="font-style:italic;color:#3F3F3F;background-color:#F0FBEF">(0.3K)</span></td>
+<td class="tg-sr0x" rowspan="3"><span style="font-style:italic;color:#3F3F3F;background-color:#F0FBEF">Spot the Difference</span><span style="font-style:italic;color:#3F3F3F;background-color:#F0FBEF">(0.3K)</span></td>
<td class="tg-k8yq"><span style="color:#3F3F3F;background-color:#F0FBEF">Spot-the-Diff</span></td>
<td class="tg-k8yq"><span style="color:#3F3F3F;background-color:#F0FBEF">Surveillance</span></td>
<td class="tg-k8yq"><span style="color:#3F3F3F;background-color:#F0FBEF">0.1K</span></td>
@@ -969,7 +969,7 @@ <h3 id="section-14">LLaVA-Interleave Bench</h3>
<td class="tg-k8yq"><span style="color:#3F3F3F;background-color:#F0FBEF">0.1K</span></td>
</tr>
<tr style="line-height: 5px;">
-<td class="tg-sr0x" rowspan="3"><span style="font-style:italic;color:#3F3F3F;background-color:#F0FBEF">Image Edit Instruction</span><br><span style="font-style:italic;color:#3F3F3F;background-color:#F0FBEF">(2K)</span></td>
+<td class="tg-sr0x" rowspan="3"><span style="font-style:italic;color:#3F3F3F;background-color:#F0FBEF">Image Edit Instruction</span><span style="font-style:italic;color:#3F3F3F;background-color:#F0FBEF">(2K)</span></td>
<td class="tg-k8yq"><span style="color:#3F3F3F;background-color:#F0FBEF">HQ-Edit</span></td>
<td class="tg-k8yq"><span style="color:#3F3F3F;background-color:#F0FBEF">Synthetic</span></td>
<td class="tg-k8yq"><span style="color:#3F3F3F;background-color:#F0FBEF">1K</span></td>
@@ -985,7 +985,7 @@ <h3 id="section-14">LLaVA-Interleave Bench</h3>
<td class="tg-k8yq"><span style="color:#3F3F3F;background-color:#F0FBEF">0.1K</span></td>
</tr>
<tr style="line-height: 5px;">
-<td class="tg-sr0x" rowspan="4"><span style="font-style:italic;color:#3F3F3F;background-color:#F0FBEF">Visual Story Telling</span><br><span style="font-style:italic;color:#3F3F3F;background-color:#F0FBEF">(0.4K)</span></td>
+<td class="tg-sr0x" rowspan="4"><span style="font-style:italic;color:#3F3F3F;background-color:#F0FBEF">Visual Story Telling</span><span style="font-style:italic;color:#3F3F3F;background-color:#F0FBEF">(0.4K)</span></td>
<td class="tg-k8yq"><span style="color:#3F3F3F;background-color:#F0FBEF">AESOP</span></td>
<td class="tg-k8yq"><span style="color:#3F3F3F;background-color:#F0FBEF">Cartoon</span></td>
<td class="tg-k8yq"><span style="color:#3F3F3F;background-color:#F0FBEF">0.1K</span></td>
@@ -1006,13 +1006,13 @@ <h3 id="section-14">LLaVA-Interleave Bench</h3>
<td class="tg-k8yq"><span style="color:#3F3F3F;background-color:#F0FBEF">0.1K</span></td>
</tr>
<tr style="line-height: 5px;">
-<td class="tg-sr0x"><span style="font-style:italic;color:#3F3F3F;background-color:#F0FBEF">Visual Cloze</span><br><span style="font-style:italic;color:#3F3F3F;background-color:#F0FBEF">(0.1K)</span></td>
+<td class="tg-sr0x"><span style="font-style:italic;color:#3F3F3F;background-color:#F0FBEF">Visual Cloze</span><span style="font-style:italic;color:#3F3F3F;background-color:#F0FBEF">(0.1K)</span></td>
<td class="tg-k8yq"><span style="color:#3F3F3F;background-color:#F0FBEF">COMICS_Dialogue</span></td>
<td class="tg-k8yq"><span style="color:#3F3F3F;background-color:#F0FBEF">Cartoon</span></td>
<td class="tg-k8yq"><span style="color:#3F3F3F;background-color:#F0FBEF">0.1K</span></td>
</tr>
<tr style="line-height: 5px;">
-<td class="tg-sr0x" rowspan="4"><span style="font-style:italic;color:#3F3F3F;background-color:#F0FBEF">Text-rich VQA</span><br><span style="font-style:italic;color:#3F3F3F;background-color:#F0FBEF">(0.4K)</span></td>
+<td class="tg-sr0x" rowspan="4"><span style="font-style:italic;color:#3F3F3F;background-color:#F0FBEF">Text-rich VQA</span><span style="font-style:italic;color:#3F3F3F;background-color:#F0FBEF">(0.4K)</span></td>
<td class="tg-k8yq"><span style="color:#3F3F3F;background-color:#F0FBEF">WebQA</span></td>
<td class="tg-k8yq"><span style="color:#3F3F3F;background-color:#F0FBEF">Webpage</span></td>
<td class="tg-k8yq"><span style="color:#3F3F3F;background-color:#F0FBEF">0.1K</span></td>
@@ -1033,7 +1033,7 @@ <h3 id="section-14">LLaVA-Interleave Bench</h3>
<td class="tg-k8yq"><span style="color:#3F3F3F;background-color:#F0FBEF">0.1K</span></td>
</tr>
<tr style="line-height: 5px;">
-<td class="tg-sr0x" rowspan="4"><span style="font-style:italic;color:#3F3F3F;background-color:#F0FBEF">Multi-image VQA</span><br><span style="font-style:italic;color:#3F3F3F;background-color:#F0FBEF">(0.4K)</span></td>
+<td class="tg-sr0x" rowspan="4"><span style="font-style:italic;color:#3F3F3F;background-color:#F0FBEF">Multi-image VQA</span><span style="font-style:italic;color:#3F3F3F;background-color:#F0FBEF">(0.4K)</span></td>
<td class="tg-k8yq"><span style="color:#3F3F3F;background-color:#F0FBEF">MIT-States_StateCoherence</span></td>
<td class="tg-k8yq"><span style="color:#3F3F3F;background-color:#F0FBEF">General</span></td>
<td class="tg-k8yq"><span style="color:#3F3F3F;background-color:#F0FBEF">0.1K</span></td>
@@ -1094,7 +1094,7 @@ <h3 id="section-14">LLaVA-Interleave Bench</h3>
</tr>
<tr style="line-height: 5px;">
<td class="tg-1jwe"><span style="font-style:italic;color:#3F3F3F;background-color:#F6F1FE">Mantis-Eval (0.2K)</span></td>
-<td class="tg-t1u7"><span style="color:#3F3F3F;background-color:#F6F1FE">Mantis-Eval</span><br></td>
+<td class="tg-t1u7"><span style="color:#3F3F3F;background-color:#F6F1FE">Mantis-Eval</span></td>
<td class="tg-t1u7"><span style="color:#3F3F3F;background-color:#F6F1FE">General</span></td>
<td class="tg-t1u7"><span style="color:#3F3F3F;background-color:#F6F1FE">0.2K</span></td>
</tr>
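Across all 18 table cells the commit makes the same mechanical edit. A `<br>` immediately after a closing `</span>` is dropped, whether it precedes another `<span>` (the category cells) or the closing `</td>` (the Mantis-Eval cell). The sketch below expresses that edit as a regex. It assumes the markup uses exactly the `</span><br>` spelling seen in this file. `drop_span_br` is a hypothetical helper, and variants such as `<br/>` or `<br />` would need a wider pattern.

```python
import re

# A <br> that directly follows </span> and is itself followed by <span or </td>.
# The lookahead checks the next tag without consuming it.
SPAN_BR = re.compile(r"</span><br>(?=<span|</td>)")


def drop_span_br(html: str) -> str:
    """Delete <br> tags that sit right after a closing </span> in a table cell."""
    return SPAN_BR.sub("</span>", html)


cell = '<td rowspan="3"><span>Embodied VQA</span><br><span>(48.5K)</span></td>'
print(drop_span_br(cell))
# <td rowspan="3"><span>Embodied VQA</span><span>(48.5K)</span></td>
```

The lookahead keeps the rule narrow: a `<br>` that does not directly follow a `</span>`, such as the standalone `</br>` lines elsewhere in the page, is left alone.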
@@ -1781,6 +1781,8 @@ <h3 id="mp-eval">Single-image Evaluation (multi-patch)</h3>
</p>
</details>
</br>
+</br>
+
<h2 id="section-3-emerging-capabilities">Section 3 - Emerging Capabilities</h2>
<h3 id="section-31">Task Transfer between Single-Image and Multi-Image </h3>
</p>
