fix text

LLaVA-VL · Jun 20, 2024 · 766bec8
1 parent a0d0834
commit 766bec8
Showing 1 changed file with 55 additions and 54 deletions.
diff --git a/2024-06-16-llava-next-interleave/index.html b/2024-06-16-llava-next-interleave/index.html
@@ -666,18 +666,23 @@ <h3 id="section-12">M4-Instruct: Training Data</h1>
 								</div>
 								<div class="slider-container">
 									<div class="slider-wrapper" id="sliderWrapper1">
-
 										<div class="slider-item">
 											<div style="display: block; padding: 4px 10px; margin: 5px; border-radius: 3px; background-color: rgba(117, 209, 215, 0.1);">
 
 												<h4>Multi-Image Scenarios</h4>
 
-												<div style="display: grid; grid-template-columns: repeat(2, 1fr); grid-template-rows: repeat(1, 1fr); gap: 10px;">
-													<div >
-														<img src="/blog/assets/images/llava-m4/data/dog1.png" alt="Image 1" style="width: 50%; height: 100%;">
+												<div style="display: grid; grid-template-columns: repeat(2, 1fr); grid-template-rows: repeat(2, 1fr); gap: 10px;">
+													<div>
+														<img src="/blog/assets/images/llava-m4/data/post1.png" alt="Image 1" style="width: 60%; height: auto;">
 													</div>
-													<div >
-														<img src="/blog/assets/images/llava-m4/data/dog2.png" alt="Image 2" style="width: 50%; height: 100%;">
+													<div style="border: 1px solid #000;">
+														<img src="/blog/assets/images/llava-m4/data/post2.png" alt="Image 2" style="width: 60%; height: auto;">
+													</div>
+													<div style="border: 1px solid #000;">
+														<img src="/blog/assets/images/llava-m4/data/post3.png" alt="Image 2" style="width: 60%; height: auto;">
+													</div>
+													<div style="border: 1px solid #000;">
+														<img src="/blog/assets/images/llava-m4/data/post4.png" alt="Image 2" style="width: 60%; height: auto;">
 													</div>
 
 
@@ -692,26 +697,26 @@ <h4>Multi-Image Scenarios</h4>
 
 											<div style="display: block; padding: 4px 10px; margin: 5px; border-radius: 3px; background-color: rgba(249, 242, 248, 1);">
 												<font size="0.2"> 
-												<h4>Image Edit Instruction Generation</h4>
-												<p style="text-align: left;">For a pair of real-world or synthetic images, the model generates accurate image editing instruction. Such capabilities can potentially benefit research related to text-to-image synthesis and diffusion models.</p>
-												<p style="text-align: left;"><b>User</b> Please provide the image edit instruction that can transform the source image to the target image.</p>
-													<p style="text-align: left;"><b>Assistant</b> In Image 1, replace the black and white soccer ball in the puppy's mouth with a sunflower similar to the one in Image 2.</p>
+												<h4>Social Media Post</h4>
+												<p style="text-align: left;">Given an idea and a collection of photos, Assistant produces captivating social media content, such as a Twitter post, featuring well-crafted copy.</p>
+												<p style="text-align: left;"><b>User</b> Help me write a Twitter post considering the following images.</p>
+													<p style="text-align: left;"><b>Assistant</b> Wandering through the enchanting streets of Milan, where the vibrant pulse of the city is as captivating as its historical architecture. From the mesmerizing ... the gentle glow of candles offers a moment of serenity in this bustling metropolis. #MilanDiaries #TravelMoments #CityLights</p>
 												</font> 
 
 
 											</div>
-										</div>
+										</div> 
 										<div class="slider-item">
 											<div style="display: block; padding: 4px 10px; margin: 5px; border-radius: 3px; background-color: rgba(117, 209, 215, 0.1);">
 
 												<h4>Multi-Image Scenarios</h4>
 
 												<div style="display: grid; grid-template-columns: repeat(2, 1fr); grid-template-rows: repeat(1, 1fr); gap: 10px;">
-													<div style="border: 1px solid #000;">
-														<img src="/blog/assets/images/llava-m4/data/dogs1.png" alt="Image 1" style="width: 100%; height: auto;">
+													<div >
+														<img src="/blog/assets/images/llava-m4/data/dog1.png" alt="Image 1" style="width: 50%; height: 100%;">
 													</div>
-													<div style="border: 1px solid #000;">
-														<img src="/blog/assets/images/llava-m4/data/dogs2.png" alt="Image 2" style="width: 100%; height: auto;">
+													<div >
+														<img src="/blog/assets/images/llava-m4/data/dog2.png" alt="Image 2" style="width: 50%; height: 100%;">
 													</div>
 
 
@@ -726,11 +731,10 @@ <h4>Multi-Image Scenarios</h4>
 
 											<div style="display: block; padding: 4px 10px; margin: 5px; border-radius: 3px; background-color: rgba(249, 242, 248, 1);">
 												<font size="0.2"> 
-												<h4>Spotting the Difference</h4>
-												<p style="text-align: left;">
-													When presented with two or more visually similar images, this task requires identifying and detailing their differences with fine-grained visual analysis.</p>
-												<p style="text-align: left;"><b>User</b> Given two similar images, please provide a list of their differences.</p>
-													<p style="text-align: left;"><b>Assistant</b> The differences between the two images are:\n1. In the first image, the dogs are biting a red frisbee, while in the second image, they are biting a stuffed toy.\n2. In the second image, the dog on the left is wearing a blue jacket with a decorative pattern around the collar.</p>
+												<h4>Image Edit Instruction Generation</h4>
+												<p style="text-align: left;">For a pair of real-world or synthetic images, the model generates accurate image editing instruction. Such capabilities can potentially benefit research related to text-to-image synthesis and diffusion models.</p>
+												<p style="text-align: left;"><b>User</b> Please provide the image edit instruction that can transform the source image to the target image.</p>
+													<p style="text-align: left;"><b>Assistant</b> In Image 1, replace the black and white soccer ball in the puppy's mouth with a sunflower similar to the one in Image 2.</p>
 												</font> 
 
 
@@ -741,18 +745,12 @@ <h4>Spotting the Difference</h4>
 
 												<h4>Multi-Image Scenarios</h4>
 
-												<div style="display: grid; grid-template-columns: repeat(2, 1fr); grid-template-rows: repeat(2, 1fr); gap: 10px;">
-													<div style="border: 1px solid #000;">
-														<img src="/blog/assets/images/llava-m4/data/post1.png" alt="Image 1" style="width: 40%; height: auto;">
-													</div>
-													<div style="border: 1px solid #000;">
-														<img src="/blog/assets/images/llava-m4/data/post2.png" alt="Image 2" style="width: 40%; height: auto;">
-													</div>
+												<div style="display: grid; grid-template-columns: repeat(2, 1fr); grid-template-rows: repeat(1, 1fr); gap: 10px;">
 													<div style="border: 1px solid #000;">
-														<img src="/blog/assets/images/llava-m4/data/post3.png" alt="Image 2" style="width: 40%; height: auto;">
+														<img src="/blog/assets/images/llava-m4/data/dogs1.png" alt="Image 1" style="width: 100%; height: auto;">
 													</div>
 													<div style="border: 1px solid #000;">
-														<img src="/blog/assets/images/llava-m4/data/post4.png" alt="Image 2" style="width: 40%; height: auto;">
+														<img src="/blog/assets/images/llava-m4/data/dogs2.png" alt="Image 2" style="width: 100%; height: auto;">
 													</div>
 
 
@@ -767,15 +765,17 @@ <h4>Multi-Image Scenarios</h4>
 
 											<div style="display: block; padding: 4px 10px; margin: 5px; border-radius: 3px; background-color: rgba(249, 242, 248, 1);">
 												<font size="0.2"> 
-												<h4>Social Media Post</h4>
-												<p style="text-align: left;">Given an idea and a collection of photos, Assistant produces captivating social media content, such as a Twitter post, featuring well-crafted copy.</p>
-												<p style="text-align: left;"><b>User</b> Help me write a Twitter post considering the following images.</p>
-													<p style="text-align: left;"><b>Assistant</b> Wandering through the enchanting streets of Milan, where the vibrant pulse of the city is as captivating as its historical architecture. From the mesmerizing ... the gentle glow of candles offers a moment of serenity in this bustling metropolis. #MilanDiaries #TravelMoments #CityLights</p>
+												<h4>Spotting the Difference</h4>
+												<p style="text-align: left;">
+													When presented with two or more visually similar images, this task requires identifying and detailing their differences with fine-grained visual analysis.</p>
+												<p style="text-align: left;"><b>User</b> Given two similar images, please provide a list of their differences.</p>
+													<p style="text-align: left;"><b>Assistant</b> The differences between the two images are:\n1. In the first image, the dogs are biting a red frisbee, while in the second image, they are biting a stuffed toy.\n2. In the second image, the dog on the left is wearing a blue jacket with a decorative pattern around the collar.</p>
 												</font> 
 
 
 											</div>
-										</div> 
+										</div>
+
 										<div class="slider-item">
 											<div style="display: block; padding: 4px 10px; margin: 5px; border-radius: 3px; background-color: rgba(117, 209, 215, 0.1);">
 
@@ -898,11 +898,16 @@ <h4>Video Detailed Description</h4>
 
 												<h4>Multi-View Scenarios</h4>
 
-												<div style="display: grid; grid-template-columns: repeat(1, 1fr); grid-template-rows: repeat(1, 1fr); gap: 10px;">
+												<div style="display: grid; grid-template-columns: repeat(3, 1fr); grid-template-rows: repeat(1, 1fr); gap: 10px;">
 													<div style="border: 1px solid #000;">
-														<img src="/blog/assets/images/llava-m4/data/3d_video.gif" alt="Image 1" style="width: 100%; height: auto;">
+														<img src="/blog/assets/images/llava-m4/data/room1.jpg" alt="Image 1" style="width: 100%; height: auto;">
+													</div>
+													<div style="border: 1px solid #000;">
+														<img src="/blog/assets/images/llava-m4/data/room2.jpg" alt="Image 2" style="width: 100%; height: auto;">
+													</div>
+													<div style="border: 1px solid #000;">
+														<img src="/blog/assets/images/llava-m4/data/room3.jpg" alt="Image 2" style="width: 100%; height: auto;">
 													</div>
-
 
 												</div>
 
@@ -915,11 +920,10 @@ <h4>Multi-View Scenarios</h4>
 											<div style="display: block; padding: 4px 10px; margin: 5px; border-radius: 3px; background-color: rgba(249, 242, 248, 1);">
 												<font size="0.2"> 
 												<h4>Embodied VQA</h4>
-												<!-- <p style="text-align: left;">    This task enables the model to interact with a 3D environment to solve problems or answer questions by navigating and manipulating its surroundings, which are essential for robotics and augmented reality applications.</p> -->
-												<p style="text-align: left;"><b>User</b>  What is on the left of the TV? </p>
-													<p style="text-align: left;"><b>Assistant</b> A bicycle on the floor.</p>
-													<p style="text-align: left;"><b>User</b>  How many trash cans can be seen on your right when you enter? </p>
-													<p style="text-align: left;"><b>Assistant</b> A bicycle on the floor.</p>
+												<p style="text-align: left;">    This task enables the model to interact with a 3D environment to solve problems or answer questions by navigating and manipulating its surroundings, which are essential for robotics and augmented reality applications.</p>
+												<p style="text-align: left;"><b>User</b> Your Main Goal:  Place a glass with a fork in it on the table in front of a glass pitcher.  Step Details: &lt;image&gt;Step#1: Move forward, turn right at the refrigerator, turn left at the stove and face the sink.
+													&lt;image&gt;Step#2: Pick up the fork from in the sink. &lt;image&gt; Current Step: </p>
+													<p style="text-align: left;"><b>Assistant</b> Turn to the left, move forward, turn to the right and face the refrigerator.</p>
 												</font> 
 
 
@@ -930,16 +934,11 @@ <h4>Embodied VQA</h4>
 
 												<h4>Multi-View Scenarios</h4>
 
-												<div style="display: grid; grid-template-columns: repeat(3, 1fr); grid-template-rows: repeat(1, 1fr); gap: 10px;">
-													<div style="border: 1px solid #000;">
-														<img src="/blog/assets/images/llava-m4/data/room1.jpg" alt="Image 1" style="width: 100%; height: auto;">
-													</div>
-													<div style="border: 1px solid #000;">
-														<img src="/blog/assets/images/llava-m4/data/room2.jpg" alt="Image 2" style="width: 100%; height: auto;">
-													</div>
+												<div style="display: grid; grid-template-columns: repeat(1, 1fr); grid-template-rows: repeat(1, 1fr); gap: 10px;">
 													<div style="border: 1px solid #000;">
-														<img src="/blog/assets/images/llava-m4/data/room3.jpg" alt="Image 2" style="width: 100%; height: auto;">
+														<img src="/blog/assets/images/llava-m4/data/3d_video.gif" alt="Image 1" style="width: 100%; height: auto;">
 													</div>
+
 
 												</div>
 
@@ -952,10 +951,11 @@ <h4>Multi-View Scenarios</h4>
 											<div style="display: block; padding: 4px 10px; margin: 5px; border-radius: 3px; background-color: rgba(249, 242, 248, 1);">
 												<font size="0.2"> 
 												<h4>Embodied VQA</h4>
-												<p style="text-align: left;">    This task enables the model to interact with a 3D environment to solve problems or answer questions by navigating and manipulating its surroundings, which are essential for robotics and augmented reality applications.</p>
-												<p style="text-align: left;"><b>User</b> Your Main Goal:  Place a glass with a fork in it on the table in front of a glass pitcher.  Step Details: &lt;image&gt;Step#1: Move forward, turn right at the refrigerator, turn left at the stove and face the sink.
-													&lt;image&gt;Step#2: Pick up the fork from in the sink. &lt;image&gt; Current Step: </p>
-													<p style="text-align: left;"><b>Assistant</b> Turn to the left, move forward, turn to the right and face the refrigerator.</p>
+												<!-- <p style="text-align: left;">    This task enables the model to interact with a 3D environment to solve problems or answer questions by navigating and manipulating its surroundings, which are essential for robotics and augmented reality applications.</p> -->
+												<p style="text-align: left;"><b>User</b>  What is on the left of the TV? </p>
+													<p style="text-align: left;"><b>Assistant</b> A bicycle on the floor.</p>
+													<p style="text-align: left;"><b>User</b>  How many trash cans can be seen on your right when you enter? </p>
+													<p style="text-align: left;"><b>Assistant</b> A bicycle on the floor.</p>
 												</font> 
 
 
@@ -964,6 +964,7 @@ <h4>Embodied VQA</h4>
 
 
 
+
 									</div>
 								</div>