
Commit

fix text
HaoZhang534 committed Jun 20, 2024
1 parent a0d0834 commit 766bec8
Showing 1 changed file with 55 additions and 54 deletions.
2024-06-16-llava-next-interleave/index.html: 109 changes (55 additions, 54 deletions)
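Each hunk in this diff opens with a unified-diff header of the form `@@ -a,b +c,d @@`: the hunk covers `b` lines starting at line `a` of the old file and `d` lines starting at line `c` of the new file. Both lengths include the unchanged context lines, so summing (new length minus old length) over all hunks gives additions minus deletions. The sketch below is illustrative only. The function names are hypothetical and not taken from git or GitHub. It parses the ten hunk headers of this commit and checks that arithmetic.

```python
import re

# Matches a unified-diff hunk header such as "@@ -666,18 +666,23 @@".
# An omitted ",count" means a count of 1 (unified diff convention).
HUNK_RE = re.compile(r"@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk(header: str) -> tuple[int, int, int, int]:
    """Return (old_start, old_len, new_start, new_len) for one hunk header."""
    m = HUNK_RE.search(header)
    if m is None:
        raise ValueError(f"not a hunk header: {header!r}")
    old_start, old_len, new_start, new_len = m.groups()
    return (
        int(old_start),
        int(old_len) if old_len is not None else 1,
        int(new_start),
        int(new_len) if new_len is not None else 1,
    )


def net_line_change(headers: list[str]) -> int:
    """Sum of (new_len - old_len); context lines cancel, leaving additions - deletions."""
    total = 0
    for h in headers:
        _, old_len, _, new_len = parse_hunk(h)
        total += new_len - old_len
    return total


# Hunk headers copied from this commit.
HEADERS = [
    "@@ -666,18 +666,23 @@",
    "@@ -692,26 +697,26 @@",
    "@@ -726,11 +731,10 @@",
    "@@ -741,18 +745,12 @@",
    "@@ -767,15 +765,17 @@",
    "@@ -898,11 +898,16 @@",
    "@@ -915,11 +920,10 @@",
    "@@ -930,16 +934,11 @@",
    "@@ -952,10 +951,11 @@",
    "@@ -964,6 +964,7 @@",
]

if __name__ == "__main__":
    print(parse_hunk(HEADERS[0]))    # (666, 18, 666, 23)
    print(net_line_change(HEADERS))  # 1
```

The net change of 1 agrees with the 55 additions and 54 deletions reported for this file.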
@@ -666,18 +666,23 @@ <h3 id="section-12">M4-Instruct: Training Data</h1>
</div>
<div class="slider-container">
<div class="slider-wrapper" id="sliderWrapper1">

<div class="slider-item">
<div style="display: block; padding: 4px 10px; margin: 5px; border-radius: 3px; background-color: rgba(117, 209, 215, 0.1);">

<h4>Multi-Image Scenarios</h4>

<div style="display: grid; grid-template-columns: repeat(2, 1fr); grid-template-rows: repeat(1, 1fr); gap: 10px;">
<div >
<img src="/blog/assets/images/llava-m4/data/dog1.png" alt="Image 1" style="width: 50%; height: 100%;">
<div style="display: grid; grid-template-columns: repeat(2, 1fr); grid-template-rows: repeat(2, 1fr); gap: 10px;">
<div>
<img src="/blog/assets/images/llava-m4/data/post1.png" alt="Image 1" style="width: 60%; height: auto;">
</div>
<div >
<img src="/blog/assets/images/llava-m4/data/dog2.png" alt="Image 2" style="width: 50%; height: 100%;">
<div style="border: 1px solid #000;">
<img src="/blog/assets/images/llava-m4/data/post2.png" alt="Image 2" style="width: 60%; height: auto;">
</div>
<div style="border: 1px solid #000;">
<img src="/blog/assets/images/llava-m4/data/post3.png" alt="Image 3" style="width: 60%; height: auto;">
</div>
<div style="border: 1px solid #000;">
<img src="/blog/assets/images/llava-m4/data/post4.png" alt="Image 4" style="width: 60%; height: auto;">
</div>


@@ -692,26 +697,26 @@ <h4>Multi-Image Scenarios</h4>

<div style="display: block; padding: 4px 10px; margin: 5px; border-radius: 3px; background-color: rgba(249, 242, 248, 1);">
<font size="0.2">
<h4>Image Edit Instruction Generation</h4>
<p style="text-align: left;">For a pair of real-world or synthetic images, the model generates an accurate image-editing instruction. This capability could benefit research on text-to-image synthesis and diffusion models.</p>
<p style="text-align: left;"><b>User</b> Please provide the image edit instruction that can transform the source image to the target image.</p>
<p style="text-align: left;"><b>Assistant</b> In Image 1, replace the black and white soccer ball in the puppy's mouth with a sunflower similar to the one in Image 2.</p>
<h4>Social Media Post</h4>
<p style="text-align: left;">Given an idea and a collection of photos, the assistant produces engaging social media content, such as a Twitter post with well-crafted copy.</p>
<p style="text-align: left;"><b>User</b> Help me write a Twitter post considering the following images.</p>
<p style="text-align: left;"><b>Assistant</b> Wandering through the enchanting streets of Milan, where the vibrant pulse of the city is as captivating as its historical architecture. From the mesmerizing ... the gentle glow of candles offers a moment of serenity in this bustling metropolis. #MilanDiaries #TravelMoments #CityLights</p>
</font>


</div>
</div>
</div>
<div class="slider-item">
<div style="display: block; padding: 4px 10px; margin: 5px; border-radius: 3px; background-color: rgba(117, 209, 215, 0.1);">

<h4>Multi-Image Scenarios</h4>

<div style="display: grid; grid-template-columns: repeat(2, 1fr); grid-template-rows: repeat(1, 1fr); gap: 10px;">
<div style="border: 1px solid #000;">
<img src="/blog/assets/images/llava-m4/data/dogs1.png" alt="Image 1" style="width: 100%; height: auto;">
<div >
<img src="/blog/assets/images/llava-m4/data/dog1.png" alt="Image 1" style="width: 50%; height: 100%;">
</div>
<div style="border: 1px solid #000;">
<img src="/blog/assets/images/llava-m4/data/dogs2.png" alt="Image 2" style="width: 100%; height: auto;">
<div >
<img src="/blog/assets/images/llava-m4/data/dog2.png" alt="Image 2" style="width: 50%; height: 100%;">
</div>


@@ -726,11 +731,10 @@ <h4>Multi-Image Scenarios</h4>

<div style="display: block; padding: 4px 10px; margin: 5px; border-radius: 3px; background-color: rgba(249, 242, 248, 1);">
<font size="0.2">
<h4>Spotting the Difference</h4>
<p style="text-align: left;">
When presented with two or more visually similar images, this task requires identifying and detailing their differences with fine-grained visual analysis.</p>
<p style="text-align: left;"><b>User</b> Given two similar images, please provide a list of their differences.</p>
<p style="text-align: left;"><b>Assistant</b> The differences between the two images are:<br>1. In the first image, the dogs are biting a red frisbee, while in the second image, they are biting a stuffed toy.<br>2. In the second image, the dog on the left is wearing a blue jacket with a decorative pattern around the collar.</p>
<h4>Image Edit Instruction Generation</h4>
<p style="text-align: left;">For a pair of real-world or synthetic images, the model generates an accurate image-editing instruction. This capability could benefit research on text-to-image synthesis and diffusion models.</p>
<p style="text-align: left;"><b>User</b> Please provide the image edit instruction that can transform the source image to the target image.</p>
<p style="text-align: left;"><b>Assistant</b> In Image 1, replace the black and white soccer ball in the puppy's mouth with a sunflower similar to the one in Image 2.</p>
</font>


@@ -741,18 +745,12 @@ <h4>Spotting the Difference</h4>

<h4>Multi-Image Scenarios</h4>

<div style="display: grid; grid-template-columns: repeat(2, 1fr); grid-template-rows: repeat(2, 1fr); gap: 10px;">
<div style="border: 1px solid #000;">
<img src="/blog/assets/images/llava-m4/data/post1.png" alt="Image 1" style="width: 40%; height: auto;">
</div>
<div style="border: 1px solid #000;">
<img src="/blog/assets/images/llava-m4/data/post2.png" alt="Image 2" style="width: 40%; height: auto;">
</div>
<div style="display: grid; grid-template-columns: repeat(2, 1fr); grid-template-rows: repeat(1, 1fr); gap: 10px;">
<div style="border: 1px solid #000;">
<img src="/blog/assets/images/llava-m4/data/post3.png" alt="Image 3" style="width: 40%; height: auto;">
<img src="/blog/assets/images/llava-m4/data/dogs1.png" alt="Image 1" style="width: 100%; height: auto;">
</div>
<div style="border: 1px solid #000;">
<img src="/blog/assets/images/llava-m4/data/post4.png" alt="Image 4" style="width: 40%; height: auto;">
<img src="/blog/assets/images/llava-m4/data/dogs2.png" alt="Image 2" style="width: 100%; height: auto;">
</div>


@@ -767,15 +765,17 @@ <h4>Multi-Image Scenarios</h4>

<div style="display: block; padding: 4px 10px; margin: 5px; border-radius: 3px; background-color: rgba(249, 242, 248, 1);">
<font size="0.2">
<h4>Social Media Post</h4>
<p style="text-align: left;">Given an idea and a collection of photos, the assistant produces engaging social media content, such as a Twitter post with well-crafted copy.</p>
<p style="text-align: left;"><b>User</b> Help me write a Twitter post considering the following images.</p>
<p style="text-align: left;"><b>Assistant</b> Wandering through the enchanting streets of Milan, where the vibrant pulse of the city is as captivating as its historical architecture. From the mesmerizing ... the gentle glow of candles offers a moment of serenity in this bustling metropolis. #MilanDiaries #TravelMoments #CityLights</p>
<h4>Spotting the Difference</h4>
<p style="text-align: left;">
When presented with two or more visually similar images, this task requires identifying and detailing their differences with fine-grained visual analysis.</p>
<p style="text-align: left;"><b>User</b> Given two similar images, please provide a list of their differences.</p>
<p style="text-align: left;"><b>Assistant</b> The differences between the two images are:<br>1. In the first image, the dogs are biting a red frisbee, while in the second image, they are biting a stuffed toy.<br>2. In the second image, the dog on the left is wearing a blue jacket with a decorative pattern around the collar.</p>
</font>


</div>
</div>
</div>

<div class="slider-item">
<div style="display: block; padding: 4px 10px; margin: 5px; border-radius: 3px; background-color: rgba(117, 209, 215, 0.1);">

@@ -898,11 +898,16 @@ <h4>Video Detailed Description</h4>

<h4>Multi-View Scenarios</h4>

<div style="display: grid; grid-template-columns: repeat(1, 1fr); grid-template-rows: repeat(1, 1fr); gap: 10px;">
<div style="display: grid; grid-template-columns: repeat(3, 1fr); grid-template-rows: repeat(1, 1fr); gap: 10px;">
<div style="border: 1px solid #000;">
<img src="/blog/assets/images/llava-m4/data/3d_video.gif" alt="Image 1" style="width: 100%; height: auto;">
<img src="/blog/assets/images/llava-m4/data/room1.jpg" alt="Image 1" style="width: 100%; height: auto;">
</div>
<div style="border: 1px solid #000;">
<img src="/blog/assets/images/llava-m4/data/room2.jpg" alt="Image 2" style="width: 100%; height: auto;">
</div>
<div style="border: 1px solid #000;">
<img src="/blog/assets/images/llava-m4/data/room3.jpg" alt="Image 3" style="width: 100%; height: auto;">
</div>


</div>

@@ -915,11 +920,10 @@ <h4>Multi-View Scenarios</h4>
<div style="display: block; padding: 4px 10px; margin: 5px; border-radius: 3px; background-color: rgba(249, 242, 248, 1);">
<font size="0.2">
<h4>Embodied VQA</h4>
<!-- <p style="text-align: left;"> This task enables the model to interact with a 3D environment to solve problems or answer questions by navigating and manipulating its surroundings, abilities that are essential for robotics and augmented-reality applications.</p> -->
<p style="text-align: left;"><b>User</b> What is on the left of the TV? </p>
<p style="text-align: left;"><b>Assistant</b> A bicycle on the floor.</p>
<p style="text-align: left;"><b>User</b> How many trash cans can be seen on your right when you enter? </p>
<p style="text-align: left;"><b>Assistant</b> A bicycle on the floor.</p>
<p style="text-align: left;"> This task enables the model to interact with a 3D environment to solve problems or answer questions by navigating and manipulating its surroundings, abilities that are essential for robotics and augmented-reality applications.</p>
<p style="text-align: left;"><b>User</b> Your Main Goal: Place a glass with a fork in it on the table in front of a glass pitcher. Step Details: &lt;image&gt;Step#1: Move forward, turn right at the refrigerator, turn left at the stove and face the sink.
&lt;image&gt;Step#2: Pick up the fork from in the sink. &lt;image&gt; Current Step: </p>
<p style="text-align: left;"><b>Assistant</b> Turn to the left, move forward, turn to the right and face the refrigerator.</p>
</font>


@@ -930,16 +934,11 @@ <h4>Embodied VQA</h4>

<h4>Multi-View Scenarios</h4>

<div style="display: grid; grid-template-columns: repeat(3, 1fr); grid-template-rows: repeat(1, 1fr); gap: 10px;">
<div style="border: 1px solid #000;">
<img src="/blog/assets/images/llava-m4/data/room1.jpg" alt="Image 1" style="width: 100%; height: auto;">
</div>
<div style="border: 1px solid #000;">
<img src="/blog/assets/images/llava-m4/data/room2.jpg" alt="Image 2" style="width: 100%; height: auto;">
</div>
<div style="display: grid; grid-template-columns: repeat(1, 1fr); grid-template-rows: repeat(1, 1fr); gap: 10px;">
<div style="border: 1px solid #000;">
<img src="/blog/assets/images/llava-m4/data/room3.jpg" alt="Image 3" style="width: 100%; height: auto;">
<img src="/blog/assets/images/llava-m4/data/3d_video.gif" alt="Image 1" style="width: 100%; height: auto;">
</div>


</div>

@@ -952,10 +951,11 @@ <h4>Multi-View Scenarios</h4>
<div style="display: block; padding: 4px 10px; margin: 5px; border-radius: 3px; background-color: rgba(249, 242, 248, 1);">
<font size="0.2">
<h4>Embodied VQA</h4>
<p style="text-align: left;"> This task enables the model to interact with a 3D environment to solve problems or answer questions by navigating and manipulating its surroundings, abilities that are essential for robotics and augmented-reality applications.</p>
<p style="text-align: left;"><b>User</b> Your Main Goal: Place a glass with a fork in it on the table in front of a glass pitcher. Step Details: &lt;image&gt;Step#1: Move forward, turn right at the refrigerator, turn left at the stove and face the sink.
&lt;image&gt;Step#2: Pick up the fork from in the sink. &lt;image&gt; Current Step: </p>
<p style="text-align: left;"><b>Assistant</b> Turn to the left, move forward, turn to the right and face the refrigerator.</p>
<!-- <p style="text-align: left;"> This task enables the model to interact with a 3D environment to solve problems or answer questions by navigating and manipulating its surroundings, abilities that are essential for robotics and augmented-reality applications.</p> -->
<p style="text-align: left;"><b>User</b> What is on the left of the TV? </p>
<p style="text-align: left;"><b>Assistant</b> A bicycle on the floor.</p>
<p style="text-align: left;"><b>User</b> How many trash cans can be seen on your right when you enter? </p>
<p style="text-align: left;"><b>Assistant</b> A bicycle on the floor.</p>
</font>


@@ -964,6 +964,7 @@ <h4>Embodied VQA</h4>




</div>
</div>
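The Embodied VQA sample in this diff writes its prompt with escaped `&lt;image&gt;` placeholders, one per frame, interleaved with the step text. As a rough sketch, assuming each placeholder stands for the next image in an ordered list, such a prompt could be split into alternating text and image segments as shown below. The helper `interleave` and the frame names are hypothetical and are not part of the LLaVA-NeXT code.

```python
import html

# Placeholder marking where an image is interleaved (escaped as &lt;image&gt; in the page).
IMAGE_TOKEN = "<image>"


def interleave(prompt: str, images: list[str]) -> list[tuple[str, str]]:
    """Hypothetical helper: split a prompt on <image> placeholders.

    Returns ("text", chunk) and ("image", name) pairs in reading order,
    assuming the i-th placeholder refers to images[i].
    """
    prompt = html.unescape(prompt)  # turn "&lt;image&gt;" back into "<image>"
    chunks = prompt.split(IMAGE_TOKEN)
    n_placeholders = len(chunks) - 1
    if n_placeholders != len(images):
        raise ValueError(f"{n_placeholders} placeholders but {len(images)} images")
    segments: list[tuple[str, str]] = []
    for i, chunk in enumerate(chunks):
        text = chunk.strip()
        if text:  # skip empty text between adjacent placeholders
            segments.append(("text", text))
        if i < len(images):  # every chunk except the last is followed by a placeholder
            segments.append(("image", images[i]))
    return segments


if __name__ == "__main__":
    # Shortened from the Embodied VQA sample; the frame names are made up.
    prompt = (
        "Step Details: &lt;image&gt;Step#1: Move forward. "
        "&lt;image&gt;Step#2: Pick up the fork from in the sink. "
        "&lt;image&gt; Current Step:"
    )
    for kind, value in interleave(prompt, ["frame1.jpg", "frame2.jpg", "frame3.jpg"]):
        print(kind, value)
```

Checking the placeholder count against the number of images catches samples whose text and image lists have drifted out of sync.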
