[Model] Add Support for Ovis1.6-Gemma2-9B Model #11240
base: main
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
This model implementation couples the image processing and the model forwarding...
You can refer to the model implementations in llava.py and phi3v.py when adding a new model implementation.
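For illustration, here is a minimal, self-contained sketch of that separation; the class and helper names are illustrative, not vLLM or Ovis APIs. Preprocessing turns raw images into tensors once, up front, and the model's forward only ever consumes those tensors:

```python
# Illustrative only: these names are not vLLM APIs. The point is the
# separation of concerns the reviewer asks for.
from dataclasses import dataclass

import numpy as np
import torch
import torch.nn as nn


@dataclass
class ProcessedImageInputs:
    pixel_values: torch.Tensor  # (num_images, 3, H, W)


class ImageProcessor:
    """Runs once per request, outside the model's forward pass."""

    def __call__(self, images) -> ProcessedImageInputs:
        tensors = [
            torch.from_numpy(np.asarray(img)).permute(2, 0, 1).float()
            for img in images
        ]
        return ProcessedImageInputs(pixel_values=torch.stack(tensors))


class VisionLanguageModel(nn.Module):
    """forward() only sees tensors produced by the processor."""

    def __init__(self, hidden_size: int = 64):
        super().__init__()
        self.proj = nn.LazyLinear(hidden_size)  # stand-in visual tokenizer

    def forward(self, pixel_values: torch.Tensor) -> torch.Tensor:
        # No PIL decoding or resizing happens here.
        return self.proj(pixel_values.flatten(start_dim=1))
```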
any news?
Hey @Isotr0py, could you give this PR a review?
Although the model implementation has gotten better, there are still several things that need to be done:
- Update the documentation to mention this supported model in `docs/source/models/supported_models.md`
- Add an example in `examples/offline_inference/vision_language.py` (a sketch of such an entry follows this list); if this model supports multi-image inputs, please also update `examples/offline_inference/vision_language_multi_image.py`
- Add model correctness tests in `tests/models/decoder_only/vision_language/test_models.py` and a processor correctness test in `tests/models/multimodal/processing/test_common.py`
- Update `tests/models/registry.py` with the model information.
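As a rough illustration of the example-file item, an entry might look like the sketch below, following the `run_*` pattern of the neighbouring functions in that file (the signature and return convention are assumed from those neighbours, and the Gemma2-style prompt template is an assumption, not the verified Ovis chat format):

```python
# Sketch only: the prompt template below is an assumption; check the
# Ovis1.6 chat template before relying on it.
def run_ovis(question: str, modality: str):
    assert modality == "image"

    prompt = ("<start_of_turn>user\n"
              f"<image>\n{question}<end_of_turn>\n"
              "<start_of_turn>model\n")
    llm = LLM(model="AIDC-AI/Ovis1.6-Gemma2-9B",
              max_model_len=4096,
              trust_remote_code=True)
    stop_token_ids = None
    return llm, prompt, stop_token_ids
```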
vllm/model_executor/models/ovis.py
Outdated
```python
# def merge_multimodal(
#     self,
#     text_input_ids: torch.Tensor,
#     text_attention_masks: torch.Tensor,
#     text_labels: Optional[torch.Tensor],
#     pixel_values: List[Optional[torch.Tensor]],
#     left_padding: bool = False
# ):
```
Please remove this unused code.
vllm/model_executor/models/ovis.py
Outdated
```python
@cached_property
def sampler(self):
    if hasattr(self.llm, "sampler"):
        return self.llm.sampler
```
Missing fallback?
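For instance, a fallback along these lines might be what's wanted, assuming vLLM's default `get_sampler()` helper is the appropriate default for this model:

```python
from functools import cached_property

from vllm.model_executor.layers.sampler import get_sampler


@cached_property
def sampler(self):
    if hasattr(self.llm, "sampler"):
        return self.llm.sampler
    # Fallback: use vLLM's default sampler when the wrapped
    # language model does not expose one of its own.
    return get_sampler()
```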
Please address pre-commit linting errors as well.
Thanks @Isotr0py for the review, I'll get back to it.
Will this PR also cover the new Ovis 2 models? https://huggingface.co/collections/AIDC-AI/ovis2-67ab36c7e497429034874464
Signed-off-by: Player256 <[email protected]>
I'll add the tests for it.
tests/models/registry.py
Outdated
```diff
@@ -270,6 +270,7 @@ def check_available_online(
                                      trust_remote_code=True),
     "NVLM_D": _HfExamplesInfo("nvidia/NVLM-D-72B",
                               trust_remote_code=True),
+    "OvisForConditionalGeneration": _HfExamplesInfo("AIDC-AI/Ovis1.6-Gemma2-9B",trust_remote_code=True), # noqa: E501
```
"OvisForConditionalGeneration": _HfExamplesInfo("AIDC-AI/Ovis1.6-Gemma2-9B",trust_remote_code=True), # noqa: E501 | |
"OvisForConditionalGeneration": _HfExamplesInfo("AIDC-AI/Ovis1.6-Llama3.2-3B",trust_remote_code=True), # noqa: E501 |
We can use a smaller model for registry and test: AIDC-AI/Ovis1.6-Llama3.2-3B
Ok
Signed-off-by: Isotr0py <[email protected]>
```python
def _get_prompt_replacements(
    self,
    mm_items: MultiModalDataItems,
    hf_processor_mm_kwargs: Mapping[str, Any],
    out_mm_kwargs: MultiModalKwargs,
) -> list[PromptReplacement]:
```
We have to update this based on #13964
Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
@Player256 I tried this PR, but it doesn't work. I managed to get the model to load, but it seems that the multimodal processor implementation still doesn't work.
vllm/model_executor/models/ovis.py
Outdated
```python
def get_replacement_ovis(image: PIL.Image.Image):
    _, image_placeholders = self.preprocess_image(image)

    return image_placeholders
```
Why do we re-process images here?
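One way to avoid the second pass, sketched under the assumption that the per-image placeholder ids are already cached in `out_mm_kwargs` during the main processing pass (the `"image_placeholders"` key is hypothetical):

```python
def _get_prompt_replacements(
    self,
    mm_items: MultiModalDataItems,
    hf_processor_mm_kwargs: Mapping[str, Any],
    out_mm_kwargs: MultiModalKwargs,
) -> list[PromptReplacement]:
    # Hypothetical key: reuse placeholders computed when the images
    # were first processed instead of calling preprocess_image() again.
    image_placeholders = out_mm_kwargs["image_placeholders"]

    def get_replacement_ovis(item_idx: int):
        return image_placeholders[item_idx]

    return [
        PromptReplacement(
            modality="image",
            target=[IMAGE_TOKEN_ID],
            replacement=get_replacement_ovis,
        )
    ]
```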
```python
def get_image_size_with_most_features(self) -> ImageSize:
    return ImageSize(height=384, width=384)
```
Seems that Ovis will use dynamic resize (https://huggingface.co/AIDC-AI/Ovis1.6-Llama3.2-3B/blob/b8d93d7468f47fd803eb26ec2c1bc2d7e5fba60e/modeling_ovis.py#L135-L159), so does a 384x384 image size really return the most image features from the visual tokenizer?
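If the goal is to maximize the dynamic-partition tile count at profiling time, something closer to the following might be needed; the 384px tile side, the max_partition budget of 9, and the premise that an elongated image maximizes the tile count are all assumptions read off the linked HF code, not verified values:

```python
def get_image_size_with_most_features(self) -> ImageSize:
    # Assumed constants: the visual tokenizer's tile side and the
    # default max_partition budget from the HF modeling code.
    tile_size, max_partition = 384, 9
    # Under dynamic partitioning, a maximally elongated image should
    # yield the largest number of tiles, hence the most features.
    return ImageSize(height=tile_size, width=tile_size * max_partition)
```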
```python
if multimodal_embeddings is not None:
    input_embeds = merge_multimodal_embeddings(
        input_ids, input_embeds, multimodal_embeddings,
        IMAGE_TOKEN_ID)
```
Seems that image_token_id has been replaced by different placeholder tokens here?
vllm/vllm/model_executor/models/ovis.py
Lines 199 to 226 in f2eca81
```python
# place image placeholders
input_ids = []
pixel_values = []
image_token_indices = [i for i, v in enumerate(raw_input_ids) if v == IMAGE_TOKEN_ID]
last_image_token_index = -1
for i in range(len(image_token_indices)):
    head = 0 if i == 0 else image_token_indices[i - 1] + 1
    tail = image_token_indices[i]
    last_image_token_index = tail
    input_ids.extend(raw_input_ids[head:tail])
    try:
        image = images[i]
        raw_pixel_values, image_placeholders = self.preprocess_image(
            image, max_partition=max_partition)
    except Exception as e:
        if propagate_exception:
            raise e
        logging.exception(e)
        raw_pixel_values, image_placeholders = self.visual_tokenizer.mock_input()
    input_ids.extend(image_placeholders)
    pixel_values.append(raw_pixel_values)
input_ids.extend(raw_input_ids[last_image_token_index + 1:])
# return tensors
input_ids = torch.tensor(input_ids, dtype=torch.long)
pixel_values = torch.cat(pixel_values, dim=0) if len(pixel_values) > 0 else None
return prompt, input_ids, pixel_values
```
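If the processor indeed expands each `IMAGE_TOKEN_ID` slot into several distinct placeholder ids (as the quoted code suggests via `image_placeholders`), one option is to pass all of those ids when merging; `merge_multimodal_embeddings` also accepts a list of token ids. The id names below are hypothetical:

```python
# Hypothetical names: the real placeholder ids come from the Ovis
# tokenizer/config, not from these constants.
IMAGE_PLACEHOLDER_TOKEN_IDS = [
    IMAGE_TOKEN_ID, IMAGE_ATOM_TOKEN_ID, IMAGE_PAD_TOKEN_ID
]

if multimodal_embeddings is not None:
    input_embeds = merge_multimodal_embeddings(
        input_ids, input_embeds, multimodal_embeddings,
        IMAGE_PLACEHOLDER_TOKEN_IDS)
```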
Hey, let me get back to you with a working implementation.
This pull request addresses issue #9638 by adding support for the Ovis1.6-Gemma2-9B model.
FIX #8972
FIX #9638