
MiniCPM unable to run inference #1165

Open

plischwe opened this issue Feb 17, 2025 · 1 comment

@plischwe
Hi, I followed the tutorial here to convert and run multimodal inference using phi-3-vision: https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/phi-3-vision/phi-3-vision.ipynb

I put the code into a Python script, which looks like this:

from optimum.intel.openvino import OVModelForVisualCausalLM, OVWeightQuantizationConfig
from PIL import Image
from transformers import AutoProcessor, TextStreamer

model_dir = "benchmarking/multimodal/INT4/phi3-128k-vision"
model_dir = "benchmarking/multimodal/INT4/MiniCPM"
image_path = "cat.png"

model = OVModelForVisualCausalLM.from_pretrained(model_dir, device="CPU", trust_remote_code=True)

image = Image.open(image_path)
print("image size: ", image.size)

messages = [
    {"role": "user", "content": "<|image_1|>\nWhat is unusual on this picture?"},
]

processor = AutoProcessor.from_pretrained(model_dir, trust_remote_code=True)

prompt = processor.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = processor(prompt, [image], return_tensors="pt")

generation_args = {"max_new_tokens": 100, "do_sample": False, "streamer": TextStreamer(processor.tokenizer, skip_prompt=True, skip_special_tokens=True)}

print("Answer:")
generate_ids = model.generate(**inputs, eos_token_id=processor.tokenizer.eos_token_id, **generation_args)

But it seems that when I replace the local INT4 path for phi-3-vision with the local INT4 path of 'openbmb/MiniCPM-V-2_6' (which I exported to INT4 with the same command used to compress phi-3-vision), inference no longer works. I see that MiniCPM-V-2_6 is listed as a supported model in modeling_visual_language.py, but the error I get looks like this:

Traceback (most recent call last):
  File "/home/plischwe/OVModelForVisualCausalLM_test.py", line 27, in <module>
    inputs = processor(prompt, [image], return_tensors="pt")
  File "/root/.cache/huggingface/modules/transformers_modules/openbmb/MiniCPM-V-2_6/4719557d673e9e2b4b3f083801626098f51441a8/processing_minicpmv.py", line 67, in __call__
    return self._convert_images_texts_to_inputs(image_inputs, text, max_slice_nums=max_slice_nums, use_image_id=use_image_id, max_length=max_length, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/openbmb/MiniCPM-V-2_6/4719557d673e9e2b4b3f083801626098f51441a8/processing_minicpmv.py", line 153, in _convert_images_texts_to_inputs
    assert len(image_tags) == len(image_sizes[index])

Shouldn't models be plug-and-play so that the code is easier to reuse, or is there something I am missing?

@eaidova
Collaborator

eaidova commented Feb 21, 2025

The error does not come from the model class you reference; it happens at an earlier stage, where the inputs are preprocessed.

This is an issue with the preprocessing code, not with the model class. MiniCPM-V-2_6 uses a different image format and chat-template application than phi-3-vision, so the assert fires: the number of image tags found in the tokenized prompt differs from the number of images passed. Unfortunately we can do nothing about that, as it is part of the original model code provided by its authors (optimum-intel just reuses the preprocessing defined for the original model), and it is acceptable that independently developed models differ in their code.
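For illustration only, a minimal sketch of what the manual preprocessing would have to look like for MiniCPM; the (<image>./</image>) placeholder is an assumption about the MiniCPM-V-2_6 chat template taken from its processing_minicpmv.py, so verify it against the model repository:

# Hypothetical sketch, not verified on this setup: MiniCPM-V-2_6's processor looks for its own
# image placeholder rather than phi-3's "<|image_1|>", so with the phi-3 prompt it finds zero
# tags while one image is passed and the assert fires. The placeholder below is assumed.
messages = [
    {"role": "user", "content": "(<image>./</image>)\nWhat is unusual on this picture?"},
]
prompt = processor.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(prompt, [image], return_tensors="pt")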

I recommend using the preprocess_input helper in OVModelForVisualCausalLM to prepare the model inputs instead of doing the preprocessing manually (it will work for both the phi-3 and MiniCPM models):

model = OVModelForVisualCausalLM.from_pretrained(model_dir, device="CPU", trust_remote_code=True)
inputs = model.preprocess_input(image=image, text="What is unusual on this picture?", processor=processor, tokenizer=processor.tokenizer, config=model.config)

model.generate(**inputs, **generation_args)
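Putting this together with the generation settings from the original script, a hedged end-to-end sketch follows; the helper's exact name and signature may vary between optimum-intel releases, so check the installed version:

from optimum.intel.openvino import OVModelForVisualCausalLM
from transformers import AutoProcessor, TextStreamer
from PIL import Image

model_dir = "benchmarking/multimodal/INT4/MiniCPM"

model = OVModelForVisualCausalLM.from_pretrained(model_dir, device="CPU", trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_dir, trust_remote_code=True)
image = Image.open("cat.png")

# Model-aware input preparation instead of calling the processor by hand
inputs = model.preprocess_input(
    image=image,
    text="What is unusual on this picture?",
    processor=processor,
    tokenizer=processor.tokenizer,
    config=model.config,
)

streamer = TextStreamer(processor.tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(**inputs, max_new_tokens=100, do_sample=False, streamer=streamer)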
