Hi, I followed the tutorial here to convert and run multimodal inference with phi-3-vision: https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/phi-3-vision/phi-3-vision.ipynb. I put the notebook code into a Python script (sketched below, after the traceback). But when I replace the local INT4 path for phi-3-vision with the local INT4 path of 'openbmb/MiniCPM-V-2_6', which I exported to INT4 with the same command I used to compress phi-3-vision, inference no longer works. I see that MiniCPM-V-2_6 is a supported model in modeling_vision_language.py, yet the error I get looks like this:
Traceback (most recent call last):
File "/home/plischwe/OVModelForVisualCausalLM_test.py", line 27, in <module>
inputs = processor(prompt, [image], return_tensors="pt")
File "/root/.cache/huggingface/modules/transformers_modules/openbmb/MiniCPM-V-2_6/4719557d673e9e2b4b3f083801626098f51441a8/processing_minicpmv.py", line 67, in __call__
return self._convert_images_texts_to_inputs(image_inputs, text, max_slice_nums=max_slice_nums, use_image_id=use_image_id, max_length=max_length, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/openbmb/MiniCPM-V-2_6/4719557d673e9e2b4b3f083801626098f51441a8/processing_minicpmv.py", line 153, in _convert_images_texts_to_inputs
assert len(image_tags) == len(image_sizes[index])
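For context, the script follows the notebook pattern; a rough sketch of it (the model path, image, and prompt here are illustrative placeholders, not the exact code) is:

```python
from PIL import Image
from transformers import AutoProcessor
from optimum.intel import OVModelForVisualCausalLM

model_dir = "MiniCPM-V-2_6-int4"  # placeholder: local INT4 export (phi-3-vision or MiniCPM-V-2_6)
processor = AutoProcessor.from_pretrained(model_dir, trust_remote_code=True)
model = OVModelForVisualCausalLM.from_pretrained(model_dir, device="CPU", trust_remote_code=True)

image = Image.open("example.jpg")  # placeholder image

# Prompt built the phi-3-vision way, using its <|image_1|> placeholder
messages = [{"role": "user", "content": "<|image_1|>\nWhat is unusual on this picture?"}]
prompt = processor.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# This is the call that fails for MiniCPM-V-2_6 (line 27 in the traceback above)
inputs = processor(prompt, [image], return_tensors="pt")

generate_ids = model.generate(**inputs, max_new_tokens=100)
print(processor.tokenizer.batch_decode(generate_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)[0])
```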
Shouldn't models be plug and play here, so the same code can be reused across them, or is there something I am missing?
The error does not come from the model class you reference; it happens at an earlier stage, when you preprocess the inputs.
This is an issue with the preprocessing code, not with the model class. MiniCPM-V-2_6 uses a different image format and chat template than phi-3-vision, so the assertion fires: the number of image tags found in the tokenized prompt does not match the number of images you pass in (the prompt was built with phi-3-vision's image placeholder, which MiniCPM's processor does not recognize). Unfortunately we can do nothing about that, as it is part of the original model code provided by its authors (optimum-intel just reuses the preprocessing defined for the original model), and it is expected that code for independent models may differ in such details.
I recommend using the preprocess_inputs helper of OVModelForVisualCausalLM to prepare the model inputs instead of doing the preprocessing manually (it works for both the phi-3-vision and MiniCPM models):
model = OVModelForVisualCausalLM.from_pretrained(model_dir, device="CPU", trust_remote_code=True)
inputs = model.preprocess_inputs(image=image, text="What is unusual on this picture?", processor=processor, tokenizer=processor.tokenizer, config=model.config)
model.generate(**inputs, **generation_args)
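For completeness, a minimal end-to-end sketch built around that helper could look like the following. The model directory, image path, and generation settings are placeholders, and it assumes the preprocess_inputs static helper exposed by optimum-intel's OVModelForVisualCausalLM:

```python
from PIL import Image
from transformers import AutoProcessor
from optimum.intel import OVModelForVisualCausalLM

model_dir = "MiniCPM-V-2_6-int4"  # placeholder: path to the local INT4 export
processor = AutoProcessor.from_pretrained(model_dir, trust_remote_code=True)
model = OVModelForVisualCausalLM.from_pretrained(model_dir, device="CPU", trust_remote_code=True)

image = Image.open("example.jpg")  # placeholder image

# The model-specific helper builds the prompt and image tensors, so the
# correct chat template and image tags are applied for each model family.
inputs = model.preprocess_inputs(
    text="What is unusual on this picture?",
    image=image,
    processor=processor,
    tokenizer=processor.tokenizer,
    config=model.config,
)

generation_args = {"max_new_tokens": 100, "do_sample": False}
output_ids = model.generate(**inputs, **generation_args)

# Drop the prompt tokens before decoding the answer
answer = processor.tokenizer.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```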