
[Feature]: support image_embeds in openai api as well #13540

Open
1 task done
gyin94 opened this issue Feb 19, 2025 · 11 comments · May be fixed by #13955
Labels
feature request New feature or request

Comments

gyin94 commented Feb 19, 2025

🚀 The feature, motivation and pitch

Would it be possible to support image_embeds in the OpenAI protocol API as well? Prefix caching should also be supported via the following proposal. Thanks.

So users could pass:

{
  "type": "image_url",
  "image_url": {"url": f"data:image/embeds;base64,{base64_image_embeds}"}
}

Should we use base64, or a more efficient encoding?

import base64
import numpy as np

def encode_base64(arr):
    # Serialize a float32 array to a base64 string for transport.
    return base64.b64encode(arr.astype(np.float32).tobytes()).decode('utf-8')

def decode_base64(encoded_str, shape):
    # Rebuild the array; the shape must be sent alongside the payload,
    # since the raw byte stream alone does not encode it.
    decoded = base64.b64decode(encoded_str)
    return np.frombuffer(decoded, dtype=np.float32).reshape(shape)
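A quick round-trip check of the helpers above (repeated here so the snippet is self-contained; the embedding shape `(1, 4096)` is an arbitrary placeholder, not anything vLLM-specific):

```python
import base64
import numpy as np

def encode_base64(arr):
    return base64.b64encode(arr.astype(np.float32).tobytes()).decode('utf-8')

def decode_base64(encoded_str, shape):
    decoded = base64.b64decode(encoded_str)
    return np.frombuffer(decoded, dtype=np.float32).reshape(shape)

# Round-trip an example embedding tensor.
embeds = np.random.rand(1, 4096).astype(np.float32)
restored = decode_base64(encode_base64(embeds), embeds.shape)
assert restored.shape == embeds.shape
assert np.array_equal(restored, embeds)
```

Note that base64 inflates the payload by roughly 33% over the raw bytes, and the shape still has to travel out of band.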

cc @youkaichao

Alternatives

No response

Additional context

No response

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
@gyin94 gyin94 added the feature request New feature or request label Feb 19, 2025
DarkLight1337 (Member) commented Feb 19, 2025

You should be able to do this after #13533

DarkLight1337 (Member) commented:

Actually, never mind. image_embeds aren't being passed via mm_processor_kwargs.

DarkLight1337 (Member) commented Feb 19, 2025

Prefix caching is already done automatically in V1. Is there a particular benefit of sending the embeddings over HTTP?

gyin94 (Author) commented Feb 19, 2025

It is very useful for us: we can experiment with image embeddings from another inference engine or service without porting any image encoder implementation into vLLM.

Yes, I think we can send them via image_url, and the current vLLM engine can still do prefix caching. Based on my understanding, we just need a decode processor that passes the result as image_embeds to the model's forward pass.

"image_url": {"url": f"data:image/embeds;base64,{base64_image_embeds}"},

ywang96 (Member) commented Feb 19, 2025

I don't think this is an unreasonable ask for experimentation, but at the same time I'm not sure it's a good idea to deviate too much from the standard OpenAI API on our frontend server, so this is probably something we can add but not recommend.

IMHO, if you indeed want to send multimodal embeddings as input (which makes a lot of sense for a large-scale deployment where you process/generate embeddings separately), it's probably better to build your own API server on top of AsyncLLM, and we can indeed open up an interface for you to pass in your own hashes/identifier of embeddings.

gyin94 (Author) commented Feb 19, 2025

@ywang96 thanks for your reply. Creating another API seems like duplication, though. I also observed that openai/protocol.py in vLLM already supports non-standard OpenAI arguments via the extra_body field, which means the standard OpenAI API parameters are only a subset of vLLM's OpenAI API parameters.

DarkLight1337 (Member) commented Feb 20, 2025

That's true, but what you're proposing is passing the embeddings directly into the URL inside messages, which is part of the OpenAI API. Personally, I'm fine with extending the OpenAI API, but to avoid complicating the logic, I would add a new message type (e.g. image_embeds instead of image_url).

chaunceyjiang (Contributor) commented:

> I would add a new message type (e.g. image_embeds instead of image_url).

Hi @DarkLight1337, can I pick up this issue?
I understand that we should implement similar functionality by mimicking the implementation logic of the "type": "image_url" approach, right?

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": [
        {
            # Proposed new content type carrying base64-encoded embeddings.
            "type": "image_embeds",
            "image_embeds": f"data:image/embeds;base64,{base64_image_embeds}",
        },
    ]},
]
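For comparison, here is a self-contained sketch of a full chat request body under this proposal (the model name and the embedding shape are placeholders, and the exact field layout is whatever review settles on; this simply mirrors the message structure above):

```python
import base64
import numpy as np

# Placeholder embedding tensor; a real client would get this from an
# external image encoder.
embeds = np.random.rand(1, 4096).astype(np.float32)
base64_image_embeds = base64.b64encode(embeds.tobytes()).decode('utf-8')

# Hypothetical request body; "some-multimodal-model" is a placeholder and
# the "image_embeds" content type is this proposal, not a standard OpenAI type.
request = {
    "model": "some-multimodal-model",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": [
            {
                "type": "image_embeds",
                "image_embeds": f"data:image/embeds;base64,{base64_image_embeds}",
            },
            {"type": "text", "text": "Describe the image."},
        ]},
    ],
}
assert request["messages"][1]["content"][0]["type"] == "image_embeds"
```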

DarkLight1337 (Member) commented:

> Hi @DarkLight1337, can I pick up this issue?

Sure, thanks!

> I understand that we should implement similar functionality by mimicking the implementation logic of the "type": "image_url" approach, right?

Yes, but I guess we should just pass the embeddings directly instead of having to parse them out of the URL.

gyin94 (Author) commented Feb 21, 2025

@chaunceyjiang @DarkLight1337 thanks a lot

@chaunceyjiang chaunceyjiang linked a pull request Feb 28, 2025 that will close this issue
chaunceyjiang (Contributor) commented:

@gyin94 @DarkLight1337 Hi,

I have submitted PR #13955. To be honest, I don't know whether it meets your needs. @gyin94, could you give me example code using the OpenAI client? For example, using this image: https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg

My implementation is based on this:
https://docs.vllm.ai/en/latest/serving/multimodal_inputs.html#embedding

Development

Successfully merging a pull request may close this issue.

4 participants