
[Feature]: support image_embeds in openai api as well #13540

Open
1 task done
gyin94 opened this issue Feb 19, 2025 · 11 comments · May be fixed by #13955
Labels
feature request New feature or request

Comments

gyin94 commented Feb 19, 2025

🚀 The feature, motivation and pitch

Would it be possible to support image_embeds in the OpenAI protocol API as well? Prefix caching should also be supported via the following proposal. Thanks.

So users could pass:

{
  "type": "image_url",
  "image_url": {"url": f"data:image/embeds;base64,{base64_image_embeds}"}
}

Should we use base64, or a more efficient encoding?

import base64
import numpy as np

def encode_base64(arr):
    # Serialize a float32 array to a base64 string for transport.
    return base64.b64encode(arr.astype(np.float32).tobytes()).decode('utf-8')

def decode_base64(encoded_str, shape):
    # Rebuild the array; the shape must be sent alongside the payload,
    # since the raw byte stream alone does not encode it.
    decoded = base64.b64decode(encoded_str)
    return np.frombuffer(decoded, dtype=np.float32).reshape(shape)
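A quick round-trip check of the helpers above (repeated here so the snippet is self-contained; the embedding shape `(1, 4096)` is an arbitrary placeholder, not anything vLLM-specific):

```python
import base64
import numpy as np

def encode_base64(arr):
    return base64.b64encode(arr.astype(np.float32).tobytes()).decode('utf-8')

def decode_base64(encoded_str, shape):
    decoded = base64.b64decode(encoded_str)
    return np.frombuffer(decoded, dtype=np.float32).reshape(shape)

# Round-trip an example embedding tensor.
embeds = np.random.rand(1, 4096).astype(np.float32)
restored = decode_base64(encode_base64(embeds), embeds.shape)
assert restored.shape == embeds.shape
assert np.array_equal(restored, embeds)
```

Note that base64 inflates the payload by roughly 33% over the raw bytes, and the shape still has to travel out of band.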

cc @youkaichao

Alternatives

No response

Additional context

No response

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
@gyin94 gyin94 added the feature request New feature or request label Feb 19, 2025
DarkLight1337 (Member) commented Feb 19, 2025

You should be able to do this after #13533

DarkLight1337 (Member) commented:

Actually, never mind. image_embeds aren't being passed via mm_processor_kwargs.

DarkLight1337 (Member) commented Feb 19, 2025

Prefix caching is already done automatically in V1. Is there a particular benefit of sending the embeddings over HTTP?

gyin94 (Author) commented Feb 19, 2025

It is very useful for us: we can experiment with image embeddings from another inference engine or service without porting any image encoder implementation into vLLM.

Yes, I think we can send them via image_url, and the current vLLM engine can still do prefix caching. Based on my understanding, we just need a decode processor that passes the result as image_embeds to the model's forward pass.

"image_url": {"url": f"data:image/embeds;base64,{base64_image_embeds}"},

ywang96 (Member) commented Feb 19, 2025

I don't think this is an unreasonable ask for experimentation, but at the same time I'm not sure it's a good idea to deviate too much from the standard OpenAI API on our frontend server, so this is probably something we can add but not recommend.

IMHO, if you indeed want to send multimodal embeddings as input (which makes a lot of sense for a large-scale deployment where you process/generate embeddings separately), it's probably better to build your own API server on top of AsyncLLM, and we can indeed open up an interface for you to pass in your own hashes/identifier of embeddings.

gyin94 (Author) commented Feb 19, 2025

@ywang96 thanks for your reply. Creating another API seems like duplication, though. I also observed that openai/protocol.py in vLLM already supports non-standard OpenAI arguments via the extra_body field, which means the standard OpenAI API parameters are only a subset of vLLM's OpenAI API parameters.

DarkLight1337 (Member) commented Feb 20, 2025

That's true, but what you're proposing is passing the embeddings directly into the URL inside messages, which is part of the OpenAI API. Personally, I'm fine with extending the OpenAI API, but to avoid complicating the logic, I would add a new message type (e.g. image_embeds instead of image_url).

chaunceyjiang (Contributor) commented:

> I would add a new message type (e.g. image_embeds instead of image_url).

Hi @DarkLight1337, can I pick up this issue?
I understand that we should implement similar functionality by mimicking the implementation logic of the "type": "image_url" approach, right?

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": [
        {
            # Proposed new content type carrying base64-encoded embeddings.
            "type": "image_embeds",
            "image_embeds": f"data:image/embeds;base64,{base64_image_embeds}",
        },
    ]},
]
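For comparison, here is a self-contained sketch of a full chat request body under this proposal (the model name and the embedding shape are placeholders, and the exact field layout is whatever review settles on; this simply mirrors the message structure above):

```python
import base64
import numpy as np

# Placeholder embedding tensor; a real client would get this from an
# external image encoder.
embeds = np.random.rand(1, 4096).astype(np.float32)
base64_image_embeds = base64.b64encode(embeds.tobytes()).decode('utf-8')

# Hypothetical request body; "some-multimodal-model" is a placeholder and
# the "image_embeds" content type is this proposal, not a standard OpenAI type.
request = {
    "model": "some-multimodal-model",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": [
            {
                "type": "image_embeds",
                "image_embeds": f"data:image/embeds;base64,{base64_image_embeds}",
            },
            {"type": "text", "text": "Describe the image."},
        ]},
    ],
}
assert request["messages"][1]["content"][0]["type"] == "image_embeds"
```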

DarkLight1337 (Member) commented:

> Hi @DarkLight1337, can I pick up this issue?

Sure, thanks!

> I understand that we should implement similar functionality by mimicking the implementation logic of the "type": "image_url" approach, right?

Yes, but I guess we should just pass the embeddings directly instead of having to parse them out of the URL.

gyin94 (Author) commented Feb 21, 2025

@chaunceyjiang @DarkLight1337 thanks a lot

@chaunceyjiang chaunceyjiang linked a pull request Feb 28, 2025 that will close this issue
chaunceyjiang (Contributor) commented:

@gyin94 @DarkLight1337 Hi,

I have submitted PR #13955. To be honest, I don't know whether it meets your needs. @gyin94, could you give me example code using the OpenAI client? For example, using this image: https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg

My implementation is based on this:
https://docs.vllm.ai/en/latest/serving/multimodal_inputs.html#embedding

Development

Successfully merging a pull request may close this issue.

4 participants