feat(dev): Add image upload (vision) support for internal testing #7279

abeatrix · 2025-02-28T09:33:28Z

Building block for #7235

This commit adds support for multimedia content in messages, particularly image handling with BYOK dev models from Google, including support for uploading images in the chat interface. Users can now upload images when using a configured vision model, and the uploaded images are displayed in the chat messages as at-mention chips.

vision.mov

Key changes:

- Added a `content` field to the `Message` interface with support for different message part types (text, media, tool)
- Refactored Google Gemini message handling to properly process the new content structure
- Updated the chat client and sanitization logic to handle the new `content` field
- Modified the UI components to show image upload only for BYOK and Vision models
- Added Vision tag detection in the model creation function
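A minimal sketch of what a message with structured `content` parts could look like. Only the three part kinds (text, media, tool) come from the description above; the field names (`type`, `text`, `mimeType`, `data`, `name`, `input`, `speaker`) and the helper names are assumptions for illustration, not the PR's actual types.

```typescript
// Hypothetical part shapes; the PR only states that text, media, and tool parts exist.
type TextPart = { type: "text"; text: string };
type MediaPart = { type: "media"; mimeType: string; data: string }; // data: base64 bytes
type ToolPart = { type: "tool"; name: string; input: unknown };
type MessagePart = TextPart | MediaPart | ToolPart;

// Hypothetical message shape: a legacy `text` field plus the new structured `content`.
interface Message {
    speaker: "human" | "assistant";
    text?: string;
    content?: MessagePart[];
}

// Join the text parts of a message, falling back to the legacy `text` field.
function messageText(message: Message): string {
    if (!message.content) {
        return message.text ?? "";
    }
    return message.content
        .filter((part): part is TextPart => part.type === "text")
        .map(part => part.text)
        .join("\n");
}

// Return the media (image) parts, e.g. to decide whether a message needs a vision model.
function mediaParts(message: Message): MediaPart[] {
    return (message.content ?? []).filter((part): part is MediaPart => part.type === "media");
}
```

Keeping `text` alongside `content` in this sketch lets older code paths keep working while newer ones read the structured parts.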

Other changes:

- Adding a new `MediaUploadButton` component for handling image uploads.
- Adding a new `ContextItemMedia` type to represent media context items.
- Updating the `constructGeminiChatMessages` function to handle media content.
- Updating the `renderContextItem` function to handle media content.
- Updating the `ContextItemMentionNode` to render media context items.
- Updating the `HumanMessageEditor` to include the `MediaUploadButton` and handle media uploads.
- Adding `@google/generative-ai` as a dependency.
- Updating the `Message` type to include `data` and `mimeType` for media.
- Updating the `chat-question` telemetry event to include media context.
- Updating the `googleChatClient` to use the `@google/generative-ai` library.
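A sketch of how messages carrying media might be turned into Gemini request `contents`. The `inlineData` part with a `mimeType` and base64 `data`, and the `user`/`model` roles, mirror the request shape used by `@google/generative-ai`; the local types and the `toGeminiContents` name are hypothetical, and this is not the PR's `constructGeminiChatMessages`.

```typescript
// Local mirrors of the Gemini request shapes (text part, inlineData part, content),
// declared here so the sketch does not depend on the SDK.
type GeminiPart =
    | { text: string }
    | { inlineData: { mimeType: string; data: string } };
interface GeminiContent {
    role: "user" | "model";
    parts: GeminiPart[];
}

// Hypothetical chat message shape (assumption, not the PR's exact type).
type ChatPart =
    | { type: "text"; text: string }
    | { type: "media"; mimeType: string; data: string };
interface ChatMessage {
    speaker: "human" | "assistant";
    content: ChatPart[];
}

// Map each media part to an inlineData part and each non-empty text part to a text part.
function toGeminiContents(messages: ChatMessage[]): GeminiContent[] {
    return messages
        .map(message => ({
            role: message.speaker === "human" ? ("user" as const) : ("model" as const),
            parts: message.content
                .filter(part => part.type !== "text" || part.text.length > 0)
                .map((part): GeminiPart =>
                    part.type === "media"
                        ? { inlineData: { mimeType: part.mimeType, data: part.data } }
                        : { text: part.text }
                ),
        }))
        // Drop messages that end up with no parts; an empty `parts` array is not useful to send.
        .filter(content => content.parts.length > 0);
}
```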

Updated

Enterprise instances with the Early Access feature flag enabled (e.g., sg02) can also enable vision models.

Test plan

Steps

For internal users: connect to the sg02 instance.

For BYOK

Add the following settings to your user settings.json file, using your Google AI Studio key.
In the `options` field of each model that supports vision (image input), make sure `categories` includes "vision".
Build from this branch.
The image upload icon should not appear when the selected model is a cody.dev.models entry without "vision" in its `options.categories`.

{
    "cody.dev.models": [
        {
            "provider": "google",
            "model": "gemini-2.0-flash-thinking-exp-01-21",
            "apiKey": "$GOOGLE_AI_STUDIO_KEY",
            "inputTokens": 1000000,
            "options": {
                "temperature": 1
            }
        },
        {
            "provider": "google",
            "model": "gemini-2.0-flash",
            "apiKey": "$GOOGLE_AI_STUDIO_KEY",
            "inputTokens": 1000000,
            "options": {
                "temperature": 1,
                "categories": [
                    "vision" // <- Required for the image upload icon to appear and be enabled for this model
                ]
            }
        }
    ]
}
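The gating rule from the steps above (the upload icon appears only for dev models whose `options.categories` lists "vision") can be sketched as a predicate over one settings entry. `DevModelConfig` and `isVisionDevModel` are hypothetical names; the fields mirror the JSON above, and the real check lives in the model creation code, which may differ.

```typescript
// Hypothetical shape of a `cody.dev.models` entry, mirroring the settings above.
interface DevModelConfig {
    provider: string;
    model: string;
    apiKey?: string;
    inputTokens?: number;
    options?: {
        temperature?: number;
        categories?: string[];
    };
}

// A BYOK dev model counts as vision-capable only if "vision" is in options.categories.
function isVisionDevModel(config: DevModelConfig): boolean {
    return config.options?.categories?.some(category => category === "vision") ?? false;
}
```

With the settings above, only the `gemini-2.0-flash` entry passes this check, so only that model would show the image upload icon.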

Take a screenshot of Cody's home page.
Copy and paste the screenshot into the chat editor.
Ask "Tell me what you see".
Verify the answer.
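A pasted screenshot read in the webview (for example via `FileReader.readAsDataURL`) arrives as a base64 data URL. A hypothetical helper that splits it into the `mimeType` and `data` fields and rejects types outside the list shown in the upload button tooltip (PNG, JPEG, WEBP, HEIC, HEIF) might look like this; it is a sketch, not the PR's upload code.

```typescript
// MIME types matching the formats listed in the upload tooltip.
const SUPPORTED_IMAGE_TYPES: readonly string[] = [
    "image/png", "image/jpeg", "image/webp", "image/heic", "image/heif",
];

interface ImageAttachment {
    mimeType: string;
    data: string; // base64 payload without the "data:...;base64," prefix
}

// Parse "data:<mime>;base64,<payload>"; return undefined for malformed or unsupported input.
function parseImageDataUrl(dataUrl: string): ImageAttachment | undefined {
    const match = /^data:([^;,]+);base64,([A-Za-z0-9+\/]*={0,2})$/.exec(dataUrl);
    if (!match) {
        return undefined;
    }
    const mimeType = match[1].toLowerCase();
    if (SUPPORTED_IMAGE_TYPES.indexOf(mimeType) === -1) {
        return undefined;
    }
    return { mimeType, data: match[2] };
}
```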

When sending to sg

feat(chat): Add image upload support

This commit adds support for uploading images in the chat interface. Users can now upload images, which are then displayed as part of the chat messages. The changes include:

- Adding a new `MediaUploadButton` component for handling image uploads.
- Adding a new `ContextItemMedia` type to represent media context items.
- Updating the `constructGeminiChatMessages` function to handle media content.
- Updating the `renderContextItem` function to handle media content.
- Updating the `ContextItemMentionNode` to render media context items.
- Updating the `HumanMessageEditor` to include the `MediaUploadButton` and handle media uploads.
- Adding `@google/generative-ai` as a dependency.
- Updating the `Message` type to include `data` and `mimeType` for media.
- Updating the `chat-question` telemetry event to include media context.
- Updating the `googleChatClient` to use the `@google/generative-ai` library.
- Removing the `GeminiChatMessage` interface.

Fixes an issue in the Kotlin code generator where property names with underscores were not properly converted to camelCase. For example, `image_url` in TypeScript was incorrectly converted to `image-url` in Kotlin when it should be `imageUrl`. The fix adds logic to detect snake_case identifiers and transform them to camelCase while preserving the original string for serialization purposes.

Previously, when processing field names with underscores such as `image_url`, the code did not convert them to camelCase, which is the standard format in Kotlin. Now, the updated code:

1. Detects whether a field name contains underscores.
2. Uses a regex to replace each underscore followed by a character with the uppercase version of that character.
3. As a result, converts `image_url` to `imageUrl` rather than keeping the underscore.

## Test Plan

This fix ensures that TypeScript property names containing underscores follow Kotlin naming conventions when converted, while the original string literal value is kept for serialization.

Run `pnpm generate-agent-kotlin-bindings` to confirm that all existing behavior still works, and that newly added properties with snake_case identifiers are transformed into camelCase while preserving the original string for serialization.
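The three steps above amount to a small regex replacement. This sketch is written in TypeScript on the assumption that the generator behind `pnpm generate-agent-kotlin-bindings` is TypeScript; `snakeToCamel` and `renderKotlinProperty` are hypothetical, and using Gson's `@SerializedName` to keep the original name for serialization is an assumption about the generated Kotlin, not something the description confirms.

```typescript
// Step 1: detect underscores. Step 2: uppercase the character after each underscore.
// Example: "image_url" -> "imageUrl".
function snakeToCamel(name: string): string {
    if (name.indexOf("_") === -1) {
        return name; // already camelCase or a single word
    }
    return name.replace(/_(.)/g, (_match, next: string) => next.toUpperCase());
}

// Emit a Kotlin property with a camelCase identifier; when the name changed, keep the
// original TypeScript name in a (hypothetical) @SerializedName annotation for the wire format.
function renderKotlinProperty(tsName: string, kotlinType: string): string {
    const kotlinName = snakeToCamel(tsName);
    const property = `val ${kotlinName}: ${kotlinType}`;
    return kotlinName === tsName ? property : `@SerializedName("${tsName}") ${property}`;
}
```

Only renamed properties get the annotation in this sketch, so existing camelCase properties render exactly as before.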

ykdojo · 2025-02-28T19:42:37Z

vscode/webviews/chat/cells/messageCell/human/editor/toolbar/MediaUploadButton.tsx

+                        <Button
+                            variant="ghost"
+                            size="none"
+                            aria-label="Upload images (drag, select, or paste with Cmd+V)"


It would be helpful to mention Ctrl+V for Windows/Linux users (maybe in a subsequent PR).

ykdojo · 2025-02-28T19:42:45Z

vscode/webviews/chat/cells/messageCell/human/editor/toolbar/MediaUploadButton.tsx

+                            <div className="tw-text-center">
+                                <div>Upload images (PNG, JPEG, WEBP, HEIC, HEIF)</div>
+                                <div className="tw-text-sm tw-opacity-75 tw-mt-1">
+                                    Drag & drop, paste Cmd+V, or click to select


It would be helpful to mention Ctrl+V for Windows/Linux users (maybe in a subsequent PR).
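One way to address both comments is to pick the shortcut label from the platform string. `pasteShortcut` and `uploadAriaLabel` are hypothetical helpers, not part of this PR; in the webview the platform might come from `navigator.platform`, which reports values such as "MacIntel" on macOS. Passing it in as a parameter keeps the helpers testable.

```typescript
// Show Cmd+V on macOS and Ctrl+V everywhere else (Windows, Linux).
function pasteShortcut(platform: string): string {
    return /mac/i.test(platform) ? "Cmd+V" : "Ctrl+V";
}

// Build the button's aria-label with the platform-appropriate shortcut.
function uploadAriaLabel(platform: string): string {
    return `Upload images (drag, select, or paste with ${pasteShortcut(platform)})`;
}
```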

abeatrix added 12 commits February 24, 2025 08:32

wip: context item to prompt

1254f46

add content to message

f54fecb

clean up

3e2772f

fix converter

bf84a43

update component

747bc4f

multiple images support

9fb6091

clean up

14a3630

clean up

77cd350

Enable vision for enterprise with hasEarlyAccess flag

8905b37

update tests

fcbf989

abeatrix changed the title ~~feat(dev): Add image upload (vision) support for google provider~~ feat(dev): Add image upload (vision) support for internal testing Feb 28, 2025

abeatrix requested review from olafurpg, ykdojo and a team February 28, 2025 18:57

abeatrix added 2 commits February 28, 2025 11:10

Insert after

fee5390

Add unit test for prompt building with images

6bf5e45

ykdojo reviewed Feb 28, 2025
