feat(dev): Add image upload (vision) support for internal testing #7279

Open: wants to merge 14 commits into main

@abeatrix commented Feb 28, 2025

Building block for #7235

This commit adds support for multimedia content in messages, in particular image handling for BYOK dev models from Google, including support for uploading images in the chat interface. Users can now upload images when they are using a configured model, and the uploaded images are displayed in the chat messages as at-mention chips.


Key changes:

  1. Added a content field to the Message interface with support for different message part types (text, media, tool)
  2. Refactored Google Gemini message handling to properly process the new content structure
  3. Updated chat client and sanitization logic to handle the new content field
  4. Modified the UI components to conditionally show image upload only for BYOK and Vision models
  5. Added Vision tag detection in the model creation function
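To illustrate change 1, here is a minimal sketch of a message that carries an ordered list of typed parts alongside the older plain-text field. The type and field names (`MessagePart`, `speaker`, `content`, `messageParts`, `hasMedia`) are hypothetical and may not match the PR's actual `Message` interface; media data is assumed to be base64-encoded.

```typescript
// Hypothetical part types; the PR's actual names may differ.
type TextPart = { type: 'text'; text: string }
type MediaPart = { type: 'media'; mimeType: string; data: string } // data: base64-encoded bytes
type ToolPart = { type: 'tool'; name: string; input: unknown }
type MessagePart = TextPart | MediaPart | ToolPart

interface Message {
    speaker: 'human' | 'assistant'
    text?: string // older plain-text field
    content?: MessagePart[] // ordered multimodal parts
}

// Return the message's parts, falling back to the plain `text` field
// when no structured `content` is present.
function messageParts(message: Message): MessagePart[] {
    if (message.content && message.content.length > 0) {
        return message.content
    }
    return message.text ? [{ type: 'text', text: message.text }] : []
}

// True if any part of the message is an image or other media.
function hasMedia(message: Message): boolean {
    return messageParts(message).some(part => part.type === 'media')
}

const question: Message = {
    speaker: 'human',
    content: [
        { type: 'text', text: 'tell me what you see' },
        { type: 'media', mimeType: 'image/png', data: 'iVBORw0KGgo=' },
    ],
}
console.log(hasMedia(question)) // true
```

Keeping the legacy `text` field as a fallback lets providers that only understand plain text keep working while media-aware providers read `content`.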

Other changes:

  • Adding a new MediaUploadButton component for handling image uploads.
  • Adding a new ContextItemMedia type to represent media context items.
  • Updating the constructGeminiChatMessages function to handle media content.
  • Updating the renderContextItem function to handle media content.
  • Updating the ContextItemMentionNode to render media context items.
  • Updating the HumanMessageEditor to include the MediaUploadButton and handle media uploads.
  • Adding @google/generative-ai as a dependency.
  • Updating the Message type to include data and mimeType for media.
  • Updating the chat-question telemetry event to include media context.
  • Updating the googleChatClient to use the @google/generative-ai library.
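A rough sketch of how media content could be mapped onto a Gemini request, assuming the `contents` shape used by `@google/generative-ai`: each entry has a `role` of `user` or `model` and a list of `parts`, where a part is either `{ text }` or `{ inlineData: { mimeType, data } }`. The types are declared locally so the sketch runs without the package; `toGeminiContents` is a hypothetical name (not the PR's `constructGeminiChatMessages`), and tool parts are left out.

```typescript
// Local types mirroring the `@google/generative-ai` request shape.
type GeminiPart = { text: string } | { inlineData: { mimeType: string; data: string } }
interface GeminiContent {
    role: 'user' | 'model'
    parts: GeminiPart[]
}

// Hypothetical Cody-side message with text and media parts only.
type MessagePart =
    | { type: 'text'; text: string }
    | { type: 'media'; mimeType: string; data: string } // data: base64
interface Message {
    speaker: 'human' | 'assistant'
    content: MessagePart[]
}

function toGeminiContents(messages: Message[]): GeminiContent[] {
    return messages.map(
        (message): GeminiContent => ({
            // Gemini uses 'model' rather than 'assistant' for replies.
            role: message.speaker === 'human' ? 'user' : 'model',
            parts: message.content.map(
                (part): GeminiPart =>
                    part.type === 'text'
                        ? { text: part.text }
                        : { inlineData: { mimeType: part.mimeType, data: part.data } }
            ),
        })
    )
}
```

Text and image parts stay in their original order within a single `Content` entry, so a prompt such as "tell me what you see" travels together with the pasted screenshot.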

Updated

Enterprise instances with the Early Access feature flag enabled (e.g. sg02) can also enable vision models.

Test plan

Steps

For internal users: connect to the sg02 instance.

For BYOK

  1. Add the following settings to your user `settings.json` file, using your Google AI Studio key.
  2. In the `options` field, add `"vision"` to `categories` for each model that supports image input.
  3. Build from this branch.
  4. The image upload icon should appear only when the selected `cody.dev.models` entry has `"vision"` in its `categories`.
```jsonc
{
    "cody.dev.models": [
        {
            "provider": "google",
            "model": "gemini-2.0-flash-thinking-exp-01-21",
            "apiKey": "$GOOGLE_AI_STUDIO_KEY",
            "inputTokens": 1000000,
            "options": {
                "temperature": 1
            }
        },
        {
            "provider": "google",
            "model": "gemini-2.0-flash",
            "apiKey": "$GOOGLE_AI_STUDIO_KEY",
            "inputTokens": 1000000,
            "options": {
                "temperature": 1,
                "categories": [
                    "vision" // <- Required for the image upload icon to appear and be enabled for this model
                ]
            }
        }
    ]
}
```
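One way the `categories` setting could drive the upload button is sketched below: the model factory turns `"vision"` into a tag, and the UI checks for that tag. `DevModelConfig`, `ModelInfo`, `createModel`, and `showImageUpload` are hypothetical names, and the `provider/model` ID format is an assumption for illustration.

```typescript
// Hypothetical shape of one `cody.dev.models` entry, following the settings above.
interface DevModelConfig {
    provider: string
    model: string
    apiKey?: string
    inputTokens?: number
    options?: { temperature?: number; categories?: string[] }
}

// Hypothetical runtime model record.
interface ModelInfo {
    id: string
    tags: string[]
}

// Copy the 'vision' category into the model's tags at creation time.
function createModel(config: DevModelConfig): ModelInfo {
    const tags: string[] = []
    if (config.options?.categories?.includes('vision')) {
        tags.push('vision')
    }
    return { id: `${config.provider}/${config.model}`, tags }
}

// The chat UI shows the image upload button only for vision-tagged models.
function showImageUpload(model: ModelInfo): boolean {
    return model.tags.includes('vision')
}
```

With the settings above, `gemini-2.0-flash` would show the upload icon and `gemini-2.0-flash-thinking-exp-01-21` would not.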
  1. Take a screenshot of the Cody home page
  2. Copy and paste the screenshot into the editor
  3. Ask "Tell me what you see"
  4. Verify the answer
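Pasting a screenshot yields image bytes that must become a media item with a `mimeType` and `data`. A minimal sketch, assuming the webview reads the clipboard image as a base64 data URL (for example via `FileReader.readAsDataURL`) and accepts only the formats listed in the upload tooltip; `mediaItemFromDataUrl` and `MediaItem` are hypothetical names.

```typescript
// Formats listed in the upload tooltip in this PR.
const SUPPORTED_IMAGE_TYPES: readonly string[] = [
    'image/png',
    'image/jpeg',
    'image/webp',
    'image/heic',
    'image/heif',
]

// Hypothetical media context item shape.
interface MediaItem {
    type: 'media'
    mimeType: string
    data: string // base64 payload without the "data:...;base64," prefix
}

// Split a base64 data URL into mimeType and data; reject unsupported types.
function mediaItemFromDataUrl(dataUrl: string): MediaItem | undefined {
    const match = /^data:([^;,]+);base64,([A-Za-z0-9+\/=]+)$/.exec(dataUrl)
    if (!match) {
        return undefined
    }
    const mimeType = match[1].toLowerCase()
    if (!SUPPORTED_IMAGE_TYPES.includes(mimeType)) {
        return undefined
    }
    return { type: 'media', mimeType, data: match[2] }
}
```

Rejecting unsupported formats at paste time gives the user immediate feedback instead of an error from the model provider later.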

[Screenshot: when sending to sg]

feat(chat): Add image upload support

This commit adds support for uploading images in the chat interface. Users can now upload images, which are then displayed as part of the chat messages.

The changes are the same as those listed under "Other changes" above, plus:

-   Removing the `GeminiChatMessage` interface.

Fixes an issue in the Kotlin code generator where property names containing underscores were not converted to camelCase. For example, `image_url` in TypeScript was converted to `image-url` in Kotlin when it should have been `imageUrl`. The fix detects snake_case identifiers and transforms them to camelCase while preserving the original string for serialization.

The problem was that field names with underscores, such as `image_url`, were not converted to the camelCase format that is standard in Kotlin. The updated code now:

  1. Detects whether a field name contains underscores
  2. Replaces each underscore followed by a character with the uppercase version of that character, using a regex
  3. As a result, `image_url` becomes `imageUrl` instead of keeping the underscore
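The transformation described above can be sketched as follows. `snakeToCamel` and `kotlinProperty` are hypothetical helpers, not the generator's actual functions; the sketch only shows the regex replacement and keeps the original name next to the Kotlin property name for serialization.

```typescript
// Hypothetical helper name; replaces each "_x" with "X".
function snakeToCamel(name: string): string {
    if (!name.includes('_')) {
        return name // already camelCase or a single word
    }
    return name.replace(/_(.)/g, (_match: string, ch: string) => ch.toUpperCase())
}

interface KotlinProperty {
    propertyName: string // identifier used in the generated Kotlin code
    serializedName: string // original TypeScript name, kept for (de)serialization
}

function kotlinProperty(tsName: string): KotlinProperty {
    return { propertyName: snakeToCamel(tsName), serializedName: tsName }
}

console.log(kotlinProperty('image_url')) // { propertyName: 'imageUrl', serializedName: 'image_url' }
```

Names without underscores pass through unchanged, which is why existing bindings should regenerate identically.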

## Test Plan

This fix ensures that when TypeScript property names with underscores are converted to Kotlin, they'll follow Kotlin naming conventions while still maintaining the original string literal value for serialization.

Run `pnpm generate-agent-kotlin-bindings` to confirm that existing behavior is unchanged and that newly added properties with snake_case identifiers are transformed into camelCase while the original string is preserved for serialization.
@abeatrix changed the title from "feat(dev): Add image upload (vision) support for google provider" to "feat(dev): Add image upload (vision) support for internal testing" on Feb 28, 2025
@abeatrix requested review from olafurpg, ykdojo, and a team on February 28, 2025 18:57
```tsx
<Button
    variant="ghost"
    size="none"
    aria-label="Upload images (drag, select, or paste with Cmd+V)"
```

Will be helpful to mention Ctrl+V for Windows/Linux users (maybe in a subsequent PR)

```tsx
<div className="tw-text-center">
    <div>Upload images (PNG, JPEG, WEBP, HEIC, HEIF)</div>
    <div className="tw-text-sm tw-opacity-75 tw-mt-1">
        Drag & drop, paste Cmd+V, or click to select
```

Will be helpful to mention Ctrl+V for Windows/Linux users (maybe in a subsequent PR)
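Following the review suggestion, the paste hint could be chosen per platform. A minimal sketch, assuming the caller passes a platform string such as Node's `process.platform` (`"darwin"`, `"win32"`, `"linux"`) or the browser's `navigator.platform` (e.g. `"MacIntel"`); `pasteShortcutLabel` and `uploadTooltip` are hypothetical names.

```typescript
// Hypothetical helper: macOS uses Cmd, Windows and Linux use Ctrl.
function pasteShortcutLabel(platform: string): string {
    return /mac|darwin/i.test(platform) ? 'Cmd+V' : 'Ctrl+V'
}

// Build the button's aria-label with the platform-appropriate shortcut.
function uploadTooltip(platform: string): string {
    return `Upload images (drag, select, or paste with ${pasteShortcutLabel(platform)})`
}

console.log(uploadTooltip('win32')) // Upload images (drag, select, or paste with Ctrl+V)
```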
