Issue changing default voice in Gemini2 live api #378

Open
notnotrishi opened this issue Dec 20, 2024 · 4 comments
Labels
component:examples Issues/PR referencing examples folder status:triaged Issue/PR triaged to the corresponding sub-team type:bug Something isn't working

Comments

@notnotrishi

Description of the bug:

I'm using the code in live_api_starter.py to test the multimodal live API and build from there. The basic code works. Based on the documentation at https://ai.google.dev/api/multimodal-live#sessions I am trying to change the default voice, but it does not work when I use the format given in the API documentation:

My code snippet:

MODEL = "models/gemini-2.0-flash-exp"

MODE = args.mode

client = genai.Client(http_options={"api_version": "v1alpha"})

CONFIG = {
    "generation_config": {
        "response_modalities": ["AUDIO"],
        "speech_config": {
            "voice_config": {
                "prebuilt_voice_config": {
                    "voice_name": "Charon"
                }
            }
        },
    }
}

Error message:

Traceback (most recent call last):
  File "/Users/rishi/Projects/gemini_live/new.py", line 393, in <module>
    asyncio.run(main.run())
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/Users/rishi/Projects/gemini_live/new.py", line 362, in run
    async with (
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/contextlib.py", line 204, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/google/genai/live.py", line 626, in connect
    self._LiveSetup_to_mldev(model=transformed_model, config=config)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/google/genai/live.py", line 459, in _LiveSetup_to_mldev
    _GenerateContentConfig_to_mldev(
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/google/genai/models.py", line 894, in _GenerateContentConfig_to_mldev
    t.t_speech_config(api_client, getv(from_object, ['speech_config'])),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/google/genai/_transformers.py", line 301, in t_speech_config
    raise ValueError(f'Unsupported speechConfig type: {type(origin)}')
ValueError: Unsupported speechConfig type: <class 'dict'>

Actual vs expected behavior:

No response

Any other information you'd like to share?

No response

@gmKeshari gmKeshari added type:bug Something isn't working status:triaged Issue/PR triaged to the corresponding sub-team component:examples Issues/PR referencing examples folder labels Dec 20, 2024
@ThashilNaidoo

I had the same issue. Try this instead. It worked for me.

"generation_config": { "response_modalities": ["AUDIO"], "speech_config": "Charon" }

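For readers comparing the two forms in this thread, here is a minimal sketch. It assumes the bare voice name is simply shorthand for the nested `prebuilt_voice_config` form from the API documentation; the thread does not confirm that equivalence. `expand_speech_config` is a hypothetical helper written for illustration and is not part of the `google-genai` SDK.

```python
from typing import Any, Dict, Union


def expand_speech_config(speech_config: Union[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Return the nested speech_config form (hypothetical helper, not SDK code).

    Assumption: a bare voice name such as "Charon" is shorthand for the
    nested voice_config / prebuilt_voice_config / voice_name structure.
    """
    if isinstance(speech_config, str):
        return {
            "voice_config": {
                "prebuilt_voice_config": {"voice_name": speech_config}
            }
        }
    if isinstance(speech_config, dict):
        # Already in the nested form; pass it through unchanged.
        return speech_config
    raise TypeError(f"Unsupported speech_config type: {type(speech_config)}")


# The shorthand used as a workaround in this thread.
shorthand_config = {
    "generation_config": {
        "response_modalities": ["AUDIO"],
        "speech_config": "Charon",
    }
}

# The same config with the voice written out in the documented nested form.
nested_config = {
    "generation_config": {
        **shorthand_config["generation_config"],
        "speech_config": expand_speech_config("Charon"),
    }
}
print(nested_config)
```

Keeping the conversion in one place makes it easy to switch between the two forms while testing which one a given SDK version accepts.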
@notnotrishi
Author

> I had the same issue. Try this instead. It worked for me.
>
> "generation_config": { "response_modalities": ["AUDIO"], "speech_config": "Charon" }

That works, thanks @ThashilNaidoo!

I also noticed the report is triaged as a bug, so hopefully the maintainers can fix the issue and/or clarify which usage is correct.

@MarkDaoust
Contributor

Hi, thanks for reporting this. I just looked into it and this is fixed in the latest release (0.4).

Both methods work now.
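To check which side of the fix a local environment is on, a small sketch follows. It assumes that "0.4" refers to the `google-genai` package on PyPI (the traceback above points into `google/genai`); `installed_version` and `has_fix` are hypothetical helper names.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def installed_version(dist: str = "google-genai") -> Optional[str]:
    """Return the installed version string of `dist`, or None if absent."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def has_fix(version_text: str, minimum: Tuple[int, int] = (0, 4)) -> bool:
    """Compare the leading major.minor numbers against `minimum`.

    Assumption: the fix mentioned in this thread shipped in 0.4.
    """
    numbers = tuple(int(n) for n in re.findall(r"\d+", version_text)[:2])
    return numbers >= minimum


found = installed_version()
if found is None:
    print("google-genai is not installed in this environment")
elif has_fix(found):
    print(f"google-genai {found} is at or above 0.4")
else:
    print(f"google-genai {found} predates 0.4; consider upgrading")
```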

@brandonwheat

> Hi, thanks for reporting this. I just looked into it and this is fixed in the latest release (0.4).
>
> Both methods work now.

I tried both, and neither is working for me.

Invalid JSON payload received. Unknown name "prebuilt_voice_config " at 'setup.generati; then sent 1007 (invalid frame payload data) Request trace id: 74133fd7e2c07425, Invalid JSON payload received. Unknown name "prebuilt_voice_config " at 'setup.generati

conn = await es.enter_async_context(
    connect(
        f'wss://{HOST}/ws/google.ai.generativelanguage.v1alpha.GenerativeService.BidiGenerateContent?key={API_KEY}'
    )
)
print('')

initial_request = {
    'setup': {
        'model': MODEL,
        'system_instruction': {
            "parts": [
                {
                    "text": SYSTEM_MESSAGE
                }
            ]
        },
        "tools": {'function_declarations': [pay_bill_tool, get_quote_tool]},
        "generation_config": {
            "response_modalities": ["AUDIO"],
            "speech_config": {
                "voice_config": {
                    "prebuilt_voice_config ": {
                        "voice_name": "Puck"
                    }
                }
            }
        }
    },
}
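One detail stands out in the snippet above: the key `"prebuilt_voice_config "` ends with a space, and the server error quotes the name with that same trailing space. This may be why the payload is rejected, though nothing in the thread confirms it. The sketch below walks a request dictionary and reports keys with leading or trailing whitespace; `find_padded_keys` is a hypothetical helper, and the sample request is trimmed to the relevant branch.

```python
from typing import Any, List


def find_padded_keys(payload: Any, path: str = "") -> List[str]:
    """Return dotted paths of dict keys that have leading/trailing whitespace."""
    hits: List[str] = []
    if isinstance(payload, dict):
        for key, value in payload.items():
            here = f"{path}.{key}" if path else str(key)
            if isinstance(key, str) and key != key.strip():
                hits.append(here)
            # Recurse into the value whether or not this key was flagged.
            hits.extend(find_padded_keys(value, here))
    elif isinstance(payload, list):
        for index, item in enumerate(payload):
            hits.extend(find_padded_keys(item, f"{path}[{index}]"))
    return hits


# Trimmed copy of the request from this comment, keeping the padded key.
sample_request = {
    "setup": {
        "generation_config": {
            "response_modalities": ["AUDIO"],
            "speech_config": {
                "voice_config": {
                    "prebuilt_voice_config ": {"voice_name": "Puck"}
                }
            },
        }
    }
}

for padded in find_padded_keys(sample_request):
    # repr() makes the stray whitespace visible in the output.
    print("suspicious key:", repr(padded))
```

Running a check like this before sending the setup message is a cheap way to catch stray whitespace in key names, which is hard to spot by eye.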

@MarkDaoust MarkDaoust reopened this Jan 10, 2025

7 participants