Live voice-to-voice chat with Google's Gemini 2.0 AI, with Google Search capabilities.
- Python 3.8 or higher
- Discord Bot Token
- Free Gemini API key: https://aistudio.google.com/
- Discord Server with voice channels enabled
- Clone the repository:
git clone https://github.com/2187Nick/discord-voice-to-voice-gemini
cd discord-voice-to-voice-gemini
- Install required packages:
pip install -r requirements.txt
- Create a .env file in the project root with the following variables:
DISCORD_TOKEN=your_discord_bot_token
GEMINI_API_KEY=your_gemini_api_key
- Options:
- Set the voice to use in main.py, line 22.
voice="aoede"
- Set the persona to use in main.py, line 23.
persona="Take on the persona of an overly excited motivational speaker"
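
The configuration steps above can be pictured as one small module. The following is a minimal sketch, not the repository's actual code: it assumes the .env values have already been loaded into the process environment (for example by a package such as python-dotenv; whether this project uses it is not stated here). The function names `load_secrets` and `build_setup_message`, the default model name, and the payload shape are all assumptions, modeled loosely on the setup message of Gemini's live streaming API.

```python
import json
import os

# Assumed names: these mirror the .env keys listed above.
REQUIRED_KEYS = ("DISCORD_TOKEN", "GEMINI_API_KEY")


def load_secrets(env=os.environ):
    """Return the required secrets, raising early if any are missing."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}


def build_setup_message(voice, persona, model="models/gemini-2.0-flash-exp"):
    """Build a hypothetical session setup payload carrying voice and persona.

    The field layout is an assumption for illustration; the real client in
    src/gemini.py may structure its first message differently.
    """
    return {
        "setup": {
            "model": model,  # assumed model identifier
            "generation_config": {
                "response_modalities": ["AUDIO"],
                "speech_config": {
                    "voice_config": {"prebuilt_voice_config": {"voice_name": voice}}
                },
            },
            # The persona is passed as a system instruction.
            "system_instruction": {"parts": [{"text": persona}]},
            # Enables the Google Search tool mentioned in the description.
            "tools": [{"google_search": {}}],
        }
    }


if __name__ == "__main__":
    message = build_setup_message(
        voice="aoede",
        persona="Take on the persona of an overly excited motivational speaker",
    )
    print(json.dumps(message, indent=2))
```

Failing fast on missing keys surfaces a misconfigured .env at startup instead of midway through a voice session.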
- Start the bot:
python main.py
- In Discord, join a voice channel and use the following command:
/chat
- Enable push-to-talk in Discord and hold the key while speaking.
- To interrupt a response, press the key again and start speaking.
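
One way to picture the interrupt behavior described above is as a cancellable playback task. This is a sketch under stated assumptions, not the bot's actual implementation: `PlaybackController` and `fake_playback` are hypothetical names, and the real bot streams audio to a Discord voice channel rather than appending to a list.

```python
import asyncio


class PlaybackController:
    """Tracks the current reply playback so new speech can cancel it."""

    def __init__(self):
        self._task = None

    async def start(self, play_coro):
        # Cancel any reply still playing before starting the next one.
        await self.interrupt()
        self._task = asyncio.create_task(play_coro)

    async def interrupt(self):
        if self._task is not None and not self._task.done():
            self._task.cancel()
            try:
                await self._task
            except asyncio.CancelledError:
                pass  # expected: the playback was cancelled on purpose
        self._task = None


async def fake_playback(chunks, played):
    # Hypothetical stand-in for streaming audio chunks to the voice channel.
    for chunk in chunks:
        played.append(chunk)
        await asyncio.sleep(0.01)


async def demo():
    played = []
    controller = PlaybackController()
    await controller.start(fake_playback(range(100), played))
    await asyncio.sleep(0.03)  # the user presses push-to-talk mid-reply
    await controller.interrupt()
    return played


if __name__ == "__main__":
    print(len(asyncio.run(demo())), "chunks played before the interrupt")
```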
- /chat: Initiates a voice chat session with the bot
- /stop: Stops the current voice chat session
├── main.py          # Bot initialization and command handling
└── src/
    ├── record.py    # Audio processing and speech-to-text conversion
    ├── stream.py    # Custom audio streaming implementation
    └── gemini.py    # Gemini AI WebSocket client integration
- Uses Discord.py for bot functionality
- Implements custom audio streaming for real-time voice processing
- Uses WebSocket connection to Gemini AI for real-time responses
- Handles both synchronous and asynchronous operations for optimal performance
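
Mixing synchronous and asynchronous code usually means passing data from a regular thread into the event loop. The sketch below shows one common standard-library pattern for that, using `loop.call_soon_threadsafe` with an `asyncio.Queue`. It assumes audio frames arrive on a non-async thread; `make_audio_callback`, `collect_frames`, and the producer thread are hypothetical and are not taken from this repository.

```python
import asyncio
import threading


def make_audio_callback(loop, queue):
    """Return a thread-safe callback that forwards audio frames to the queue."""

    def on_audio(frame: bytes) -> None:
        # Called from a non-async thread; hand the frame to the event loop.
        loop.call_soon_threadsafe(queue.put_nowait, frame)

    return on_audio


async def collect_frames(expected):
    loop = asyncio.get_running_loop()
    queue = asyncio.Queue()
    callback = make_audio_callback(loop, queue)

    # Hypothetical producer thread standing in for a synchronous audio receiver.
    def producer():
        for i in range(expected):
            callback(bytes([i]))

    thread = threading.Thread(target=producer)
    thread.start()
    # The async side consumes frames without blocking the event loop.
    frames = [await queue.get() for _ in range(expected)]
    thread.join()
    return frames


if __name__ == "__main__":
    print(asyncio.run(collect_frames(3)))
```

The queue keeps the async consumer from blocking on the audio thread, and `call_soon_threadsafe` is needed because `asyncio.Queue` is not thread-safe on its own.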