The OpenAI flavor integrates directly with OpenAI's Realtime API, enabling speech-to-speech processing of user conversations. The caller's audio is sent straight to the model, and the model's spoken reply is returned in real time, with no intermediate speech-to-text or text-to-speech stage.
The project uses a native WebSocket connection to push the decapsulated RTP payload from the user to the OpenAI engine and to receive the response. It then packs the response back into RTP by adding the RTP header, and streams it to the user.
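The RTP round trip described above can be sketched as follows. This is an illustrative sketch, not the project's code. It assumes the fixed 12-byte RTP header from RFC 3550, 20 ms G.711 frames (160 bytes at 8 kHz), and the event names documented for OpenAI's beta Realtime API (`input_audio_buffer.append` carrying base64 `audio`, `response.audio.delta` carrying base64 `delta`). The helper names `rtp_payload`, `append_event`, `audio_from_delta`, `rtp_packet` and `packetize` are hypothetical.

```python
import base64
import json
import struct
from typing import List, Optional, Tuple

# Fixed part of the RTP header (RFC 3550): V/P/X/CC, M/PT, sequence, timestamp, SSRC.
RTP_HEADER = struct.Struct("!BBHII")


def rtp_payload(packet: bytes) -> bytes:
    """Strip the RTP header (CSRCs, extension, padding) and return the G.711 payload."""
    if len(packet) < RTP_HEADER.size:
        raise ValueError("packet is shorter than the fixed RTP header")
    first = packet[0]
    if first >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    offset = RTP_HEADER.size + 4 * (first & 0x0F)  # skip CSRC identifiers
    if first & 0x10:  # X bit: a header extension follows
        if len(packet) < offset + 4:
            raise ValueError("truncated RTP header extension")
        (ext_words,) = struct.unpack_from("!H", packet, offset + 2)
        offset += 4 + 4 * ext_words
    end = len(packet)
    if first & 0x20:  # P bit: the last byte holds the padding length
        end -= packet[-1]
    if end < offset:
        raise ValueError("malformed RTP packet")
    return packet[offset:end]


def append_event(payload: bytes) -> str:
    """Wrap caller audio in an input_audio_buffer.append client event."""
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(payload).decode("ascii"),
    })


def audio_from_delta(message: str) -> Optional[bytes]:
    """Return the audio carried by a response.audio.delta server event, else None."""
    event = json.loads(message)
    if event.get("type") != "response.audio.delta":
        return None
    return base64.b64decode(event["delta"])


def rtp_packet(payload: bytes, seq: int, timestamp: int, ssrc: int,
               payload_type: int, marker: bool = False) -> bytes:
    """Prepend a minimal RTP header: version 2, no padding, extension or CSRCs."""
    second = (0x80 if marker else 0) | (payload_type & 0x7F)
    header = RTP_HEADER.pack(0x80, second, seq & 0xFFFF,
                             timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload


def packetize(audio: bytes, seq: int, timestamp: int, ssrc: int,
              payload_type: int, frame_bytes: int = 160
              ) -> Tuple[List[bytes], int, int]:
    """Split G.711 audio into 20 ms frames and return packets plus the next seq/timestamp."""
    packets = []
    for start in range(0, len(audio), frame_bytes):
        chunk = audio[start:start + frame_bytes]
        packets.append(rtp_packet(chunk, seq, timestamp, ssrc, payload_type))
        seq = (seq + 1) & 0xFFFF
        timestamp = (timestamp + len(chunk)) & 0xFFFFFFFF  # G.711: one tick per byte
    return packets, seq, timestamp
```

Returning the next sequence number and timestamp from `packetize` lets a caller keep the outgoing stream continuous across several model responses. A real sender would also pace the packets at 20 ms intervals instead of emitting them in a burst.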
It has no transcoding capabilities, so communication is limited to the G.711 PCMU and PCMA codecs.
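Because no transcoding happens, the negotiated G.711 variant has to match the audio format the model reads and writes. The sketch below shows one way to do that. It assumes the static RTP payload types from RFC 3551 (0 for PCMU, 8 for PCMA), the `g711_ulaw`/`g711_alaw` format names and `input_audio_format`/`output_audio_format` session fields of OpenAI's beta Realtime API, and a first-match policy over the caller's offer. The project's actual negotiation logic is not described here, and `pick_g711` and `audio_format_update` are hypothetical names.

```python
from typing import Iterable

# Static RTP payload types for G.711 (RFC 3551) mapped to Realtime API audio format names.
G711_FORMATS = {
    0: "g711_ulaw",  # PCMU
    8: "g711_alaw",  # PCMA
}


def pick_g711(offered: Iterable[int]) -> int:
    """Return the first G.711 payload type in the offer, in the caller's order of preference."""
    for payload_type in offered:
        if payload_type in G711_FORMATS:
            return payload_type
    raise ValueError("no PCMU or PCMA in the offer; transcoding is not available")


def audio_format_update(payload_type: int) -> dict:
    """Build a session.update that uses the same G.711 variant in both directions."""
    fmt = G711_FORMATS[payload_type]
    return {
        "type": "session.update",
        "session": {"input_audio_format": fmt, "output_audio_format": fmt},
    }
```

Using the same format for input and output means the model's audio can be dropped into RTP packets unchanged, which is what makes the transcoding-free design work.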
By default, it uses the `gpt-4o-realtime-preview-2024-10-01` model.
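The model is selected when the WebSocket is opened. As a sketch, the connection parameters could be built as below. The endpoint, the `model` query parameter and the `OpenAI-Beta: realtime=v1` header follow OpenAI's public documentation for the beta Realtime API. The function name `realtime_connection` is hypothetical, and the project may build its connection differently.

```python
from typing import Dict, Tuple
from urllib.parse import urlencode

REALTIME_URL = "wss://api.openai.com/v1/realtime"
DEFAULT_MODEL = "gpt-4o-realtime-preview-2024-10-01"


def realtime_connection(api_key: str, model: str = DEFAULT_MODEL) -> Tuple[str, Dict[str, str]]:
    """Return the WebSocket URL and handshake headers for a Realtime session."""
    url = f"{REALTIME_URL}?{urlencode({'model': model})}"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "OpenAI-Beta": "realtime=v1",  # opt-in header for the beta Realtime API
    }
    return url, headers
```

The URL and headers can then be handed to whichever WebSocket client the deployment uses.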
The following parameters can be tuned for this engine:
| Section | Parameter | Environment | Mandatory | Description | Default |
|---|---|---|---|---|---|
| openai | `key` or `openai_key` | `OPENAI_API_KEY` | yes | OpenAI API key | not provided |
| openai | `model` | `OPENAI_API_MODEL` | no | OpenAI Realtime model used | `gpt-4o-realtime-preview-2024-10-01` |
| openai | `disable` | `OPENAI_DISABLE` | no | Disables the flavor | `false` |
| openai | `voice` | `OPENAI_VOICE` | no | OpenAI voice used for the responses | `alloy` |
| openai | `instructions` | `OPENAI_INSTRUCTIONS` | no | Instructions given to the OpenAI model | default/none |
| openai | `welcome_message` | `OPENAI_WELCOME_MSG` | no | Welcome message played back to the user when the call starts | no message |
| openai | `temperature` | `OPENAI_TEMPERATURE` | no | Sampling temperature for the model, limited to [0.6, 1.2] | `0.8` |
| openai | `max_tokens` | `OPENAI_MAX_TOKENS` | no | Sets OpenAI `max_response_output_tokens`, the maximum number of output tokens for a single assistant response: an integer in [1, 4096], or `inf` | `inf` |
| openai | `turn_detection_type` | `OPENAI_TURN_DETECT_TYPE` | no | OpenAI turn detection `type` | `server_vad` |
| openai | `turn_detection_silence_ms` | `OPENAI_TURN_DETECT_SILENCE_MS` | no | OpenAI turn detection `silence_duration_ms`: silence, in milliseconds, that ends the user's turn | `200` |
| openai | `turn_detection_threshold` | `OPENAI_TURN_DETECT_THRESHOLD` | no | OpenAI turn detection `threshold`: voice activity detection threshold, from 0.0 to 1.0 | `0.5` |
| openai | `turn_detection_prefix_ms` | `OPENAI_TURN_DETECT_PREFIX_MS` | no | OpenAI turn detection `prefix_padding_ms`: audio, in milliseconds, kept before the detected start of speech | `300` |
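The table does not say how these settings reach the Realtime API. A plausible mapping, sketched below, validates the documented ranges and builds a `session.update` event using field names from OpenAI's beta Realtime API. The assumptions are: settings live in an INI-style file read with Python's `configparser`; a value in the `[openai]` section takes precedence over its environment variable (the real precedence is not documented here); and the helpers `lookup` and `build_session_update` are hypothetical. `model`, `disable` and `welcome_message` are left out because they are not session fields.

```python
import configparser
import os
from typing import Mapping, Optional, Sequence


def lookup(cfg: configparser.ConfigParser, keys: Sequence[str], env: str,
           default: Optional[str] = None,
           environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first [openai] option found, else the environment variable, else the default."""
    for key in keys:
        if cfg.has_option("openai", key):
            return cfg.get("openai", key, raw=True)
    return environ.get(env, default)


def build_session_update(cfg: configparser.ConfigParser,
                         environ: Mapping[str, str] = os.environ) -> dict:
    """Validate the [openai] settings and map them onto a session.update event."""
    def get(keys, env, default=None):
        return lookup(cfg, keys, env, default, environ)

    if not get(["key", "openai_key"], "OPENAI_API_KEY"):
        raise ValueError("an OpenAI API key is mandatory")

    temperature = float(get(["temperature"], "OPENAI_TEMPERATURE", "0.8"))
    if not 0.6 <= temperature <= 1.2:
        raise ValueError("temperature must be within [0.6, 1.2]")

    raw_tokens = get(["max_tokens"], "OPENAI_MAX_TOKENS", "inf")
    if raw_tokens == "inf":
        max_tokens = "inf"
    else:
        max_tokens = int(raw_tokens)
        if not 1 <= max_tokens <= 4096:
            raise ValueError("max_tokens must be within [1, 4096] or 'inf'")

    session = {
        "voice": get(["voice"], "OPENAI_VOICE", "alloy"),
        "temperature": temperature,
        "max_response_output_tokens": max_tokens,
        "turn_detection": {
            "type": get(["turn_detection_type"], "OPENAI_TURN_DETECT_TYPE", "server_vad"),
            "silence_duration_ms": int(get(["turn_detection_silence_ms"],
                                           "OPENAI_TURN_DETECT_SILENCE_MS", "200")),
            "threshold": float(get(["turn_detection_threshold"],
                                   "OPENAI_TURN_DETECT_THRESHOLD", "0.5")),
            "prefix_padding_ms": int(get(["turn_detection_prefix_ms"],
                                         "OPENAI_TURN_DETECT_PREFIX_MS", "300")),
        },
    }
    instructions = get(["instructions"], "OPENAI_INSTRUCTIONS")
    if instructions:  # omitted when unset, so the model's own default applies
        session["instructions"] = instructions
    return {"type": "session.update", "session": session}
```

For example, a file containing `[openai]`, `key = sk-test` and `voice = shimmer`, combined with `OPENAI_TEMPERATURE=1.0` in the environment, yields a session with voice `shimmer`, temperature `1.0`, and the documented turn detection defaults. Reading with `raw=True` keeps `%` characters in the instructions from being treated as interpolation syntax.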