Realtime TTS #1187

Open
EtahReign opened this issue Oct 23, 2024 · 5 comments

Comments

@EtahReign

The Koboldcpp app is amazing. The only issue I see is that TTS starts only after the text has finished generating, which takes forever. Is there a way to run TTS while the text is still being output, to reduce the delay?

@LostRuins
Owner

Unfortunately this is not possible at this time, since the TTS can only work on the completed text. Perhaps if you disable streaming it might feel better?

@WesleyFister

> Unfortunately this is not possible at this time, since the TTS can only work on the completed text. Perhaps if you disable streaming it might feel better?

You can break the streamed response into sentences and then run the TTS on each sentence, playing it back to the user. In this case you would only have to wait until the first sentence is created. This is what I do in my speech-to-speech project.

@EtahReign
Author

EtahReign commented Oct 24, 2024

Thank you both for the advice. How do I break the streamed response into sentences?

@WesleyFister

The pseudocode is

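from nltk.tokenize import sent_tokenize  # needs NLTK's Punkt data, installed via nltk.download()

def getSentences(tokens):
    # tokens: an iterable of text chunks streamed from the LLM
    emitted = 0  # number of sentences already yielded
    response = ""
    sentences = []
    for token in tokens:
        response = response + token
        sentences = sent_tokenize(response)
        # The last entry may still be incomplete, so only yield the ones before it
        while len(sentences) - 1 > emitted:
            yield sentences[emitted]
            emitted += 1

    # The stream has ended, so yield whatever is left (the final sentence)
    yield from sentences[emitted:]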

You would ideally run this in a separate thread and queue the sentences. In another thread use TTS to generate audio from each sentence and queue that. Finally, in yet another thread play each audio file.
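The three-thread layout described above could be sketched roughly as follows. This is only an illustration, not the code from the speech-to-speech project: `split_sentences`, `run_pipeline`, `synthesize`, and `play` are hypothetical names, the regex splitter is a cruder dependency-free stand-in for `sent_tokenize` (it will wrongly split on abbreviations such as "e.g."), and a real setup would pass in its own TTS and audio playback functions.

```python
import queue
import re
import threading

_DONE = object()  # sentinel that tells the next stage the stream has ended


def split_sentences(tokens):
    """Yield complete sentences from an iterable of streamed text chunks."""
    buffer = ""
    for token in tokens:
        buffer += token
        # A sentence counts as complete once ., ! or ? is followed by whitespace.
        parts = re.split(r"(?<=[.!?])\s+", buffer)
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        buffer = parts[-1]  # keep the unfinished tail for the next chunk
    if buffer.strip():
        yield buffer.strip()  # final sentence once the stream ends


def run_pipeline(tokens, synthesize, play):
    """Run sentence splitting, TTS, and playback in three threads linked by queues.

    synthesize(sentence) -> audio and play(audio) are supplied by the caller;
    they are placeholders for whatever TTS engine and audio player are in use.
    """
    sentence_q = queue.Queue()
    audio_q = queue.Queue()

    def produce():
        for sentence in split_sentences(tokens):
            sentence_q.put(sentence)
        sentence_q.put(_DONE)

    def tts():
        while (sentence := sentence_q.get()) is not _DONE:
            audio_q.put(synthesize(sentence))
        audio_q.put(_DONE)

    def playback():
        while (audio := audio_q.get()) is not _DONE:
            play(audio)

    threads = [threading.Thread(target=f) for f in (produce, tts, playback)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

With this layout, playback of the first sentence can begin while later sentences are still being generated and synthesized, and the queues keep the audio in order.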

@EtahReign
Author

Thank you
