Enter `<c-u>` to enter transcription mode.
* Prefix algorithm used
* Cursor moved to end of line
* Audio interacted with via callbacks
* Transcriber moved to its own file
Error I get when trying: […] Fixed by installing […]
How accurate are the transcriptions for you guys? Might just be the Whisper model we're using, but the transcriptions aren't very accurate for me.
On another note, it's pretty cool to be talking to my terminal haha
@waydegg I've found them pretty accurate. I have a high-quality mic though, and I'm trying to talk clearly. I think ideally we'd run the medium or large model, but with my AMD graphics card I've had trouble getting the model to actually run on the GPU, so only the tiny model feels responsive enough. Model size could definitely be in the config (though that leads to another problem that the […]).

And yeah, I'm aware of the hoops. There's a different hoop for every OS. I was thinking we could have pyaudio as an optional requirement. GPT tells me we can put this in setup:
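(The snippet itself didn't survive extraction; below is a minimal sketch of the `extras_require` idea — the package metadata is illustrative, not mentat's actual `setup.py`.)

```python
# setup.py sketch: make pyaudio an optional "voice" extra so a plain
# `pip install mentat` never needs portaudio. Metadata is illustrative.
from setuptools import find_packages, setup

setup(
    name="mentat",
    packages=find_packages(),
    extras_require={
        # `pip install "mentat[voice]"` pulls in the audio dependency.
        "voice": ["pyaudio"],
    },
)
```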
And then users can install the `voice` extra. I'm going to look into using sounddevice first though. I think it might be a better library than pyaudio in a few ways, and their documentation suggests you don't need anything special to install it. Can you confirm?

Another question I had for you: pyright is complaining that faster_whisper has no stub files. Are you okay with globally ignoring it in the pyright config?

One final question: can you replace tiny with Large-v2 and report the performance on your M3? I'm curious.
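For reference, here's a minimal callback-based capture with sounddevice (my sketch, not the PR's code; 16 kHz mono float32 is assumed because that's what Whisper-family models expect):

```python
# Minimal sounddevice capture sketch: the callback runs on an audio
# thread and hands each block of samples to the main thread via a queue.
import queue

import numpy as np
import sounddevice as sd

audio_chunks: "queue.Queue[np.ndarray]" = queue.Queue()

def on_audio(indata, frames, time, status) -> None:
    audio_chunks.put(indata.copy())  # copy: sounddevice reuses the buffer

with sd.InputStream(samplerate=16000, channels=1, dtype="float32",
                    callback=on_audio):
    sd.sleep(3000)  # record for ~3 seconds

recording = np.concatenate(list(audio_chunks.queue))  # shape: (samples, 1)
```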
Yeah it might just be my mic (using the built-in mic). If I raise my voice the transcriptions get better lol.
Yup!
Nice.
I can install it without any extras (uninstalled […]).
Yup, will try now.
So I can run the Large-V3 model, but when we get the results in mentat I'm only getting the first few words of whatever it is I'm saying.
Huh, that's weird. Maybe it's the model being slow? If you wait a bit, do you get more of the transcript?
If I wait for the model to process each word and then I say the next one, it works haha
Word-level algorithm for doubling back.
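My reading of "doubling back" (a sketch, not the PR's actual code): each pass over the growing audio buffer can revise earlier words, so the client needs to know how far to back up before appending the new tail.

```python
# Sketch of a word-level merge for live transcription (my interpretation
# of the commit title, not the PR's code): keep the shared word prefix,
# erase the rest, and append the newer transcript's tail.
def diff_transcripts(previous: list[str], latest: list[str]) -> tuple[int, list[str]]:
    """Return (words to erase from the end of `previous`, words to append)."""
    common = 0
    while (common < min(len(previous), len(latest))
           and previous[common] == latest[common]):
        common += 1
    return len(previous) - common, latest[common:]

# Example: a second pass corrects "their" to "there" and adds a word.
erase, append = diff_transcripts(["hello", "their"],
                                 ["hello", "there", "terminal"])
assert (erase, append) == (1, ["there", "terminal"])
```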
Alright, in my opinion this is ready for review now. The following things would be nice to have, but I don't think they're necessary for merge or the most valuable thing to work on. I'll make an issue for what we merge without.
Force-pushed from b822810 to 541e719
I'm getting pretty poor transcriptions on the default […]. Feels like this will be a really nice feature to have! Could we display the audio device being used?
I'll save the recordings to logs. That should make it easier to debug.
After discussion yesterday I decided to make two major changes: […]
mentat/session_input.py (outdated):

```diff
@@ -12,8 +12,10 @@
 async def _get_input_request(**kwargs: Any) -> StreamMessage:
     session_context = SESSION_CONTEXT.get()
     stream = session_context.stream
+    default_prompt = session_context.conversation.default_prompt
+    session_context.conversation.default_prompt = ""
```
This is sort of a hacky way to pass the information to the front end. I like it passing as the data field, but I don't like temporarily storing it in conversation and accessing it here. One alternative would be to have commands return an `Optional[str]` and, if they do, pass it into `_get_input_request` here. Btw, reading this code I wasn't entirely sure why commands were intercepted there instead of one level up here. It's sort of the same thing either way though.
I actually do somewhat separate how commands are handled in my agent PR; I made a separate function specifically for intercepting commands.
I was actually thinking something completely different for passing up the input from whisper: we could send the input on a separate channel (so we wouldn't have to touch input_request at all), and the client could have another task listening on that channel that just adds whatever comes in to prompt_toolkit's buffer. What do you think about that idea? I think it would be cleaner than this and more adaptable for other use cases.
A different channel and the client storing it makes sense to me. We can't simply add it to the buffer though, because the buffer doesn't exist while the command is running; it's made when the input request signal is sent.
I made a new stream `default_prompt`. I can imagine using it for other things, though all the ideas I've come up with so far seem sort of contrived. (Maybe if the user runs `/commit` with no argument, GPT could write the commit message, and then the user could see `/commit $WHAT_GPT_WROTE` and have a chance to edit before actually committing?)
```diff
@@ -57,6 +57,11 @@ async def _cprint_session_stream(self):
         async for message in self.session.stream.listen():
             print_stream_message(message)

+    async def _default_prompt_stream(self):
```
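For context, a hypothetical sketch of what such a listener could do (the channel name, `message.data`, and `self._prompt_session` are my assumptions, not the PR's code):

```python
# Hypothetical sketch: a client task that listens on the default_prompt
# channel and pre-fills prompt_toolkit's buffer with whatever arrives.
async def _default_prompt_stream(self):
    async for message in self.session.stream.listen("default_prompt"):
        buffer = self._prompt_session.app.current_buffer
        buffer.insert_text(message.data)
```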
I like this a lot more; thanks!
Just tested it out for the first time and holy cow, it's really good and cool! I know this has already taken a while to merge in, but I had one more idea: would it be difficult to stream openai's output? I think it could be pretty quick to add and would really level up the experience for me.
All looks great to me! After looking into it, it turns out streaming isn't actually supported by whisper.
Enter `<c-u>` to enter transcription mode. Feels kind of cool actually. It's sort of far from being mergeable though:

* Requires `brew install portaudio` before running `pip install mentat`. One option could be making it an optional dependency using `extras_require` and try/catching the pyaudio import.
* [ ] Cost tracking for whisper (no longer using the API)
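A minimal sketch of the try/catch idea (the guard and helper names are illustrative):

```python
# Sketch: degrade gracefully when the optional audio dependency is absent.
try:
    import pyaudio  # present only with the optional voice install
except ImportError:
    pyaudio = None

def audio_available() -> bool:
    """True if voice features can be enabled in this environment."""
    return pyaudio is not None
```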