[feature request] Support for language and/or task flags in Whisper models #1305
Comments
Hi, are you using Whisper via the API or via lite? |
I'm actually using lite to load the Whisper model, after which I use it as a server and make calls to its API. To clarify my feature request: I'm suggesting that it would be helpful if we could specify these language and task parameters when initially loading the server (either via command line or webui), so these settings would then apply to all subsequent API calls. While having the ability to specify these parameters in individual API calls would be ideal, I understand that would require much more development work. |
Actually, right now it auto-detects the language, but for that you need to use a multilingual model such as https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small-q5_1.bin?download=true In experimental, I will add the ability to change the language. You will also be able to set it via the API: specify a language code (en, zh, de, fr, ja, etc.) and it will use it. |
Thank you for clarifying! Yes, I've been using the multilingual models (large, small, base) with auto-detection, but it hasn't been working reliably for my use case. This is exactly what I need, and being able to set it via the API is even better than I hoped for. Thanks so much! 🙏 Looking forward to it! |
It's added in the latest release v1.82! |
Thank you for implementing this feature! I've noticed that while the language setting works correctly when set through the GUI for new chat inputs, it doesn't seem to be respected when making API calls at http://localhost:5001/v1/audio/transcriptions . I've tried both setting it through the GUI and following the OpenAI API specification for the language parameter (i.e. this), but neither approach seems to work. Could you clarify if this is intentional behavior or if I might be missing something in my implementation? |
Whoops, the field name I used was |
Hehe, no problem at all, I'll wait for the next release. Just FYI - I'm not sure if that should work just now but I tried using "langcode" as API call parameter and got the same behavior as before (the language setting was still not being applied). |
Did you pass in the 2 letter language code? What code did you use? |
I've tested it with 'en' as the language code, but it still detects my accent and translates my English speech into my native language. Here's how I'm making the request in my Python program:
And here's the actual request/response data from my debug logs:
|
Try this: there is a field called "prompt". Try adding a simple English prompt. E.g. {"prompt":"This is a simple English transcript, which starts at the next sentence."} |
No luck, still the same problem.
|
Ah, I see the problem. You're sending the data as form-data rather than as a JSON payload. That is currently not supported; I'll try to add it in the future. For now, try using the JSON payload instead, which takes base64 data. https://lite.koboldai.net/koboldcpp_api#/api%2Fextra/post_api_extra_transcribe |
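For readers following along, here is a minimal sketch of what a JSON request with base64 audio might look like from Python. The endpoint path is inferred from the API docs link above and the port from the URL mentioned earlier in the thread; `langcode` and `prompt` are the field names discussed in this thread, while `audio_data` is an assumed field name that should be checked against the linked API docs. The helper names are hypothetical.

```python
import base64
import json
import urllib.request

# Assumed endpoint: path inferred from the API docs link, port from earlier in the thread.
TRANSCRIBE_URL = "http://localhost:5001/api/extra/transcribe"


def build_transcribe_payload(audio_bytes: bytes, langcode: str = "en", prompt: str = "") -> dict:
    """Build a JSON-serialisable request body with the audio encoded as base64 text."""
    return {
        # "audio_data" is an assumed field name; verify it against the API docs.
        "audio_data": base64.b64encode(audio_bytes).decode("ascii"),
        "langcode": langcode,  # two-letter code such as "en", "de", "ja"
        "prompt": prompt,      # optional hint text, as suggested earlier in the thread
    }


def transcribe(audio_bytes: bytes, langcode: str = "en", prompt: str = "",
               url: str = TRANSCRIBE_URL) -> dict:
    """POST the JSON payload and return the decoded JSON response."""
    body = json.dumps(build_transcribe_payload(audio_bytes, langcode, prompt)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running locally, something like `transcribe(open("clip.wav", "rb").read(), langcode="en")` would send the request; the shape of the response is not shown here because it is not described in this thread.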
Hi, this should be fixed now in 1.82.1 |
Thanks so much, that was exactly the problem. I can now confirm it works as intended with both form-data and JSON payloads. Thanks again for implementing this feature and for the quick responses! |
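For anyone wanting to reproduce the form-data variant, here is a rough sketch using only the Python standard library. It assumes the OpenAI-style field names `file`, `model` and `language` at the `/v1/audio/transcriptions` URL mentioned earlier in the thread; the `model` value is a placeholder and may well be ignored by the server. The helper names are hypothetical.

```python
import json
import urllib.request
import uuid

# URL as given earlier in the thread.
OPENAI_STYLE_URL = "http://localhost:5001/v1/audio/transcriptions"


def build_multipart(fields: dict, file_name: str, file_bytes: bytes, file_field: str = "file"):
    """Encode text fields plus one file as multipart/form-data.

    Returns a (body, content_type) tuple.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        text_part = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
        parts.append(text_part.encode("utf-8"))
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; filename="{file_name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(file_header.encode("utf-8") + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe_form(file_name: str, file_bytes: bytes, language: str = "en",
                    url: str = OPENAI_STYLE_URL) -> dict:
    """POST the audio as form-data with an explicit language code."""
    # "whisper-1" is a placeholder model name (assumption).
    fields = {"model": "whisper-1", "language": language}
    body, content_type = build_multipart(fields, file_name, file_bytes)
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Libraries such as `requests` can build the multipart body for you; the manual encoder is here only to keep the sketch dependency-free.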
I am a non-native English speaker, and I've encountered a significant limitation when using multilingual Whisper models (large v1/2/3). Even when I speak English, my accent causes the model to default to my mother tongue in the output. With large-v3 in particular, the language detection seems quite unpredictable, and the model switches between transcription and translation seemingly at random.
Would it be possible to implement support for the whisper.cpp flags --language and/or --task? This would allow users to explicitly specify the input language and/or type of task (transcription or translation). These flags are already available in the original Whisper implementation, and adding them would greatly improve the usability for non-native speakers like myself.