
[feature request] Support for language and/or task flags in Whisper models #1305

Open
deepseven opened this issue Jan 9, 2025 · 15 comments
Labels
enhancement New feature or request

Comments

@deepseven

I am a non-native English speaker, and I've encountered a significant limitation when using multilingual Whisper models (large v1/2/3). Even when I speak English, my accent causes the model to default to my mother tongue in the output. With large-v3 in particular, the language detection seems quite unpredictable, and the model switches between transcription and translation seemingly at random.

Would it be possible to implement support for the whisper.cpp flags --language and/or --task? This would allow users to explicitly specify the input language and/or type of task (transcription or translation). These flags are already available in the original Whisper implementation, and adding them would greatly improve the usability for non-native speakers like myself.

@LostRuins
Owner

Hi, are you using Whisper via the API or via lite?

@LostRuins LostRuins added the enhancement New feature or request label Jan 9, 2025
@deepseven
Author

I'm actually using lite to load the Whisper model, after which I use it as a server and make calls to its API.

To clarify my feature request: I'm suggesting that it would be helpful if we could specify these language and task parameters when initially loading the server (either via command line or webui), so these settings would then apply to all subsequent API calls.

While having the ability to specify these parameters in individual API calls would be ideal, I understand that would require much more development work.

@LostRuins
Owner

Actually, right now it auto-detects the language, but you need to use a multilingual model such as https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small-q5_1.bin?download=true

In the experimental branch, I will add the ability to change the language. You can also set it via the API.

[Screenshot: the new language setting field in the settings UI]

You'll be able to specify a language code (en, zh, de, fr, ja, etc.) and it will use it.

@deepseven
Author

Thank you for clarifying! Yes, I've been using the multilingual models (large, small, base) with auto-detection, but it hasn't been working reliably for my use case.

This is exactly what I need, and being able to set it via the API is even better than I hoped for. Thanks so much! 🙏 Looking forward to it!

@LostRuins
Owner

It's added in the latest release v1.82!

@deepseven
Author

Thank you for implementing this feature! I've noticed that while the language setting works correctly when set through the GUI for new chat inputs, it doesn't seem to be respected when making API calls to http://localhost:5001/v1/audio/transcriptions.

I've tried both setting it through the GUI and passing the language parameter as described in the OpenAI API specification, but neither approach seems to work.

Could you clarify if this is intentional behavior or if I might be missing something in my implementation?

@LostRuins
Owner

Whoops, the field name I used was langcode instead of language. I'll alias them so both work.

@deepseven
Author

Hehe, no problem at all, I'll wait for the next release. Just FYI: I'm not sure whether it should work yet, but I tried using "langcode" as the API call parameter and got the same behavior as before (the language setting was still not applied).

@LostRuins
Owner

Did you pass in the 2 letter language code? What code did you use?

@deepseven
Author

I've tested it with 'en' as the language code, but it still detects my accent and translates my English speech into my native language. Here's how I'm making the request in my Python program:

        import requests

        response = requests.post(
            TRANSCRIPTION_URL,
            headers=headers,
            files=files,
            data=data,
        )
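For completeness, the variables referenced above could be assembled roughly like this (the API key and field values are taken from the debug log below; `build_form_data` is a hypothetical helper, and the audio file path is a placeholder):

```python
# Hypothetical helper mirroring the multipart form fields shown in the
# debug log below; "langcode" is the field under discussion.
def build_form_data(langcode: str = "en") -> dict:
    return {
        "model": "whisper-1",
        "response_format": "json",
        "temperature": 0.0,
        "langcode": langcode,
    }

data = build_form_data("en")
headers = {"Authorization": "Bearer your_api_key_here"}
# files = {"file": open("speech.wav", "rb")}  # audio file to transcribe
```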

And here's the actual request/response data from my debug logs:

[DEBUG] Making POST request to URL: http://localhost:5001/v1/audio/transcriptions
[DEBUG] Headers: {'Authorization': 'Bearer your_api_key_here'}
[DEBUG] Files: dict_keys(['file'])
[DEBUG] Form data: {'model': 'whisper-1', 'response_format': 'json', 'temperature': 0.0, 'langcode': 'en'}
[DEBUG] Response status code: 200

@LostRuins
Owner

Try this: there is a field called "prompt". Try adding a simple English prompt. E.g.

{"prompt":"This is a simple English transcript, which starts at the next sentence."}

@deepseven
Author

No luck, still the same problem.

[DEBUG] Making POST request to URL: http://localhost:5001/v1/audio/transcriptions
[DEBUG] Headers: {'Authorization': 'Bearer your_api_key_here'}
[DEBUG] Files: dict_keys(['file'])
[DEBUG] Form data: {'model': 'whisper-1', 'response_format': 'json', 'temperature': 0.0, 'prompt': 'This is a simple English transcript, which starts at the next sentence.', 'langcode': 'en'}
[DEBUG] Response status code: 200

@LostRuins
Owner

Ah, I see the problem. You're sending the data as form-data rather than as a JSON payload. Currently that is not supported; I'll try to add it in the future. For now, you should use the JSON payload instead, which takes base64 data.

https://lite.koboldai.net/koboldcpp_api#/api%2Fextra/post_api_extra_transcribe
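A rough sketch of that JSON approach in Python, using only the standard library. The field names (`audio_data`, `prompt`, `langcode`) are assumptions based on this thread; check the linked API docs for the exact schema:

```python
import base64
import json

def build_transcribe_payload(wav_bytes: bytes, langcode: str = "en") -> str:
    # JSON body with base64-encoded audio instead of multipart form-data.
    # Field names here are assumptions -- verify against the
    # /api/extra/transcribe documentation linked above.
    return json.dumps({
        "audio_data": base64.b64encode(wav_bytes).decode("ascii"),
        "prompt": "",
        "langcode": langcode,
    })

# Then POST the resulting string to http://localhost:5001/api/extra/transcribe
# with the header Content-Type: application/json.
```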

@LostRuins
Owner

Hi, this should be fixed now in 1.82.1

@deepseven
Author

Thanks so much, that was exactly the problem. I can now confirm it works as intended with both form-data and the JSON payload with base64 encoding.

Thanks so much again for implementing this feature and for the quick responses!
