
[feature request] Support for language and/or task flags in Whisper models #1305

Open
deepseven opened this issue Jan 9, 2025 · 15 comments
Labels
enhancement New feature or request

Comments

@deepseven

I am a non-native English speaker, and I've encountered a significant limitation when using multilingual Whisper models (large v1/2/3). Even when I speak English, my accent causes the model to default to my mother tongue in the output. With large-v3 in particular, the language detection seems quite unpredictable, and the model switches between transcription and translation seemingly at random.

Would it be possible to implement support for the whisper.cpp flags --language and/or --task? This would allow users to explicitly specify the input language and/or type of task (transcription or translation). These flags are already available in the original Whisper implementation, and adding them would greatly improve the usability for non-native speakers like myself.

@LostRuins
Owner

Hi, are you using Whisper via the API or via lite?

@LostRuins LostRuins added the enhancement New feature or request label Jan 9, 2025
@deepseven
Author

I'm actually using lite to load the Whisper model, after which I use it as a server and make calls to its API.

To clarify my feature request: I'm suggesting that it would be helpful if we could specify these language and task parameters when initially loading the server (either via command line or webui), so these settings would then apply to all subsequent API calls.

While having the ability to specify these parameters in individual API calls would be ideal, I understand that would require much more development work.

@LostRuins
Owner

Actually, right now it auto-detects the language, but you need to use a multilingual model such as https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small-q5_1.bin?download=true

In the experimental branch, I will add the ability to change the language. You can also set it via the API.

[Screenshot: the new language setting field in the settings UI]

You'll be able to specify a language code (en, zh, de, fr, ja, etc.) and it will use it.

@deepseven
Author

Thank you for clarifying! Yes, I've been using the multilingual models (large, small, base) with auto-detection, but it hasn't been working reliably for my use case.

This is exactly what I need, and being able to set it via the API is even better than I hoped for. Thanks so much! 🙏 Looking forward to it!

@LostRuins
Owner

It's added in the latest release v1.82!

@deepseven
Author

Thank you for implementing this feature! I've noticed that while the language setting works correctly when set through the GUI for new chat inputs, it doesn't seem to be respected when making API calls to http://localhost:5001/v1/audio/transcriptions.

I've tried both setting it through the GUI and passing the language parameter as described in the OpenAI API specification, but neither approach seems to work.

Could you clarify if this is intentional behavior or if I might be missing something in my implementation?

@LostRuins
Owner

Whoops, the field name I used was langcode instead of language. I'll alias them so both work.

@deepseven
Author

Hehe, no problem at all, I'll wait for the next release. Just FYI: I'm not sure whether it should work yet, but I tried using "langcode" as the API call parameter and got the same behavior as before (the language setting was still not applied).

@LostRuins
Owner

Did you pass in the 2 letter language code? What code did you use?

@deepseven
Author

I've tested it with 'en' as the language code, but it still detects my accent and translates my English speech into my native language. Here's how I'm making the request in my Python program:

        import requests

        response = requests.post(
            TRANSCRIPTION_URL,
            headers=headers,
            files=files,
            data=data,
        )
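For completeness, the variables referenced above could be assembled roughly like this (the API key and field values are taken from the debug log below; `build_form_data` is a hypothetical helper, and the audio file path is a placeholder):

```python
# Hypothetical helper mirroring the multipart form fields shown in the
# debug log below; "langcode" is the field under discussion.
def build_form_data(langcode: str = "en") -> dict:
    return {
        "model": "whisper-1",
        "response_format": "json",
        "temperature": 0.0,
        "langcode": langcode,
    }

data = build_form_data("en")
headers = {"Authorization": "Bearer your_api_key_here"}
# files = {"file": open("speech.wav", "rb")}  # audio file to transcribe
```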

And here's the actual request/response data from my debug logs:

[DEBUG] Making POST request to URL: http://localhost:5001/v1/audio/transcriptions
[DEBUG] Headers: {'Authorization': 'Bearer your_api_key_here'}
[DEBUG] Files: dict_keys(['file'])
[DEBUG] Form data: {'model': 'whisper-1', 'response_format': 'json', 'temperature': 0.0, 'langcode': 'en'}
[DEBUG] Response status code: 200

@LostRuins
Owner

Try this: there is a field called "prompt". Try adding a simple English prompt. E.g.

{"prompt":"This is a simple English transcript, which starts at the next sentence."}

@deepseven
Author

No luck, still the same problem.

[DEBUG] Making POST request to URL: http://localhost:5001/v1/audio/transcriptions
[DEBUG] Headers: {'Authorization': 'Bearer your_api_key_here'}
[DEBUG] Files: dict_keys(['file'])
[DEBUG] Form data: {'model': 'whisper-1', 'response_format': 'json', 'temperature': 0.0, 'prompt': 'This is a simple English transcript, which starts at the next sentence.', 'langcode': 'en'}
[DEBUG] Response status code: 200

@LostRuins
Owner

Ah, I see the problem. You're sending the data as form-data rather than as a JSON payload. Currently that is not supported; I'll try to add it in the future. For now, you should use the JSON payload instead, which takes base64 data.

https://lite.koboldai.net/koboldcpp_api#/api%2Fextra/post_api_extra_transcribe
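A rough sketch of that JSON approach in Python, using only the standard library. The field names (`audio_data`, `prompt`, `langcode`) are assumptions based on this thread; check the linked API docs for the exact schema:

```python
import base64
import json

def build_transcribe_payload(wav_bytes: bytes, langcode: str = "en") -> str:
    # JSON body with base64-encoded audio instead of multipart form-data.
    # Field names here are assumptions -- verify against the
    # /api/extra/transcribe documentation linked above.
    return json.dumps({
        "audio_data": base64.b64encode(wav_bytes).decode("ascii"),
        "prompt": "",
        "langcode": langcode,
    })

# Then POST the resulting string to http://localhost:5001/api/extra/transcribe
# with the header Content-Type: application/json.
```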

@LostRuins
Owner

Hi, this should be fixed now in 1.82.1

@deepseven
Author

Thanks so much, that was exactly the problem. I can now confirm it works as intended with both form-data and the JSON payload with base64 encoding.

Thanks so much again for implementing this feature and for the quick responses!
