Using a fine-tuned Whisper model with whisper-timestamped #69
-
Hello! In the Colab, I have this code:

```
!pip3 install git+https://github.com/linto-ai/whisper-timestamped
!pip3 install matplotlib
!pip install transformers
!pip install torch
```

```python
import json
import whisper_timestamped as whisper
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

processor = AutoProcessor.from_pretrained("NbAiLab/whisper-large-v2-nob")
model = AutoModelForSpeechSeq2Seq.from_pretrained("NbAiLab/whisper-large-v2-nob")

audio = whisper.load_audio("audio.mp3")
model = whisper.load_model(UNSURE_WHAT_TO_PUT_HERE, device="cuda")
result = whisper.transcribe(model, audio)
print(json.dumps(result, indent=2, ensure_ascii=False))
```

I am not sure what to put in the `whisper.load_model(...)` call, as highlighted by the capitalized placeholder. Ideas?
-
Oh, that's a really relevant suggestion: adding support for models fine-tuned with HuggingFace's transformers or SpeechBrain. I've just pushed something, adding things in `whisper_timestamped.load_model`. Now you should be able to just do:
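Presumably this means the question's placeholder can be replaced by the HuggingFace repo name itself (a sketch based on the description above, reusing the "NbAiLab/whisper-large-v2-nob" repo from the question):

```python
import json
import whisper_timestamped as whisper

# Pass the HuggingFace repo name (or a local folder path) directly;
# the architecture is inferred from the "large-v2" in the name.
model = whisper.load_model("NbAiLab/whisper-large-v2-nob", device="cuda")

audio = whisper.load_audio("audio.mp3")
result = whisper.transcribe(model, audio)
print(json.dumps(result, indent=2, ensure_ascii=False))
```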
Note: for now, `load_model` will only work with a HuggingFace repo or local folder if its name includes the model size ("tiny", "small", ..., "large-v2").
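That naming requirement suggests a size-detection heuristic along these lines (an illustrative sketch only, not the actual whisper-timestamped implementation):

```python
# Known Whisper sizes, checked longest-first so that "large-v2" matches
# before the plain "large" substring would.
WHISPER_SIZES = ["large-v2", "large-v1", "large", "medium", "small", "base", "tiny"]

def infer_whisper_size(name: str) -> str:
    """Guess the Whisper architecture size from a repo or folder name."""
    lower = name.lower()
    for size in WHISPER_SIZES:
        if size in lower:
            return size
    raise ValueError(f"Cannot infer Whisper size from name: {name!r}")

print(infer_whisper_size("NbAiLab/whisper-large-v2-nob"))  # large-v2
print(infer_whisper_size("my-finetuned-whisper-tiny"))     # tiny
```

A repo whose name omits the size (e.g. one called just "whisper-norwegian") would raise here, which matches the restriction stated in the note.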