Replies: 1 comment
-
Have you figured out how to do it?
-
I have been using the Whisper-large-v2 model from HuggingFace for local testing, and found that setting the `return_timestamps=True` parameter in the ASR pipeline returns timestamped transcriptions (see the code snippet below, taken from the HuggingFace model page).
I would like access to these segment-level timestamps for an application I am working on, but this parameter does not seem to be exposed in the Whisper Triton deployment here. Can anyone guide me on how to set this parameter or access this output?
```python
import torch
from transformers import pipeline
from datasets import load_dataset

device = "cuda:0" if torch.cuda.is_available() else "cpu"

pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v2",
    chunk_length_s=30,
    device=device,
)

ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
sample = ds[0]["audio"]

# Plain transcription
prediction = pipe(sample.copy(), batch_size=8)["text"]
# " Mr. Quilter is the apostle of the middle classes, and we are glad to welcome his gospel."

# Transcription with segment-level timestamps
prediction = pipe(sample.copy(), batch_size=8, return_timestamps=True)["chunks"]
# [{'text': ' Mr. Quilter is the apostle of the middle classes and we are glad to welcome his gospel.',
#   'timestamp': (0.0, 5.44)}]
```
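For context, this is how I consume those segment-level timestamps locally, and what I would like to reproduce against the Triton deployment. It is a minimal sketch assuming the `chunks` output shape shown above; the variable names are just illustrative:

```python
# Minimal sketch: read segment-level start/end times from the HF pipeline output.
# Assumes each chunk has the {'text': ..., 'timestamp': (start, end)} shape shown above.
chunks = pipe(sample.copy(), batch_size=8, return_timestamps=True)["chunks"]

for chunk in chunks:
    # Segment boundaries in seconds; the end of a trailing chunk may be None.
    start, end = chunk["timestamp"]
    print(start, end, chunk["text"].strip())
```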