
---
title: odtp-pyannote-whisper
sdk: docker
pinned: false
---

odtp-pyannote-whisper

Compatible with ODTP v0.5.x Open in Spaces

Note

This repository uses Git submodules, so include them when cloning:

git clone --recurse-submodules https://github.com/sdsc-ordes/odtp-pyannote-whisper

This pipeline processes a .wav or .mp4 media file by detecting the speakers present in the recording with pyannote.audio. For each detected speaker segment, it uses OpenAI's Whisper model to transcribe or translate the speech individually. This yields accurate, speaker-attributed transcriptions or translations that make clear who said what throughout the audio.
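For orientation, here is a minimal Python sketch of that diarize-then-transcribe pattern. It is not the component's actual code: the pyannote/speaker-diarization-3.1 checkpoint, the file name, and the per-turn audio slicing are illustrative assumptions.

# Minimal sketch of the diarize-then-transcribe pattern; not the component's actual code.
import whisper
from pyannote.audio import Pipeline

HF_TOKEN = "hf_xxxxxxxxxxx"            # your Hugging Face token
AUDIO_PATH = "recording.wav"           # hypothetical input file

# 1. Detect speaker turns with pyannote.audio (checkpoint name is an assumption).
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1", use_auth_token=HF_TOKEN
)
diarization = pipeline(AUDIO_PATH)

# 2. Transcribe each detected speaker turn separately with Whisper.
model = whisper.load_model("large-v3")
audio = whisper.load_audio(AUDIO_PATH)  # 16 kHz mono float32 array
SAMPLE_RATE = 16000

for turn, _, speaker in diarization.itertracks(yield_label=True):
    clip = audio[int(turn.start * SAMPLE_RATE):int(turn.end * SAMPLE_RATE)]
    result = model.transcribe(clip, task="transcribe")
    print(f"[{turn.start:.1f}s-{turn.end:.1f}s] {speaker}: {result['text'].strip()}")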

Note: This application utilizes pyannote.audio and OpenAI's Whisper model. You must accept the terms of use on Hugging Face for the pyannote/segmentation and pyannote/speaker-diarization models before using this application.

After accepting the terms and conditions for those models, obtain your Hugging Face API token so the component can access them.

Provide this token to the component via the HF_TOKEN environment variable or through the corresponding text field in the web app interface.

Table of Contents

  • Tools Information
  • How to add this component to your ODTP instance
  • Data sheet
  • Tutorial
  • Credits and references

Tools Information

| Tool | Version | Commit Hash | Documentation |
|------|---------|-------------|---------------|
| OpenAI Whisper | Latest | Commit History | Whisper Documentation |
| pyannote.audio | Latest | Commit History | pyannote.audio Documentation |

How to add this component to your ODTP instance

This component can be run directly with Docker; however, it is designed to be run with ODTP. To add it to your ODTP CLI, run the following command. If you want to use the component directly, see the Docker section below.

odtp new odtp-component-entry \
--name odtp-pyannote-whisper \
--component-version v0.1.1 \
--repository https://github.com/sdsc-ordes/odtp-pyannote-whisper 

Data sheet

Parameters

| Parameter | Description | Type | Required | Default Value | Possible Values | Constraints |
|-----------|-------------|------|----------|---------------|-----------------|-------------|
| MODEL | Whisper model to use for transcription or translation | String | Yes | large-v3 | tiny, base, small, medium, large, large-v2, large-v3 | Must be a valid Whisper model name |
| TASK | Task to perform on the audio input | String | Yes | transcribe | transcribe, translate | Must be transcribe or translate |
| LANGUAGE | Source language code for the audio input | String | No | auto | auto, en, es, fr, de, it, pt, nl, ja, zh, ru | Must be a supported language code |
| INPUT_FILE | Path to the input .wav audio file | String | Yes | N/A | Any valid file path to a .wav file | File must exist and be accessible |
| OUTPUT_FILE | Base name for the output files (without extension) | String | Yes | output | Any valid file name | Should not contain invalid characters |
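These parameters reach the component as environment variables (see the Docker tutorial below). A minimal sketch of reading them with the documented defaults, assuming plain os.environ access rather than the component's actual configuration code:

import os

# Read the documented parameters from the environment, with the defaults listed above.
model = os.environ.get("MODEL", "large-v3")
task = os.environ.get("TASK", "transcribe")          # "transcribe" or "translate"
language = os.environ.get("LANGUAGE", "auto")
input_file = os.environ["INPUT_FILE"]                # required, no default
output_file = os.environ.get("OUTPUT_FILE", "output")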

Secrets

| Secret Name | Description | Type | Required | Default Value | Constraints | Notes |
|-------------|-------------|------|----------|---------------|-------------|-------|
| HF_TOKEN | Hugging Face API token for model access | String | Yes | None | Valid API Token | Obtain from your Hugging Face account settings |

Input Files

| File/Folder | Description | File Type | Required | Format | Notes |
|-------------|-------------|-----------|----------|--------|-------|
| INPUT_FILE | Input audio file for processing | .wav | Yes | WAV format | Path specified by INPUT_FILE parameter |

Output Files

| File/Folder | Description | File Type | Contents | Usage |
|-------------|-------------|-----------|----------|-------|
| OUTPUT_FILE.srt | Transcribed subtitles in SRT format | .srt | Transcribed text with timings | Use with video players to display subtitles |
| OUTPUT_FILE.json | Transcription data in JSON format | .json | Detailed transcription data | For programmatic access and data analysis |
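The JSON output can be post-processed directly. A minimal sketch that loads it, without assuming a particular schema (inspect the file for the exact field names; the file name below is illustrative):

import json

# File name is illustrative; use the OUTPUT_FILE base name you configured.
with open("odtp-output/HRC_20220328T0000.json", encoding="utf-8") as f:
    transcript = json.load(f)

# Inspect the top-level structure before relying on specific field names.
print(type(transcript))
print(transcript if isinstance(transcript, dict) else transcript[:2])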

Tutorial

How to run this component with Docker

Build the Docker image:

docker build -t odtp-pyannote-whisper .

Then create a .env file based on .env.dist and fill in the variable values, as in this example:

MODEL=base
HF_TOKEN=hf_xxxxxxxxxxx
TASK=transcribe
INPUT_FILE=HRC_20220328T0000.mp4
OUTPUT_FILE=HRC_20220328T0000
VERBOSE=TRUE

Then create 3 folders:

  • odtp-input, where your input data should be located.
  • odtp-output, where your output data will be stored.
  • odtp-logs, where the logs will be shared.

After this, you can run the following command and the pipeline will execute.

docker run -it --rm \
-v {PATH_TO_YOUR_INPUT_VOLUME}:/odtp/odtp-input \
-v {PATH_TO_YOUR_OUTPUT_VOLUME}:/odtp/odtp-output \
-v {PATH_TO_YOUR_LOGS_VOLUME}:/odtp/odtp-logs \
--env-file .env \
odtp-pyannote-whisper

Development Mode

To run the component in development mode, mount the app folder inside the container:

docker run -it --rm \
-v {PATH_TO_YOUR_INPUT_VOLUME}:/odtp/odtp-input \
-v {PATH_TO_YOUR_OUTPUT_VOLUME}:/odtp/odtp-output \
-v {PATH_TO_YOUR_LOGS_VOLUME}:/odtp/odtp-logs \
-v {PATH_TO_YOUR_APP_FOLDER}:/odtp/app \
--env-file .env odtp-pyannote-whisper

Running with GPU

To run the component with GPU support, use the following command:

docker run -it --rm \
--gpus all \
-v {PATH_TO_YOUR_INPUT_VOLUME}:/odtp/odtp-input \
-v {PATH_TO_YOUR_OUTPUT_VOLUME}:/odtp/odtp-output \
-v {PATH_TO_YOUR_LOGS_VOLUME}:/odtp/odtp-logs \
--env-file .env odtp-pyannote-whisper

On Windows (PowerShell), this is the command to execute:

docker run -it --rm `
--gpus all `
-v ${PWD}/odtp-input:/odtp/odtp-input `
-v ${PWD}/odtp-output:/odtp/odtp-output `
-v ${PWD}/odtp-logs:/odtp/odtp-logs `
--env-file .env odtp-pyannote-whisper

Running in API Mode

To run the component in API mode and expose a port, you need to use the following environment variables:

ODTP_API_MODE=TRUE
ODTP_GRADIO_SHARE=FALSE # Set to TRUE only if you want to share the app via Gradio tunneling

After the configuration, you can run:

docker run -it --rm \
-p 7860:7860 \
--env-file .env \
odtp-pyannote-whisper 

Then open http://localhost:7860 in your browser to access the web interface.
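Since API mode exposes a Gradio app, you can also inspect it programmatically. A minimal sketch using gradio_client, assuming the container is reachable on localhost:7860:

from gradio_client import Client

# Connect to the running container and list the API endpoints it exposes.
client = Client("http://localhost:7860")
client.view_api()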

Credits and references

This component has been created using the odtp-component-template v0.5.0.

The development of this repository has been carried out by SDSC.