An API Wrapper for Whisperx Library
This is a FastAPI application that provides an endpoint for video/audio transcription using the whisperx
command. The application supports multiple audio and video formats. It performs the transcription, alignment, and diarization of the uploaded media files.
- User Authentication with JWT
- Support for multiple audio and video formats
- Diarization support
- Customizable language and model settings
- whisperx
- Python 3.8+
- FastAPI
- ffmpeg
- SQLite
- pyjwt
- dotenv
Follow the instructions on how to install Whisperx in the official repository
You can install these dependencies using the requirements.txt
file:
pip install -r requirements.txt
Create a .env
file in your root directory and add the following variables:
SECRET_KEY=your_secret_key
MASTER_KEY=your_master_key
HUGGING_FACE_TOKEN=your_hugging_face_token
API_PORT=11300
SQLite is used for storing user information. The database is created automatically when the application runs.
Run the application using:
python api_whisperx.py
Replace main
with the name of your Python file if it's not main.py
.
Authenticate a user and return a JWT token.
username
: The username of the user.password
: The password of the user.
Create a new user.
username
: Desired username.password
: Desired password.master_key
: Master key for authorized user creation.
Transcribe an uploaded audio or video file.
file
: The audio or video file to transcribe.lang
: Language for transcription (default is "pt").model
: Model to use for transcription (default is "large-v2").min_speakers
: Minimum number of speakers for diarization (default is 1).max_speakers
: Maximum number of speakers for diarization (default is 2).
The application has built-in logging that informs about the steps being performed and any errors that occur.