This repo contains simple to use, pretrained/training-less models for speaker diarization.
-
Binary Key Speaker Modeling
Based on pyBK by Jose Patino which implements the diarization system from "The EURECOM submission to the first DIHARD Challenge" by Patino, Jose and Delgado, Héctor and Evans, Nicholas
If you have any other models you would like to see added, please open an issue.
This library seeks to provide a very basic interface. To use the Binary Key model on a file, do something like this:
import numpy as np
from pydiar.models import BinaryKeyDiarizationModel, Segment
from pydiar.util.misc import optimize_segments
from pydub import AudioSegment
INPUT_FILE = "test.wav"
sample_rate = 32000
audio = AudioSegment.from_wav(INPUT_FILE)
audio = audio.set_frame_rate(sample_rate)
audio = audio.set_channels(1)
diarization_model = BinaryKeyDiarizationModel()
segments = diarization_model.diarize(
sample_rate, np.array(audio.get_array_of_samples())
)
optimized_segments = optimize_segments(segments)
Now optimized_segments
contains a list of segments with their start, length and speaker id
A simple script which reads an audio file, diarizes it and transcribes it into the WebVTT format can be found in examples/generate_webvtt.py
.
To use it, download a vosk model from https://alphacephei.com/vosk/models and then run the script using
poetry install
poetry run python -m examples.generate_webvtt -i PATH/TO/INPUT.wav -m PATH/TO/VOSK_MODEL