A Python program to convert CTM files (usually generated by Kaldi) into an EMU-SDMS database.
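For reference, each line of a Kaldi CTM file has the form <utterance-id> <channel> <start> <duration> <token> [<confidence>], with times in seconds. The sketch below shows one way to read such a file into per-utterance, time-sorted lists. It assumes that standard 5- or 6-column layout; CtmEntry and read_ctm are hypothetical names, and this is not the parser CTM_to_Emu.py itself uses.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class CtmEntry:
    utt: str                      # utterance (or recording) ID
    channel: str                  # channel, usually "1" in Kaldi output
    start: float                  # start time in seconds
    duration: float               # duration in seconds
    token: str                    # word or phone label
    conf: Optional[float] = None  # optional confidence (6th column)

    @property
    def end(self) -> float:
        return self.start + self.duration


def read_ctm(lines: Iterable[str]) -> Dict[str, List[CtmEntry]]:
    """Group CTM entries by utterance ID, each list sorted by start time."""
    by_utt: Dict[str, List[CtmEntry]] = {}
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith(";;"):
            continue  # skip blank lines and NIST-style comment lines
        utt, chan, start, dur, token = fields[:5]
        conf = float(fields[5]) if len(fields) > 5 else None
        by_utt.setdefault(utt, []).append(
            CtmEntry(utt, chan, float(start), float(dur), token, conf))
    for entries in by_utt.values():
        entries.sort(key=lambda e: e.start)
    return by_utt


if __name__ == "__main__":
    sample = ["utt1 1 0.80 0.42 dobry", "utt1 1 0.30 0.50 dzień"]
    for e in read_ctm(sample)["utt1"]:
        print(e.token, e.start, round(e.end, 2))
```

The same function works for both the word CTM and the phone CTM; pairing phones with their words is then a matter of comparing time intervals within each utterance.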
This program is fairly self-contained and has very few requirements, apart from a grapheme-to-phoneme (G2P) transcriber.
The following can be installed using pip or easy_install:
- tqdm - for progress bars
- pyphen - for syllabification
The following must be installed separately:
- R - required for signal processing (formant estimation)
- wrassp - speech processing library installable from within R
- phonetisaurus - for G2P
  The project is available from its standard repository:
  https://github.com/AdolfVonKleist/Phonetisaurus
  Only the C++ binaries are needed, so the Python bindings are not necessary.
  If you have a working Kaldi installation, you can simply run ./extras/install_phonetisaurus.sh
  in Kaldi's tools directory and use the binary built there.
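Formant and other feature tracks are computed in R with wrassp. As a rough sketch of how that step could be driven from Python, assuming Rscript is on the PATH and wrassp is installed, the code below generates an R script that calls the wrassp functions named in the --feat option on a list of WAV files, writing one SSFF file per input. The helper names (build_wrassp_script, run_wrassp) are hypothetical, and CTM_to_Emu.py may invoke R differently.

```python
import shutil
import subprocess
from typing import Iterable, List

# wrassp functions listed in the --feat help text.
KNOWN_FEATURES = {"forest", "ksvF0", "mhsF0", "rmsana", "zcrana"}


def r_string(s: str) -> str:
    """Quote a Python string as an R string literal."""
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_wrassp_script(wav_files: Iterable[str], features: Iterable[str],
                        out_dir: str) -> str:
    """Return R code running each requested wrassp function on the WAV files."""
    files = "c(" + ", ".join(r_string(f) for f in wav_files) + ")"
    lines = ["library(wrassp)"]
    for feat in features:
        if feat not in KNOWN_FEATURES:
            raise ValueError(f"unsupported feature: {feat}")
        # toFile = TRUE writes one SSFF file per input into outputDirectory
        lines.append(f"{feat}({files}, toFile = TRUE, "
                     f"outputDirectory = {r_string(out_dir)})")
    return "\n".join(lines)


def run_wrassp(wav_files: List[str], features: List[str], out_dir: str) -> None:
    """Execute the generated script with Rscript (requires R and wrassp)."""
    rscript = shutil.which("Rscript")
    if rscript is None:
        raise RuntimeError("Rscript not found on PATH")
    script = build_wrassp_script(wav_files, features, out_dir)
    subprocess.run([rscript, "-e", script], check=True)
```

A comma-separated --feat value such as "forest,ksvF0" would map to features="forest,ksvF0".split(","). Validating names against a fixed set keeps arbitrary text out of the generated R code.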
A Phonetisaurus model for Polish G2P is included in this repository (the model.fst file).
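A minimal sketch of calling phonetisaurus-g2pfst with the bundled model from Python. It assumes the binary accepts --model, --wordlist and --nbest and prints one tab-separated "word, score, pronunciation" line per hypothesis; check the output of your own build before relying on that. The helpers g2p_command, parse_g2p_output and transcribe are hypothetical, not functions of this program.

```python
import os
import subprocess
import tempfile
from typing import Dict, Iterable, List, Tuple

# One hypothesis: (score as printed by Phonetisaurus, list of phone symbols)
Pron = Tuple[float, List[str]]


def g2p_command(binary: str, model: str, wordlist: str, nbest: int = 1) -> List[str]:
    """Build the phonetisaurus-g2pfst command line for a word-list file."""
    return [binary, f"--model={model}", f"--wordlist={wordlist}", f"--nbest={nbest}"]


def parse_g2p_output(text: str) -> Dict[str, List[Pron]]:
    """Parse lines of the form 'word<TAB>score<TAB>phone phone ...'."""
    prons: Dict[str, List[Pron]] = {}
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) < 3:
            continue  # skip blank or unexpected lines
        word, score, phones = parts[0], float(parts[1]), parts[2].split()
        prons.setdefault(word, []).append((score, phones))
    return prons


def transcribe(words: Iterable[str], binary: str = "phonetisaurus-g2pfst",
               model: str = "model.fst", nbest: int = 1) -> Dict[str, List[Pron]]:
    """Write the words to a temporary list, run G2P on it and parse the result."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                     encoding="utf-8") as f:
        f.write("\n".join(words) + "\n")
        path = f.name
    try:
        result = subprocess.run(g2p_command(binary, model, path, nbest),
                                check=True, capture_output=True,
                                encoding="utf-8")
    finally:
        os.remove(path)  # clean up the temporary word list
    return parse_g2p_output(result.stdout)
```

Batching all words into one --wordlist call avoids loading the FST model once per word.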
usage: CTM_to_Emu.py [-h] [--wav-scp WAV_SCP] [--utt2ses UTT2SES] [-o]
                     [-n NAME] [-r RATE] [--rm-besi RM_BESI]
                     [--phonetisaurus PHONETISAURUS] [--g2p-model G2P_MODEL]
                     [--feat FEAT] [-s] [--segs SEGS] [--split SPLIT]
                     out_dir words_ctm phones_ctm [wav]
Program to convert CTM files (usually generated by Kaldi) into a folder
structure used by EMU-SDMS
positional arguments:
  out_dir               Output directory
  words_ctm             CTM containing words
  phones_ctm            CTM containing phonemes
  wav                   Wave file corresponding to CTMs (only if single file,
                        use --wav-scp for multiple files)
optional arguments:
  -h, --help            show this help message and exit
  --wav-scp WAV_SCP     List of WAV files if CTM contains references to
                        multiple files. Uses Kaldi wav.scp format.
  --utt2ses UTT2SES     List of utterance to session mappings (similar to
                        utt2spk).
  -o, --overwrite       Overwrite output directory.
  -n NAME, --name NAME  Name of the database.
  -r RATE, --rate RATE  Sample rate of the WAV file
  --phonetisaurus PHONETISAURUS
                        Path to the phonetisaurus-g2pfst program.
  --g2p-model G2P_MODEL
                        Path to the FST G2P model.
  --feat FEAT           Compute extra features (comma separated) using R
                        package "wrassp", e.g.: forest, ksvF0, mhsF0, rmsana,
                        zcrana
  -s, --symlink         Use symlinks instead of copying audio to database
  --segs SEGS           Use segments file
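The --wav-scp, --utt2ses and --segs options take Kaldi-style tables. The sketch below reads them, assuming the usual Kaldi layouts: wav.scp is <recording-id> <path>, utt2ses is <utterance-id> <session> (like utt2spk), and segments is <utterance-id> <recording-id> <start> <end>. That --segs expects exactly the Kaldi segments format is an assumption, and read_table, read_segments, wav_path and locate are hypothetical helpers.

```python
from typing import Dict, Iterable, Optional, Tuple

Segment = Tuple[str, float, float]  # (recording ID, start s, end s)


def read_table(lines: Iterable[str]) -> Dict[str, str]:
    """Read '<key> <value ...>' lines (wav.scp, utt2ses/utt2spk-style files)."""
    table: Dict[str, str] = {}
    for line in lines:
        parts = line.strip().split(None, 1)
        if len(parts) == 2:
            table[parts[0]] = parts[1]
    return table


def read_segments(lines: Iterable[str]) -> Dict[str, Segment]:
    """Read '<utt-id> <recording-id> <start> <end>' lines (times in seconds)."""
    segs: Dict[str, Segment] = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 4:
            utt, rec, start, end = fields[:4]
            segs[utt] = (rec, float(start), float(end))
    return segs


def wav_path(entry: str) -> str:
    """Return a plain file path from a wav.scp value; piped commands are rejected."""
    if entry.rstrip().endswith("|"):
        raise ValueError(f"piped wav.scp entry not supported: {entry}")
    return entry


def locate(utt: str, wav_scp: Dict[str, str],
           segments: Optional[Dict[str, Segment]] = None
           ) -> Tuple[str, float, Optional[float]]:
    """Map an utterance to (wav file, start, end); end=None means the whole file."""
    if segments and utt in segments:
        rec, start, end = segments[utt]
        return wav_path(wav_scp[rec]), start, end
    return wav_path(wav_scp[utt]), 0.0, None
```

Kaldi also allows wav.scp entries that end in "|" (a shell command producing audio); this sketch rejects them rather than running arbitrary commands.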