SENCHEYSUON/Simple_Khmer_AutoSpeechRecognition

Simple Khmer ASR Project

This repository contains a simple Khmer Automatic Speech Recognition (ASR) project built from scratch. Feel free to fork this repository, submit pull requests, or send us suggestions on what should be improved! I did this just for fun and to celebrate my birthday :D

1. Data Collection and Preprocessing

Data Collection

  • YouTube Videos

    • We crawl YouTube videos using yt_dlp and ffmpeg.
    • Instructions: Just follow the links to download the tools and refer to the given channel names for crawling.
    • yt_dlp: Download yt_dlp
    • ffmpeg: Download ffmpeg
  • OpenSLR Dataset
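
The YouTube crawling step could look roughly like the sketch below. It is not the repository's own script: the function names, the output directory, and the choice of WAV output are assumptions made for illustration. It uses the `yt_dlp` Python API and needs `ffmpeg` on the PATH for the audio extraction step.

```python
import os


def build_audio_options(out_dir):
    """Build yt_dlp options that keep only the audio track as WAV.

    The output layout (one file per video id) is an assumption,
    not something the repository prescribes.
    """
    return {
        "format": "bestaudio/best",
        "outtmpl": os.path.join(out_dir, "%(id)s.%(ext)s"),
        # FFmpegExtractAudio calls ffmpeg to convert the download to WAV.
        "postprocessors": [
            {"key": "FFmpegExtractAudio", "preferredcodec": "wav"}
        ],
    }


def download_audio(urls, out_dir="data/raw"):
    """Download the audio of each video or channel URL into out_dir."""
    # Imported here so the options helper works without yt_dlp installed.
    from yt_dlp import YoutubeDL

    os.makedirs(out_dir, exist_ok=True)
    with YoutubeDL(build_audio_options(out_dir)) as ydl:
        ydl.download(list(urls))
```

Passing a channel URL instead of a single video URL to `download_audio` should fetch every video in that channel, which matches the "crawl by channel name" workflow described above.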

Data Cleaning

  • Background Noise Removal

    • We use Ultimate Vocal Remover for background noise removal.
    • Ultimate Vocal Remover
    • The code and the model files are provided separately in the Background_Noise folder.
  • Chunking

    • Automated chunking is performed using Python. If the automated results are unsatisfactory, manual checking with Audacity is recommended.
    • Download Audacity
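
One common way to automate chunking is to cut the audio wherever a long enough pause occurs. The sketch below illustrates that idea on a plain list of PCM sample values. It is an assumption about how such a step might work, not the repository's chunking script, and the function name, the amplitude threshold, and the frame sizes are all hypothetical.

```python
def find_speech_chunks(samples, sample_rate, frame_ms=20,
                       threshold=500, min_silence_frames=10):
    """Return (start, end) sample indices of chunks separated by silence.

    A frame counts as silent when its peak amplitude is below `threshold`.
    A chunk is closed once `min_silence_frames` silent frames occur in a row.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(samples) // frame_len
    chunks = []
    start = None       # first sample of the chunk being built
    silent_run = 0     # consecutive silent frames inside the current chunk

    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        loud = max(abs(s) for s in frame) >= threshold
        if loud:
            if start is None:
                start = i * frame_len
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:
                # End the chunk where the silence began.
                chunks.append((start, (i + 1 - silent_run) * frame_len))
                start = None
                silent_run = 0

    if start is not None:
        # Audio ended mid-chunk: trim any trailing silent frames.
        chunks.append((start, (n_frames - silent_run) * frame_len))
    return chunks
```

The threshold and the minimum pause length usually need tuning per recording, which is one reason a manual pass in Audacity remains useful.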

2. Transcription

  • Transcribe_New.py
    • The script outputs three folders: 1_word, UNK (unknown), and non-transcript.
    • Manual checking is still recommended to ensure transcription accuracy.

3. Data Training

Wav2Vec2

  • About Wav2Vec2
    • Wav2Vec2 is a state-of-the-art speech model developed by Facebook AI (now Meta AI). It learns representations directly from raw audio waveforms and, once fine-tuned for ASR, maps them to text.
    • Wav2Vec2 Model
    • The model was trained using connectionist temporal classification (CTC), so the output has to be decoded using Wav2Vec2CTCTokenizer.
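
To show why a CTC output needs a decoding step, here is a minimal greedy decoder in plain Python. It is a simplified illustration of the rule CTC decoding follows (merge repeated frames, then drop blanks), not the actual `Wav2Vec2CTCTokenizer` implementation. The toy vocabulary, the blank id of 0, and the `|` word delimiter are assumptions for the example.

```python
def ctc_greedy_decode(frame_ids, vocab, blank_id=0, word_delimiter="|"):
    """Collapse per-frame CTC predictions into a text string.

    frame_ids: the most likely token id for each audio frame.
    vocab:     list mapping token id -> token string.
    """
    tokens = []
    previous = None
    for idx in frame_ids:
        # Keep a token only if it starts a new run and is not the blank.
        if idx != previous and idx != blank_id:
            tokens.append(vocab[idx])
        previous = idx
    # The delimiter token marks word boundaries in the character sequence.
    return "".join(tokens).replace(word_delimiter, " ").strip()


vocab = ["<pad>", "a", "b", "|"]
print(ctc_greedy_decode([0, 1, 1, 0, 1, 2, 2, 0, 3, 3, 2], vocab))
```

The blank between the two runs of `a` is what lets the decoder keep a genuinely doubled character instead of merging it away.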

Metrics

  • WER (Word Error Rate)
    • WER is a metric used to evaluate the quality of transcriptions produced by ASR systems.
    • Word Error Rate
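    • It is computed by aligning the ASR output with a reference transcript and dividing the number of substituted, deleted, and inserted words by the number of words in the reference. Lower is better, and 0 means a perfect match.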
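
A minimal WER computation, written as a word-level edit distance, is sketched below. The function name is hypothetical. It assumes both transcripts are already split into space-separated words. Khmer is normally written without spaces between words, so a word segmentation step would be needed before this function gives a meaningful score.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the processed part of ref and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)
```

For example, comparing the reference "the cat sat on the mat" with the hypothesis "the cat sit on mat" gives one substitution and one deletion over six reference words, so the WER is 2/6.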

4. Simple Streamlit Application

  • This application records your voice or accepts an uploaded audio file and returns the transcript text!
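
A stripped-down version of such an app might look like the sketch below. It covers only the file-upload path, and it is not the repository's application. The function names and the `model_path` placeholder are assumptions, and it relies on the `transformers` speech recognition pipeline accepting raw audio file bytes (which it decodes with `ffmpeg`).

```python
def transcribe(audio_bytes, asr):
    """Run `asr` (any callable returning {"text": ...}) on raw audio bytes."""
    if not audio_bytes:
        return ""
    return asr(audio_bytes)["text"].strip()


def run_app(model_path="path/to/your-finetuned-wav2vec2"):
    """Minimal upload-and-transcribe page; model_path is a placeholder."""
    # Imported lazily so the helper above has no heavy dependencies.
    import streamlit as st
    from transformers import pipeline

    st.title("Simple Khmer ASR")
    uploaded = st.file_uploader("Upload an audio file", type=["wav", "mp3"])
    if uploaded is not None:
        audio_bytes = uploaded.getvalue()
        st.audio(audio_bytes)  # let the user play back what was uploaded
        asr = pipeline("automatic-speech-recognition", model=model_path)
        st.write(transcribe(audio_bytes, asr))


# To launch: add a call to run_app() at the bottom of the file,
# then start it with `streamlit run app.py`.
```

Keeping `transcribe` separate from the page code makes it easy to swap in a different model or to test the logic without starting Streamlit.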

References

Problems
