A scalable Python module for robust audio transcription using OpenAI's Whisper model. Supports multiple languages, batch processing, and output formats like JSON and SRT.

🎙️ Whisper Transcription Module

🌟 Overview

A powerful, flexible Python module for audio transcription leveraging OpenAI's Whisper model, designed to transform audio content into accurate, multilingual text.

✨ Key Features

  • 🔊 Advanced Audio Transcription

    • Utilizes state-of-the-art Whisper AI technology
    • Supports multiple languages and dialects
  • 🌐 Multilingual Support

    • Transcribe and translate audio across 99 languages
    • Automatic language detection
  • 📄 Flexible Output Formats

    • TXT, JSON, SRT, VTT
    • Customizable transcription settings
  • 📂 Versatile Processing

    • Single file and batch processing
    • Configurable model sizes
    • GPU and CPU support
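
As a sketch of the output-format support, the snippet below renders Whisper-style segments (dicts with `start`, `end`, and `text` keys, the shape `model.transcribe` returns in its `"segments"` list) as SRT. The segment data here is illustrative, and the module's own SRT writer may differ in detail:

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments):
    """Render Whisper-style segments as an SRT document."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)

# Illustrative segments in the shape whisper's transcribe() produces:
segments = [
    {"start": 0.0, "end": 2.5, "text": " Hello world."},
    {"start": 2.5, "end": 5.0, "text": " This is a test."},
]
print(segments_to_srt(segments))
```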

📚 Documentation

| 🇺🇸 English | 🇹🇷 Türkçe |
| --- | --- |
| Installation Guide | Installation Guide |
| CLI Usage Guide | Komut Satırı Kullanım Kılavuzu |
| Module Usage Guide | Modül Kullanım Kılavuzu |
| Feature Specifications | Özellik Spesifikasyonları |

🚀 Demo Scripts

The `demo_scripts` directory contains example scripts covering common usage scenarios:

| Scenario | Description | Key Features |
| --- | --- | --- |
| 1: Basic Transcription | Simple audio transcription | Default `base` model, quick processing |
| 2: Multilingual Translation | Translate audio to English | Multi-language support, configurable logging |
| 3: Batch Processing | Process multiple audio files | Directory-wide transcription, format flexibility |
| 4: Advanced Configuration | Detailed transcription control | Quality filtering, segment management |
| 5: Error Handling | Robust error management | Fallback strategies, comprehensive logging |
| 6: Advanced Batch Processing | Large-scale transcription | Parallel processing, detailed reporting |
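
The batch-processing scenarios above share a common pattern: collect the audio files in a directory, transcribe each one, and keep going when a file fails. A minimal sketch of that pattern, where `transcribe_file` is a hypothetical callable standing in for the module's actual transcription call:

```python
from pathlib import Path

AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg"}

def find_audio_files(directory):
    """Collect supported audio files from a directory, sorted by name."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.suffix.lower() in AUDIO_EXTENSIONS
    )

def batch_transcribe(directory, transcribe_file):
    """Run `transcribe_file` over every audio file in `directory`.

    `transcribe_file` is a hypothetical callable (e.g. wrapping a
    Whisper model's transcribe call) returning the transcript text.
    Failures are recorded instead of aborting the whole batch.
    """
    results = {}
    for path in find_audio_files(directory):
        try:
            results[path.name] = transcribe_file(path)
        except Exception as exc:  # record the error, continue the batch
            results[path.name] = f"ERROR: {exc}"
    return results
```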

📋 System Requirements

💻 Computational Resources

  • Python: 3.8+
  • CPU: All models supported
  • GPU: Optional acceleration
    • Use `--device cuda` for GPU transcription
    • Automatic CPU fallback
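
The automatic CPU fallback can be sketched as a small helper. `pick_device` is hypothetical, not part of the module's API; it consults `torch.cuda.is_available()` only when torch imports cleanly, and otherwise falls back to the CPU:

```python
def pick_device(preferred="cuda"):
    """Return "cuda" when requested and available, else fall back to "cpu"."""
    if preferred == "cuda":
        try:
            import torch  # optional here; fall back if missing
            if torch.cuda.is_available():
                return "cuda"
        except ImportError:
            pass
    return "cpu"

# Possible usage with the whisper package:
#   model = whisper.load_model("base", device=pick_device())
```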

📦 Dependencies

  • openai-whisper
  • torch
  • numpy
  • soundfile
  • ffmpeg-python

🤝 Contributing

  1. Fork the repository
  2. Create a virtual environment
  3. Install development dependencies: `pip install -e .[dev]`
  4. Run tests: `pytest`
  5. Submit a pull request

🐛 Support

📄 License

MIT License - see the LICENSE file for details.

🙏 Acknowledgements

  • OpenAI for the Whisper model
  • Python open-source community
