A Linux-based voice dictation system using OpenAI's Whisper model for speech-to-text conversion and automatic text input.
This system provides real-time voice dictation capabilities by:
- Recording audio from your system's microphone
- Converting speech to text using Whisper
- Automatically typing the recognized text using ydotool
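The three steps above can be sketched in Python roughly as follows. This is a minimal illustration, not the code in dictation_daemon.py: the function names, the fixed five-second recording window, and the 16 kHz sample rate are assumptions made for the example (the real service records until the command is run a second time). It assumes openai-whisper and sounddevice are installed and that ydotool is on the PATH.

```python
import subprocess

SAMPLE_RATE = 16_000  # Whisper works on 16 kHz mono audio


def record(seconds=5.0):
    """Record mono audio from the default microphone (fixed length is an assumption)."""
    import sounddevice as sd  # imported lazily so the other helpers work without it

    frames = int(seconds * SAMPLE_RATE)
    audio = sd.rec(frames, samplerate=SAMPLE_RATE, channels=1, dtype="float32")
    sd.wait()  # block until the recording is complete
    return audio.flatten()  # shape (frames, 1) -> (frames,)


def transcribe(audio, model_name="base"):
    """Convert a float32 waveform to text with Whisper."""
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(audio, fp16=False)  # fp16=False avoids a warning on CPU
    return result["text"].strip()


def type_command(text):
    """Build the ydotool invocation that types `text` at the cursor."""
    return ["ydotool", "type", text.strip()]


def dictate_once(seconds=5.0):
    """Record, transcribe, and type the result; returns the recognized text."""
    text = transcribe(record(seconds))
    if text:
        subprocess.run(type_command(text), check=True)
    return text
```

Calling dictate_once() would record for five seconds and then type whatever Whisper recognized. Loading the model inside transcribe() keeps the sketch short; a long-running service would more likely load it once at startup.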
- Linux system with systemd
- Python 3.x
- Root access for installation
- A working microphone
- ydotool installed
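Some of these prerequisites can be checked before running the installer. The helper below is a hypothetical sketch and is not part of this repository; it only covers the Python version, systemd, and ydotool, and leaves root access and the microphone to be verified by hand.

```python
import os
import shutil
import sys


def missing_prerequisites(which=shutil.which, isdir=os.path.isdir,
                          version=sys.version_info):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if version < (3,):
        problems.append("Python 3.x is required")
    # systemd creates this directory at boot when it is the running init system
    if not isdir("/run/systemd/system"):
        problems.append("systemd does not appear to be running")
    if which("ydotool") is None:
        problems.append("ydotool was not found on PATH")
    return problems
```

Printing each entry of missing_prerequisites() gives a quick report. The lookup functions are passed as parameters only so the checks can be exercised without touching the real system.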
Either:
curl -sSL https://raw.githubusercontent.com/nilock/dictate/main/remote_install.sh | bash
Or:
- Clone this repository:
git clone https://github.com/nilock/dictate
cd dictate
- Run the installation script as root:
sudo ./installation.sh
This will:
- Create a Python virtual environment in /opt/dictation_venv
- Install required Python packages
- Set up the dictation service
- Start the service automatically
- Start/stop dictation using the provided script:
dictation.sh
It is useful to bind a system hotkey to this script at its installed location: /usr/local/bin/dictation.sh
- When activated:
  - Speak
  - Run the command again to stop recording
  - The recognized text will be automatically typed at your cursor position
- Check service status:
systemctl status dictation
- View logs:
journalctl -u dictation -f
- dictation_daemon.py: Background service handling audio recording and transcription
- dictation_client.py: Client interface for sending commands to the daemon
- dictation.sh: Convenient shell script wrapper
- installation.sh: System setup and service installation
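The system uses Whisper's "base" model by default. You can modify dictation_daemon.py to use a different model, such as "tiny", "small", "medium", or "large". Larger models are generally more accurate but slower and need more memory.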
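As a rough illustration of what such a change might look like: the constant and helper below are hypothetical and are not taken from dictation_daemon.py. Only whisper.load_model comes from the openai-whisper package.

```python
# Standard multilingual Whisper checkpoints, smallest to largest.
KNOWN_MODELS = ("tiny", "base", "small", "medium", "large")

MODEL_NAME = "small"  # hypothetical setting; the project default is "base"


def load_dictation_model(name=MODEL_NAME):
    """Validate the model name, then load it with openai-whisper."""
    if name not in KNOWN_MODELS:
        raise ValueError(
            f"unknown Whisper model {name!r}; expected one of {KNOWN_MODELS}"
        )
    import whisper  # imported lazily so validation works without the package

    # Downloads the weights on first use, then loads them from the local cache.
    return whisper.load_model(name)
```

Validating the name first gives a clear error for a typo before any download is attempted.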
- Check service status:
systemctl status dictation
- Review logs:
tail -f /tmp/dictation_daemon.log
- Test audio recording:
python3 dictation_script_only.py
The system saves the last recording as /tmp/last_recording.wav for debugging purposes.
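One way to inspect that file is with Python's standard wave module. The sketch below is not part of the project and assumes the recording is stored as integer PCM; if the daemon writes floating-point samples, wave.open raises wave.Error and a different reader would be needed.

```python
import wave


def describe_wav(path="/tmp/last_recording.wav"):
    """Return basic properties of an integer-PCM WAV file."""
    # The wave module reads integer PCM only; a float WAV raises wave.Error.
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "duration_seconds": frames / rate,
        }
```

A duration near zero, or an unexpected sample rate or channel count, suggests the problem lies in audio capture rather than in transcription.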
A standalone testing script (dictation_script_only.py) is provided for development and testing purposes.
The following is useful to redeploy locally:
sudo bash ./installation.sh && sudo journalctl -u dictation -f
- openai-whisper
- sounddevice
- numpy
- torch
- scipy
- pynput
GPL-3.0
Sure, go nuts.