A toolkit for generating and translating SRT subtitles, built for German language learning.
generator.py uses faster-whisper (CTranslate2) for speech-to-text with GPU acceleration, and applies waveform-based post-processing to correct Whisper's timestamp inaccuracies.
Features:
- Automatic speech-to-text transcription with word-level timestamps
- Smart subtitle formatting: max 42 chars/line, max 2 lines per entry
- Long segments are automatically split using word-level timing
- Waveform-based gap detection refines all subtitle boundaries by analyzing actual audio energy
- Handles corrupt audio packets gracefully (skips bad packets, keeps decoding)
- CUDA GPU acceleration with automatic CPU fallback
- Auto-scans the input/ folder or accepts a specific file path
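The formatting and splitting rules above can be illustrated with a minimal sketch. It is not the code from generator.py: the function names and the (word, start, end) tuple layout are assumptions made for illustration, and only the 42-character and 2-line limits come from the feature list.

```python
import textwrap

MAX_CHARS = 42  # per line, from the feature list
MAX_LINES = 2   # per subtitle entry, from the feature list


def format_timestamp(seconds: float) -> str:
    """Render seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def _finish(words):
    """Turn a run of (word, start, end) tuples into (start, end, wrapped_text)."""
    text = " ".join(w for w, _, _ in words)
    return words[0][1], words[-1][2], "\n".join(textwrap.wrap(text, MAX_CHARS))


def split_segment(words):
    """Split word-timed text into entries of at most MAX_LINES wrapped lines.

    `words` is a list of (word, start_seconds, end_seconds) tuples
    (hypothetical layout). Each entry takes its start from its first word
    and its end from its last word.
    """
    entries, current = [], []
    for word in words:
        candidate = current + [word]
        text = " ".join(w for w, _, _ in candidate)
        if current and len(textwrap.wrap(text, MAX_CHARS)) > MAX_LINES:
            # Adding this word would need a third line: close the entry here.
            entries.append(_finish(current))
            current = [word]
        else:
            current = candidate
    if current:
        entries.append(_finish(current))
    return entries
```

Because each entry's boundaries come from the word timings, a long segment is cut at a word boundary instead of being divided evenly by duration.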
Waveform gap detection:
Whisper often misaligns subtitle timestamps by hundreds of milliseconds or even several seconds. The post-processing pipeline:
- Extracts raw audio using PyAV
- Computes RMS energy in 10ms frames across the full track
- Detects speech bursts (contiguous frames above a global silence threshold)
- Merges nearby bursts (< 300ms apart) to handle inter-word pauses
- Snaps each subtitle to the longest merged speech burst in its search region
- Enforces minimum gaps between consecutive subtitles and prevents overlaps
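The energy, burst-detection, merge, and snap steps above might look roughly like the following sketch. It uses plain Python lists for readability, whereas the tool lists numpy as a dependency. The function names, the way the threshold is passed in, and the `margin_ms` search window are assumptions, not values taken from generator.py.

```python
import math

FRAME_MS = 10       # frame length from the pipeline description
MERGE_GAP_MS = 300  # bursts closer than this are merged


def rms_frames(samples, sample_rate):
    """RMS energy of consecutive 10 ms frames of mono float samples."""
    size = sample_rate * FRAME_MS // 1000
    return [
        math.sqrt(sum(x * x for x in samples[i:i + size]) / size)
        for i in range(0, len(samples) - size + 1, size)
    ]


def speech_bursts(energy, threshold):
    """Contiguous runs of frames above `threshold`, as (start_ms, end_ms)."""
    bursts, start = [], None
    # The -inf sentinel closes a run that reaches the end of the track.
    for i, e in enumerate(energy + [float("-inf")]):
        if e > threshold and start is None:
            start = i
        elif e <= threshold and start is not None:
            bursts.append((start * FRAME_MS, i * FRAME_MS))
            start = None
    return bursts


def merge_bursts(bursts, max_gap_ms=MERGE_GAP_MS):
    """Join bursts separated by less than `max_gap_ms` (inter-word pauses)."""
    merged = []
    for start, end in bursts:
        if merged and start - merged[-1][1] < max_gap_ms:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged


def snap(subtitle, bursts, margin_ms=500):
    """Snap (start_ms, end_ms) to the longest burst in its search region.

    `margin_ms` is a hypothetical search window around the Whisper timing.
    If no burst overlaps the region, the subtitle is returned unchanged.
    """
    lo, hi = subtitle[0] - margin_ms, subtitle[1] + margin_ms
    candidates = [b for b in bursts if b[1] > lo and b[0] < hi]
    if not candidates:
        return subtitle
    return max(candidates, key=lambda b: b[1] - b[0])
```

Choosing the longest burst, not the nearest one, keeps a short noise spike near the Whisper timestamp from capturing the subtitle.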
Usage:
# Process all files in input/
python generator.py
# Process a specific file
python generator.py video.mp4
# Custom model and language
python generator.py video.mp4 --model medium --language en
# Specify output path
python generator.py video.mp4 --output subtitles.srt
# Force CPU
python generator.py --device cpu
Options:
| Flag | Default | Description |
|---|---|---|
| `--model` | `large-v3` | Whisper model size (tiny, base, small, medium, large-v2, large-v3, turbo, distil-large-v3) |
| `--language` | `de` | Language code (de, en, auto for auto-detect) |
| `--output` | `output/<stem>.srt` | Output file path (single file mode only) |
| `--device` | `auto` | auto, cuda, or cpu |
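For reference, the options table maps naturally onto an argparse definition. The sketch below only illustrates that mapping and is not the parser from generator.py. The positional `path` argument name, the use of `choices` for validation, and the `default_output` helper are assumptions.

```python
import argparse
from pathlib import Path

MODELS = ["tiny", "base", "small", "medium", "large-v2", "large-v3",
          "turbo", "distil-large-v3"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the flags and defaults in the table."""
    parser = argparse.ArgumentParser(prog="generator.py")
    # Optional positional: omit it to scan the input/ folder.
    parser.add_argument("path", nargs="?", default=None)
    parser.add_argument("--model", default="large-v3", choices=MODELS)
    parser.add_argument("--language", default="de")
    parser.add_argument("--output", default=None)
    parser.add_argument("--device", default="auto",
                        choices=["auto", "cuda", "cpu"])
    return parser


def default_output(source: str) -> Path:
    """Derive output/<stem>.srt from an input file name (hypothetical helper)."""
    return Path("output") / (Path(source).stem + ".srt")
```

With this layout, `python generator.py video.mp4 --model medium --language en` parses to `model="medium"`, `language="en"`, and `device="auto"`, and `default_output("video.mp4")` gives `output/video.srt`.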
translator.py translates SRT subtitle files from English to German using the Groq LLM API. It is optimized for language learners: slang, profanity, tech terms, and proper nouns are preserved.
Features:
- Text-only JSON batched translation (timestamps are never sent to the API)
- Handles profanity, slang, and technical vocabulary without sanitization
- Proper nouns and common tech terms stay in English
- Natural, spoken German tone in output
- Progress display with tqdm
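The text-only batching idea can be sketched as follows: parse the SRT, send only the subtitle text as a JSON array, and write the translated strings back next to the untouched timestamps. This is an illustration under assumptions. The function names, the dict layout, and the batch size of 20 are hypothetical, and the Groq API call itself is deliberately left out.

```python
import json
import re


def parse_srt(srt: str):
    """Parse SRT text into dicts with 'index', 'timing', and 'text' keys."""
    entries = []
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = block.splitlines()
        if len(lines) >= 3:
            entries.append({
                "index": lines[0].strip(),
                "timing": lines[1].strip(),
                "text": "\n".join(lines[2:]),
            })
    return entries


def make_batches(entries, batch_size=20):
    """Yield (start_offset, json_payload); payloads hold subtitle text only."""
    for start in range(0, len(entries), batch_size):
        chunk = entries[start:start + batch_size]
        yield start, json.dumps([e["text"] for e in chunk], ensure_ascii=False)


def apply_batch(entries, start, translated_json):
    """Write translated strings back by position; timings stay untouched."""
    for offset, text in enumerate(json.loads(translated_json)):
        entries[start + offset]["text"] = text


def render_srt(entries):
    """Serialize entries back to SRT text."""
    blocks = (f'{e["index"]}\n{e["timing"]}\n{e["text"]}' for e in entries)
    return "\n\n".join(blocks) + "\n"
```

Since the payload is a plain JSON array of strings, the model never sees a timestamp and cannot corrupt one. A real implementation would also want to check that the returned array has the same length as the batch before applying it.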
Usage:
# Place English .srt files in input/
python translator.py
# Translated files appear in output/ with _DE.srt suffix
Requires: A Groq API key in .env as GROQ_API_KEY.
- Python 3.10+
- NVIDIA GPU recommended for generator.py (CUDA)
- Dependencies: pip install -r requirements.txt
- For generator.py: faster-whisper, PyAV, numpy, torch (installed separately)
Installation:
git clone https://github.com/yourusername/subtitle-generator.git
cd subtitle-generator
python -m venv venv
# Windows PowerShell
.\venv\Scripts\Activate.ps1
pip install -r requirements.txt
# For generator.py (GPU support)
pip install faster-whisper numpy av
pip install torch --index-url https://download.pytorch.org/whl/cu128
Project structure:
input/          Video/audio files to process (gitignored)
output/         Generated/translated SRT files (gitignored)
generator.py    Subtitle generator with waveform post-processing
translator.py   SRT translator (English -> German)