Translate_Subs

Batch-translate subtitles for your entire media library using LLMs that understand context, idioms, and slang — not just word-for-word replacement.

Point it at a folder — movies, TV series, documentaries, entire libraries — and it handles everything. Detects existing subtitles (embedded or external), translates them into your language, embeds them into your MKVs, and cleans up what you don't need. All in one pass.

The result is subtitles that read like they were written by a native speaker, at a cost of roughly 1 cent per episode.

Features

Natural translations — LLMs understand context, tone, and intent. Jokes land, slang makes sense, dialogue flows naturally.
Fully configurable — target language, source priority list, which languages to keep, which LLM to use. All in one config file.
Hands-off batch processing — point at a folder and walk away. Already translated files are skipped. Safe to re-run any time.
Fast — streaming pipeline starts translating as soon as the first file is found. Up to 8 files translated in parallel.
Network-friendly — directory caching eliminates thousands of slow lookups over VPN/network shares.
Flexible source detection — finds subtitles in external files (.srt, .ass, including .sdh/.hi/.forced variants), embedded MKV tracks, and even untagged tracks (identified via LLM).
Smart fallback — if no preferred-language subtitle is available, falls back to any language it can find. The LLM handles the rest.
One-pass remux — translating, embedding, and cleaning happen in a single ffmpeg call. Half the I/O of doing them separately.
Self-healing translations — detects when the LLM stops mid-batch and automatically retries only the failed portion.
Multiple LLM providers — DeepSeek, OpenAI, Groq, Mistral, OpenRouter, Ollama, LM Studio. Any OpenAI-compatible API works.
Cross-platform — PowerShell wrappers for Windows, Bash wrappers for Linux. Same Python core on both.

How It Works

Video file
  │
  ▼
[1] Skip check ─── already has target language? skip
  │
  ▼
[2] Source detection ─── find subtitles in preferred languages
  │                      (external .srt/.ass first, then embedded)
  ▼
[3] Fallback ─── if nothing preferred, use ANY available language
  │
  ▼
[4] LLM Translation ─── batch translate via DeepSeek/OpenAI/etc.
  │                     with self-healing retry on failures
  ▼
[5] Reflow ─── ensure max 2 lines per subtitle cue
  │
  ▼
[6] MKV remux ─── embed translated subs + keep wanted tracks +
  │               drop unwanted tracks, all in ONE pass
  ▼
Finished MKV with only your wanted languages

What it does:

Scans a folder recursively for video files (MKV, MP4, AVI, MOV, WebM, OGM)
Skips any video that already has the target language (embedded or external)
Extracts and translates the best available source subtitles
Embeds the translated output back into MKV files as a native subtitle track
Strips any unwanted languages to keep your files clean
Never touches the original video or audio streams

Translation and scanning overlap — the first file starts translating while the rest are still being scanned.

Cost

Scale	Approximate cost
1 episode (45 min)	~$0.01
1 season (10 episodes)	~$0.10
1 full series (50 episodes)	~$0.50
100 episodes	~$1
1,000 episodes	~$10

Note

These estimates are based on DeepSeek Chat pricing as of April 2026 (the default and recommended provider). A cup of coffee pays for ~500 episodes. A Netflix monthly subscription covers ~1,500. Other providers (OpenAI, Anthropic, etc.) can cost 10-50x more for similar quality. Always check your provider's current pricing before processing large libraries.

The --dry-run flag lets you preview exactly what would be processed without making any API calls or modifying any files.

Installation

Requirements (both platforms)

Component	Required	Notes
Python 3.11+	Yes	Installed by the script if missing
Git	Yes	Installed by the script if missing
FFmpeg + FFprobe	Yes	Installed by the script if missing
LLM API key	Yes	DeepSeek recommended (tested extensively). Other OpenAI-compatible providers also work

Windows Install

Prerequisites

Windows 10 or 11
PowerShell 5.1+ (built into Windows)
Python 3.11+ (the installer can install it for you)

Step 1: Run the installer

Open PowerShell, navigate to where you want to install, and run the one-liner:

irm https://raw.githubusercontent.com/dexusno/Translate_Subs/main/install.ps1 | iex

This creates a Translate_Subs folder in your current working directory. cd to wherever you want it installed first.

The installer will:

Check for Python 3.11+, git, ffmpeg — offer to install via winget if missing
Clone the repository
Install Python dependencies (requests, python-dotenv)
Create .env and llm_config.json from templates

Step 2: Get your LLM API key

DeepSeek is recommended (see Configuration below), but any OpenAI-compatible provider works.

Step 3: Run it

cd Translate_Subs
.\translate_subs.ps1 "C:\Movies\Some Movie"

Linux Install

Prerequisites

Debian 13, Ubuntu 22.04+, or similar
sudo access (for installing system packages)

Step 1: Run the installer

Open a terminal, navigate to where you want to install, and run the one-liner:

curl -fsSL https://raw.githubusercontent.com/dexusno/Translate_Subs/main/linux/install.sh | bash

This creates a Translate_Subs folder in your current working directory. cd to wherever you want it installed first.

The installer will:

apt-get install system dependencies (python3, python3-venv, git, ffmpeg, mkvtoolnix)
Clone the repository
Create a Python venv at .venv/ inside the project
Install all Python dependencies
Create .env and llm_config.json from templates
Mark all shell scripts as executable

Step 2: Get your LLM API key

DeepSeek is recommended (see Configuration below), but any OpenAI-compatible provider works.

Step 3: Run it

cd Translate_Subs
./linux/translate_subs.sh "/media/tv/Some Show"

The wrapper runs the venv Python directly — no manual activation needed.

Step 4 (optional): Set up the interactive folder picker

linux/pick.sh is a convenience wrapper around fzf — instead of typing full paths, you get a fuzzy-searchable list of your TV shows, movies, etc., pick one with arrow keys, then pick an action (translate, clean, mux, force-retranslate, dry-run variants). For TV series it also lets you pick a specific season or all seasons.

One-liner to install fzf and create the config file:

curl -fsSL https://raw.githubusercontent.com/dexusno/Translate_Subs/main/linux/install-pick.sh | bash

Open media_roots.conf in your editor of choice and add one media library path per line (e.g. /mnt/media/Tv, /mnt/media/Movies). Then run the picker:

./linux/pick.sh

media_roots.conf is gitignored, which means it's preserved when you update the project later via git pull or the included update.sh script — your paths won't be overwritten. You can also pass a path directly: ./linux/pick.sh /mnt/media/Tv.

Updating

On either platform, to pull the latest version while keeping your local config:

./linux/update.sh        # Linux
.\update.ps1             # Windows

This stashes any local changes, pulls the latest commits, then restores your modifications.

Configuration

All configuration lives in llm_config.json in the project directory. On first install, it's created automatically from llm_config.example.json.

Important

llm_config.json is gitignored — your settings survive updates. Never edit llm_config.example.json for your personal config.

Full file structure at a glance:

llm_config.json
├── default_profile         Which LLM provider to use (e.g. "deepseek")
├── remove_bitmap_subs      Remove PGS/DVD bitmap subtitle tracks (true/false)
├── target_language
│   ├── name                Language name for the LLM prompt
│   ├── codes               ISO codes that mean "already translated, skip"
│   ├── sidecar_code        Output filename code (Movie.XX.srt)
│   ├── mkv_tag             Language tag when embedding into MKV
│   └── keep_with           Other languages to keep alongside the target
├── source_languages        Ordered list of languages to translate FROM
└── profiles                LLM provider configs (API URL, model, key, tuning)

Each section is explained below.

Target Language

The target_language block defines what you're translating to and which ISO codes represent it:

"target_language": {
  "name": "Norwegian",
  "codes": ["no", "nor", "nob", "nb", "nno"],
  "sidecar_code": "no",
  "mkv_tag": "nor",
  "keep_with": ["en", "eng", "da", "dan", "sv", "swe"]
}

Field	What it does
`name`	The language name sent to the LLM in the translation prompt.
`codes`	All ISO codes that represent this language. If a file already has subtitles tagged with any of these codes, it's considered done and skipped.
`sidecar_code`	The code used in output filenames: `Movie.{code}.srt`
`mkv_tag`	The language tag applied when embedding translated subs into an MKV.
`keep_with`	Languages to keep alongside your target (see below).

Examples for other target languages

French (keep English alongside):

"target_language": {
  "name": "French",
  "codes": ["fr", "fra", "fre"],
  "sidecar_code": "fr",
  "mkv_tag": "fra",
  "keep_with": ["en", "eng"]
}

German (keep English alongside):

"target_language": {
  "name": "German",
  "codes": ["de", "deu", "ger"],
  "sidecar_code": "de",
  "mkv_tag": "deu",
  "keep_with": ["en", "eng"]
}

Brazilian Portuguese (keep English and Spanish):

"target_language": {
  "name": "Brazilian Portuguese",
  "codes": ["pt", "por"],
  "sidecar_code": "pt",
  "mkv_tag": "por",
  "keep_with": ["en", "eng", "es", "spa"]
}

Keeping Other Languages

keep_with controls which additional languages are allowed to remain in your MKV files. Your target language is always kept — you don't need to list it here.

This setting affects two things:

Embedded subtitle tracks — tracks tagged with a keep_with language stay in the MKV. Tracks in any other language are removed.
External subtitle files — if an external .srt or .ass file exists for a keep_with language and it isn't already embedded, it gets embedded automatically. After processing, all recognized external subtitle files are cleaned up.

After processing, each MKV will contain only your target language and the languages listed in keep_with. Everything else is stripped out.

Source Languages

The source_languages list controls which subtitle tracks are preferred as a translation source. Languages are tried in priority order — the first match wins:

"source_languages": [
  {"codes": ["en", "eng"], "name": "English"},
  {"codes": ["da", "dan"], "name": "Danish"},
  {"codes": ["sv", "swe"], "name": "Swedish"}
]

Add, remove, or reorder languages to match your library. If none of these are found, the script falls back to any available subtitle in any language.

LLM Profiles

Choose your translation backend with --profile. Each profile is defined in llm_config.json:

Profile	Provider	Model	Notes
`deepseek`	DeepSeek	deepseek-chat	Recommended — excellent quality, very low cost
`openai`	OpenAI	gpt-4o	High quality, significantly higher cost
`groq`	Groq	llama-3.3-70b	Free tier available
`mistral`	Mistral	mistral-large	Good for European languages
`openrouter`	OpenRouter	deepseek/deepseek-chat	Access to many models via one API
`local`	Ollama / LM Studio	(loaded model)	Free, runs locally

We recommend DeepSeek as the default. It produces natural, context-aware translations at a fraction of the cost of other cloud APIs. See DeepSeek pricing for current rates.

We have no affiliation with DeepSeek and receive no benefit from recommending them — it's simply what works best for this use case.

Adding a custom provider

Any OpenAI-compatible API works:

"my-provider": {
  "api_url": "https://api.example.com/v1/chat/completions",
  "model": "model-name",
  "api_key_env": "MY_PROVIDER_API_KEY"
}

Then add MY_PROVIDER_API_KEY=your-key to .env. For local models that don't need a key, use "api_key": "none" instead.

Per-Profile Tuning

Each profile can include performance settings. Cloud APIs handle larger batches efficiently, while local models benefit from smaller batches and longer timeouts:

"deepseek": {
  "batch_size": 200,
  "parallel": 8
},
"local": {
  "batch_size": 25,
  "parallel": 1,
  "timeout": 600
}

Setting	What it does	Cloud default	Local default
`batch_size`	Subtitle groups per API call	200	25
`parallel`	Files translated concurrently	8	1
`timeout`	Seconds before an API call times out	120	600

CLI flags (--batch-size, --parallel) override profile settings when specified.

Note

The cloud batch_size of 200 is tuned for DeepSeek's 8K output token limit. Raising it further may cause mid-batch truncation (the script detects and retries this, but it's slower). Lowering it is always safe.

Bitmap Subtitle Removal (PGS)

Bitmap-based subtitle tracks (PGS, DVD subs) can optionally be removed during cleaning. These formats are incompatible with many players and workflows, and text-based subtitles (SRT) are generally preferred.

"remove_bitmap_subs": true

When enabled, PGS and DVD subtitle tracks are removed regardless of language, and are ignored when checking for existing target language subs. If a file has target-language PGS subs but also has text-based subs in another language, the script will translate the text subs instead.

Default: false (PGS tracks are kept). Set to true if you want them removed.

Usage

Interactive folder picker (Linux)

Browse your media library and pick a folder without typing paths:

sudo apt install fzf          # one-time setup
./linux/pick.sh                # pick from your configured media roots
./linux/pick.sh /mnt/media/Tv  # or pick from a specific folder

Type a few letters to filter, arrow keys to select, then choose an action (translate, clean, mux, or dry-run variants). Media folder paths are stored in media_roots.conf (gitignored — survives git pull).

Windows

# Basic — translate everything in a folder
.\translate_subs.ps1 "C:\Movies\Inception (2010)"

# Entire TV series (all seasons)
.\translate_subs.ps1 "C:\TvSeries\Breaking Bad"

# Preview — see what would be translated
.\translate_subs.ps1 "C:\Media" -DryRun

# Different LLM provider
.\translate_subs.ps1 "C:\Movies" -Profile openai

# Limit files, retranslate existing, keep external files
.\translate_subs.ps1 "C:\Movies" -Limit 5 -Force -KeepSidecar

# Network share (UNC paths supported)
.\translate_subs.ps1 "\\nas\media\Movies"

Linux

# Basic — translate everything in a folder
./linux/translate_subs.sh "/media/movies/Inception (2010)"

# Entire TV series (all seasons)
./linux/translate_subs.sh "/media/tv/Breaking Bad"

# Preview — see what would be translated
./linux/translate_subs.sh "/media" --dry-run

# Different LLM provider
./linux/translate_subs.sh "/media/movies" --profile openai

# Limit files, retranslate existing, keep external files
./linux/translate_subs.sh "/media/movies" --limit 5 --force --keep-sidecar

# Network share (mounted via SMB/NFS)
./linux/translate_subs.sh "/mnt/nas/movies"

CLI Options

Windows flag	Linux flag	Description	Default
`folder`	`folder`	Path to scan for video files	Required
`-Profile`	`--profile`	LLM profile from llm_config.json	`deepseek`
`-BatchSize`	`--batch-size`	Subtitle groups per LLM API call	200
`-Parallel`	`--parallel`	Concurrent file processing	8
`-Limit`	`--limit`	Max number of files to process	unlimited
`-Force`	`--force`	Retranslate even if target exists	off
`-DryRun`	`--dry-run`	Preview without making changes	off
`-KeepSidecar`	`--keep-sidecar`	Keep external `.srt` after muxing	off
`-NoMux`	`--no-mux`	Leave external `.srt` alongside the file; don't touch the MKV	off
`-SkipClean`	`--skip-clean`	Don't strip unwanted tracks	off
`-SkipDetect`	`--skip-detect`	Don't detect untagged subtitle languages	off
`-LogFile`	`--log-file`	Also write log output to this file	none

Standalone tools

The main translate_subs script handles everything automatically — translating, embedding into MKV, and cleaning unwanted tracks in one pass. You don't need to run the tools below separately under normal use.

They're available as standalone scripts if you want to run just one step on its own:

Embed external subtitles into MKVs

Useful if you already have .srt files from another source and just want to embed them:

.\mux_subs.ps1 "C:\TvSeries\Show"                 # Windows

./linux/mux_subs.sh "/media/tv/Show"               # Linux

Clean unwanted subtitle tracks

Useful if you just want to strip unwanted languages without translating:

.\clean_subs.ps1 "C:\Movies" -DryRun               # Windows

./linux/clean_subs.sh "/media/movies" --dry-run    # Linux

Scripts

PowerShell	Linux	Purpose
`translate_subs.ps1` / `.py`	`linux/translate_subs.sh`	Translate, embed, and clean in one pass
`mux_subs.ps1` / `.py`	`linux/mux_subs.sh`	Embed external subtitle files into MKV containers
`clean_subs.ps1` / `.py`	`linux/clean_subs.sh`	Remove unwanted subtitle tracks from MKVs
`start-llama-server.ps1`	`linux/start-llama-server.sh`	Start llama.cpp server for local translation
`install.ps1`	`linux/install.sh`	Install dependencies and configure the project
`update.ps1`	`linux/update.sh`	Update to the latest version, preserving local changes
—	`linux/pick.sh`	Interactive folder picker (requires `fzf`)

Supported File Formats

The script works with any common video format — MKV is not required.

Format	Translation	Embedding	Track cleanup
MKV	Full support — translates from embedded or external subtitles	Translated subs + wanted external files are embedded directly into the MKV	Unwanted subtitle tracks are removed, external files cleaned up
MP4, AVI, MOV, WebM, OGM	Full support — translates from external `.srt`/`.ass` files	Not supported (these formats don't allow easy subtitle embedding without re-encoding)	Not applicable

For non-MKV files, the translated subtitles are saved as a .srt file next to the video (e.g. Movie.no.srt). Existing external subtitle files are left untouched. Most players — Plex, Jellyfin, VLC, Kodi — pick up external .srt files automatically.

The scripts work with any folder layout. Point them at any level and they scan recursively:

Movies/
  Inception (2010)/
    Inception (2010).mkv

TvSeries/
  Breaking Bad/
    Season 01/
      Breaking.Bad.S01E01.mkv

Media/              # point here to process everything
  Movies/
  TvSeries/
  Documentaries/

Under the Hood

Click to expand — explains the engineering decisions behind the pipeline

Streaming pipeline

Early versions did a two-phase approach: scan the entire folder tree, collect all jobs, then translate. This worked fine on local disks but was painfully slow over VPN/network paths — scanning a season folder could take minutes before any translation started.

The current implementation uses a producer-consumer pipeline. The scanner is a generator that yields jobs one at a time as they're found. Translation workers pull from the pipeline and start immediately. Scanning and translating overlap completely.

def _generate_jobs(folder) -> Generator[Job, None, None]:
    for media in folder.rglob("*"):
        job = analyze(media)
        if job:
            yield job   # translator starts working immediately

with ThreadPoolExecutor(max_workers=8) as pool:
    for job in _generate_jobs(folder):
        pool.submit(_translate_one, job)

The first translation typically starts within 1-2 seconds of launching the script, even on a 500-episode library.

Directory cache

Checking if a file already has a .no.srt sidecar means calling Path.exists() — which is a network round-trip over SMB. With multiple language code variants (.no.srt, .nor.srt, .nob.srt, .nb.srt) and the .sdh/.hi/.forced suffixes to consider, every video file could trigger 20+ network round-trips.

Instead, we do one rglob("*") pass at startup to build a set of every path in the tree. All subsequent existence checks become O(1) in operations with zero network traffic.

class DirCache:
    def __init__(self, root: Path):
        self._files = set()
        for p in root.rglob("*"):
            self._files.add(p)

    def exists(self, path: Path) -> bool:
        return path in self._files

On a 500-file library over VPN, this cut the scan phase from several minutes to under 2 seconds.

One-pass mux + clean

Older versions ran two separate ffmpeg operations per file:

Mux pass: embed the translated .srt into the MKV
Clean pass: remux again to remove unwanted subtitle tracks

Each remux reads and writes the entire MKV file. On a 3 GB file over a 100 Mbps connection, that's 4-5 minutes of pure I/O — doubled for the two passes.

The current implementation merges both operations into a single ffmpeg call using explicit stream mapping. The command:

Maps all video + audio streams (-map 0:v -map 0:a)
Maps only the wanted embedded subtitle tracks by index
Adds the new translated .srt as a second input (-i new.srt -map 1:0)
Sets language metadata on the new track
Also maps any external wanted-language .srt files (.en.srt, .da.srt, etc.) that aren't already embedded

One read, one write, done. Half the I/O of the old approach — which matters enormously over the network.

Self-healing translation retry

Cloud LLMs occasionally fail partway through a batch. Before our detection logic, this meant episodes where the translation would suddenly switch back to English halfway through. The failure was silent — nothing in the API response indicated a problem.

The script now detects this in two ways:

finish_reason check — logs a warning when DeepSeek returns finish_reason=length (output truncated due to token limit)
Consecutive unchanged lines — if 5+ lines in a row come back identical to the source, the LLM stopped translating

When detected, the script identifies exactly where translation stopped, keeps the good lines, and resends only the failed tail as a smaller request:

Batch 0-200: translation stopped at line 96, resending last 104 lines...
Batch 0-200: recovered 104/104 failed lines

This is a significant improvement over retrying the entire batch — the failed portion is smaller, more likely to succeed, and the already-translated lines don't waste API tokens.

Output token limit (max_tokens=8192)

DeepSeek's deepseek-chat model has a default max output of 4096 tokens, but a maximum of 8192. If you don't explicitly set max_tokens, you get the 4K default — which causes silent truncation around line 190 of a subtitle batch.

This took a long time to find because the failures looked like "the LLM just stopped translating" — there was no error, no warning, just partial output. We now explicitly set max_tokens=8192 in every API call.

The default batch_size of 200 is carefully tuned to stay within this: at roughly 20-25 tokens per subtitle line output (including the [N] marker), 200 lines produces ~4000-5000 tokens, leaving headroom for unusually long lines.

Sidecar discovery with flag suffixes

Subtitle files come in many naming conventions:

Episode.en.srt           # standard
Episode.en.sdh.srt       # SDH (Subtitles for Deaf and Hard-of-hearing)
Episode.en.hi.srt        # Hearing Impaired
Episode.en.forced.srt    # Forced narrative
Episode.en.cc.srt        # Closed Captions

Early versions only looked for the base pattern ({stem}.{code}.{ext}) and silently ignored the flag variants — meaning thousands of SDH files were invisible to the script and didn't get translated or cleaned up.

The current find_sidecar() iterates through all known flag suffixes:

_SIDECAR_FLAG_SUFFIXES = ("", ".sdh", ".hi", ".cc", ".forced")

for ext in (".srt", ".ass"):
    for code in sorted(lang_codes):
        for flag in _SIDECAR_FLAG_SUFFIXES:
            candidate = parent / f"{stem}.{code}{flag}{ext}"
            if cache.exists(candidate):
                return candidate

The language code extractor strips the flag to get the actual language: Episode.en.sdh.srt → en.

Fallback translation from any language

The source_languages list in config defines preferred translation sources (usually English, Danish, Swedish, etc.). But what about episodes that only have Romanian or Polish subtitles?

If nothing in the priority list is found, the script falls back to any available subtitle in any language. The LLM can translate from virtually any language — it doesn't need to be in the source list. Fallback usage is logged as [FALLBACK] for easy review.

This works for:

Embedded tracks with any language tag
Untagged embedded tracks (language identified via LLM sample)
External subtitle files with unknown language codes
External files without any language code at all (Movie.srt)

Why multi-line subtitle reflow

Some source subtitles have 3 or more lines per cue. Most subtitle standards (and most video players) expect a maximum of 2 lines — 3+ lines can overlap with on-screen text or be uncomfortable to read.

After translation, the script checks each subtitle cue. If it has 3+ text lines, it merges them into a single string and splits back into 2 lines at natural word boundaries using a proportional algorithm:

def _split_to_n_lines_preserving_words(text, n):
    # Find cut points at word boundaries
    # Prefer balanced line lengths
    # Never break in the middle of a word

All text is preserved — only the line break positions change. No text is added or removed.

Thread-safe parallel translation

With 8 concurrent translation workers, race conditions were a real risk. The script uses:

Atomic file writes — translated output is written to a temp file, then renamed (rename is atomic on both Linux and Windows NTFS)
Lock for stats — a threading.Lock guards the shared stats dict during parallel updates
Per-file isolation — each worker operates on one file end-to-end (translate → mux → clean) before moving to the next. No shared state between files.

The deferred cleaning phase runs sequentially after all translations complete, to avoid contention on files that don't need translation but might need cleaning.

Why the deterministic system prompt

We tested multiple prompt variants with different temperatures:

v1 prompt + temperature 0.3 (current) — deterministic, firm, numbered rules. Produces excellent context-aware translations.
v2 prompt + temperature 1.3 (DeepSeek's official recommendation for translation) — more varied phrasings. Testing showed the higher temperature produced more colloquial output but lost context awareness. For example, it would translate "See ya, fellas!" as "Ser dere, folkens" (inclusive) when the context made "gutter" (specifically men) correct.

The v1 prompt produces slightly less flashy output but is more consistently correct. For subtitles where context matters more than variety, determinism wins.

Notably, the partial-failure rate is the same with both prompts — it's driven by DeepSeek server-side flakiness, not prompt quality. The retry mechanism is what actually solves failures.

Troubleshooting

"ffmpeg not found" Install ffmpeg and ensure both ffmpeg and ffprobe are on your PATH.

Windows: winget install ffmpeg then restart your terminal
Linux: sudo apt-get install ffmpeg

"Python packages not found" (Linux) The Linux scripts use a virtual environment at .venv/. Re-run ./linux/install.sh or manually install:

.venv/bin/pip install requests python-dotenv

API timeout If translations time out on large files, reduce --batch-size (default 200) or increase the timeout in your profile config.

Partially translated episodes If subtitles switch from your target language back to the source mid-episode, the batch size may be too large for your LLM provider's output token limit. DeepSeek deepseek-chat has an 8K output token limit — the default batch_size of 200 is tuned to stay within this. If you see finish_reason=length warnings, lower the batch size further. The script automatically detects and retries failed portions, but smaller batches prevent the issue entirely.

Safe to re-run Already translated files are skipped. Partially translated files (interrupted mid-write) are retranslated. You can stop and resume at any time.

Disclaimer

This software is provided as-is, without warranty of any kind. By using Translate_Subs, you acknowledge the following:

File modification — this tool modifies media files in place (remuxing MKV containers, deleting external subtitle files). While it uses atomic file operations and creates backups during remuxing, data loss is always possible. Back up your media library before running on important files.
Translation quality — translations are generated by third-party LLM APIs or local models. Output quality depends on the model, the source material, and the language pair. Always spot-check translations before relying on them.
API costs — cloud LLM providers charge per token. While costs are low (~$0.01 per episode), processing a very large library will accumulate charges. Use --dry-run to preview what will be processed before committing.
Third-party services — this tool sends subtitle text (not video) to external APIs (DeepSeek, OpenAI, etc.) for translation. Do not use it on content you are not authorized to share with these services.
Legal responsibility — you are solely responsible for ensuring your use of this tool complies with applicable laws, including copyright and content licensing. The authors of this project are not responsible for how it is used.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 73 Commits
linux		linux
.env.example		.env.example
.gitattributes		.gitattributes
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
README.md		README.md
clean_subs.ps1		clean_subs.ps1
clean_subs.py		clean_subs.py
install.ps1		install.ps1
llm_config.example.json		llm_config.example.json
media_roots.conf.example		media_roots.conf.example
mux_subs.ps1		mux_subs.ps1
mux_subs.py		mux_subs.py
requirements.txt		requirements.txt
start-llama-server.ps1		start-llama-server.ps1
sync-folder.ps1		sync-folder.ps1
test_deepseek.py		test_deepseek.py
translate_subs.ps1		translate_subs.ps1
translate_subs.py		translate_subs.py
update.ps1		update.ps1

Folders and files

Latest commit

History

Repository files navigation

Translate_Subs

Table of Contents

Features

How It Works

Cost

Installation

Requirements (both platforms)

Windows Install

Prerequisites

Step 1: Run the installer

Step 2: Get your LLM API key

Step 3: Run it

Linux Install

Prerequisites

Step 1: Run the installer

Step 2: Get your LLM API key

Step 3: Run it

Step 4 (optional): Set up the interactive folder picker

Updating

Configuration

Target Language

Keeping Other Languages

Source Languages

LLM Profiles

Per-Profile Tuning

Bitmap Subtitle Removal (PGS)

Usage

Interactive folder picker (Linux)

Windows

Linux

CLI Options

Standalone tools

Scripts

Supported File Formats

Under the Hood

Streaming pipeline

Directory cache

One-pass mux + clean

Self-healing translation retry

Output token limit (max_tokens=8192)

Sidecar discovery with flag suffixes

Fallback translation from any language

Why multi-line subtitle reflow

Thread-safe parallel translation

Why the deterministic system prompt

Troubleshooting

Disclaimer

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 6

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages