Auto-generated — do not edit by hand. Run scripts/generate_step_docs.py to update.
Base class: TextStep
Apply multi-word phrase replacements (e.g. 'good bye' -> 'goodbye').
Reads operators.config.sentence_replacements. Applies longest match first so that more specific phrases take priority over shorter overlapping ones. No effect when the dict is empty.
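A minimal sketch of the longest-match-first rule. The function name apply_sentence_replacements and the plain dict argument are assumptions made for illustration; the actual step reads operators.config.sentence_replacements and may be implemented differently.

```python
import re


def apply_sentence_replacements(text: str, replacements: dict[str, str]) -> str:
    """Hypothetical helper: replace multi-word phrases, longest phrase first."""
    if not replacements:
        return text  # empty dict: no effect
    # Sorting by length (descending) places specific phrases ahead of shorter
    # overlapping ones in the alternation, so they are tried first.
    phrases = sorted(replacements, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, phrases)) + r")\b")
    return pattern.sub(lambda m: replacements[m.group(0)], text)
```

With both 'good' and 'good bye' configured, the input 'good bye now' is matched by the longer phrase, while 'good day' falls back to the shorter one.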
Base class: TextStep
Lowercase all text using str.casefold().
Base class: TextStep
Convert >, <, = to language-specific words in numeric contexts using language config from operators.
Base class: TextStep
Convert remaining decimal periods to the language decimal word, defined in language config from operators.
'10.5' -> '10 point 5' (English). Avoids patterns already converted to 'dot' (IPs, versions).
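A minimal sketch of the conversion, assuming the decimal word is passed in directly (the real step takes it from the language config). The function name is hypothetical.

```python
import re


def convert_decimal_periods(text: str, decimal_word: str = "point") -> str:
    """Hypothetical helper: a period with a digit on both sides becomes the decimal word."""
    # Lookbehind/lookahead keep the digits out of the match, so only the
    # period itself is replaced.
    return re.sub(r"(?<=\d)\.(?=\d)", lambda m: f" {decimal_word} ", text)
```

Text in which the period was already rewritten to 'dot' contains no digit-period-digit pattern any more, so it passes through unchanged.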
Base class: TextStep
Convert °C and °F to language-specific words using language config from operators.
Base class: TextStep
Convert sequences of 3+ digit words to actual digits.
'two one three four' -> '2134', 'seven zero' stays (only 2 words). Delegates the word-to-digit mapping to operators.config.digit_words.
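A minimal sketch of the 3+ rule. The English mapping below is an assumption standing in for operators.config.digit_words, and the function name is hypothetical.

```python
import re

# Assumed English mapping; the real one comes from operators.config.digit_words.
DIGIT_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def convert_digit_word_sequences(text: str, digit_words: dict[str, str] = DIGIT_WORDS) -> str:
    """Hypothetical helper: join runs of three or more digit words into digits."""
    word = "(?:" + "|".join(map(re.escape, digit_words)) + ")"
    # One digit word followed by at least two more, i.e. 3+ in a row.
    pattern = re.compile(r"\b" + word + r"(?:\s+" + word + r"){2,}\b")
    return pattern.sub(
        lambda m: "".join(digit_words[w] for w in m.group(0).split()), text
    )
```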
Base class: TextStep
Convert dots in domains, IPs, versions, and file extensions to the language-specific dot word, defined in the language config from operators.
Base class: TextStep
Convert 'ten o'clock' -> '10:00'.
Reads operators.config.oclock_word and operators.config.time_words. Only processes time_words entries with numeric values 1-12. Values above 12 (minute expressions like "twenty", "thirty") are skipped because o'clock only applies to full hours. No-op when either field is None.
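A minimal sketch of the hour filter. The TIME_WORDS values and the function name are assumptions for illustration; the real values come from operators.config.

```python
import re

# Assumed English config values, for illustration only.
TIME_WORDS = {"one": 1, "two": 2, "ten": 10, "twelve": 12, "twenty": 20, "thirty": 30}


def convert_oclock(text, time_words=TIME_WORDS, oclock_word="o'clock"):
    """Hypothetical helper: '<hour word> o'clock' -> 'H:00'."""
    if time_words is None or oclock_word is None:
        return text  # no-op when either field is missing
    # Keep only full hours; minute words such as 'twenty' are dropped.
    hours = {w: n for w, n in time_words.items() if 1 <= n <= 12}
    if not hours:
        return text
    alternation = "|".join(re.escape(w) for w in sorted(hours, key=len, reverse=True))
    pattern = re.compile(r"\b(" + alternation + r")\s+" + re.escape(oclock_word) + r"\b")
    return pattern.sub(lambda m: f"{hours[m.group(1)]}:00", text)
```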
Base class: TextStep
Convert simple Roman numerals (II-IX) to Arabic digits in full text.
Runs before expand_alphanumeric_codes to prevent 'VIII' -> 'V I I I'. Only converts ii-ix to avoid false positives with single letters like 'I'. Skips 'v' when adjacent to digits (version-like contexts: v2, v 12).
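A minimal sketch of the ii-ix restriction and the 'v' guard. The regex and the function name are assumptions, not the step's actual pattern; in particular, this sketch would also convert ordinary words that happen to be spelled 'vi' or 'iv'.

```python
import re

ROMAN_TO_ARABIC = {
    "ii": "2", "iii": "3", "iv": "4", "v": "5",
    "vi": "6", "vii": "7", "viii": "8", "ix": "9",
}

# Whole-word match only. A lone 'v' is skipped when a digit precedes it
# (with one whitespace character in between) or follows it.
_ROMAN = re.compile(
    r"\b(?:viii|vii|vi|iv|ix|iii|ii|(?<!\d\s)v(?!\s*\d))\b",
    re.IGNORECASE,
)


def convert_roman_numerals(text: str) -> str:
    """Hypothetical helper: convert standalone II-IX to Arabic digits."""
    return _ROMAN.sub(lambda m: ROMAN_TO_ARABIC[m.group(0).lower()], text)
```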
Base class: TextStep
Convert word-based time patterns (two p.m -> 2 pm, two thirty p.m -> 2:30 pm).
Reads operators.config.time_words, operators.config.am_word, operators.config.pm_word, operators.config.oclock_word, and operators.get_compound_minutes(). No-op when required config is None.
Base class: TextStep
Space out uppercase words and alphanumeric codes.
'ABC123' -> 'A B C 1 2 3', 'CNN' -> 'C N N'. Skips pure numbers, ordinals (1st, 2nd), and protection markers. Must run before casefold_text.
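A minimal token-based sketch of the spacing rule. The function name and the token classification are assumptions; protection markers are not handled here, and whitespace is normalized to single spaces as a side effect of splitting.

```python
import re

_ORDINAL = re.compile(r"\d+(?:st|nd|rd|th)", re.IGNORECASE)


def expand_alphanumeric_codes(text: str) -> str:
    """Hypothetical helper: space out uppercase words and letter/digit codes."""
    out = []
    for token in text.split():
        is_upper_word = len(token) > 1 and token.isalpha() and token.isupper()
        is_code = (
            token.isalnum()
            and any(c.isdigit() for c in token)
            and any(c.isalpha() for c in token)
        )
        if _ORDINAL.fullmatch(token) or not (is_upper_word or is_code):
            out.append(token)  # pure numbers, ordinals, ordinary words
        else:
            out.append(" ".join(token))  # 'CNN' -> 'C N N'
    return " ".join(out)
```

The uppercase check is why the ordering constraint matters: once the text has been casefolded, 'CNN' can no longer be told apart from an ordinary word.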
Base class: TextStep
Expand contractions (it's -> it is, can't -> cannot).
Delegates to operators.expand_contractions().
Base class: TextStep
Convert written numbers to digits (fifty -> 50, twenty three -> 23). Delegates to operators.expand_written_numbers().
Base class: TextStep
Expand 'www' to 'W W W'.
Base class: TextStep
Collapse 'a m' / 'p m' into 'am' / 'pm' after time digits.
Reads operators.config.am_word and operators.config.pm_word. No-op when either is None.
Base class: TextStep
Convert number words back to digits when adjacent to 'dot' (IPs/versions).
'zero dot one dot two' -> '0 dot 1 dot 2'. Single-character entries (e.g. 'o') are excluded to avoid false positives in non-numeric contexts.
Base class: TextStep
Convert the word for 'one' to its digit when adjacent to other digits.
Example (English): '10 one one' -> '10 1 1', 'one 5' -> '1 5'
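A minimal sketch of the adjacency rule. The function name and the repeat-until-stable loop are assumptions; the loop is one way to let a freshly converted '1' act as the neighbouring digit for the next word, as in '10 one one'.

```python
import re


def convert_one_near_digits(text: str, one_word: str = "one") -> str:
    """Hypothetical helper: turn the word for 'one' into '1' next to digits."""
    w = re.escape(one_word)
    # Either a digit and a space come before the word, or a space and a digit follow it.
    pattern = re.compile(rf"(?<=\d )\b{w}\b|\b{w}\b(?= \d)")
    previous = None
    while previous != text:  # repeat until no further change
        previous = text
        text = pattern.sub("1", text)
    return text
```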
Base class: TextStep
Collapse space between 'v' and digit (v 2 -> v2). 'v' must be followed by a digit.
Base class: TextStep
Format '5 45 p m' -> '5:45 pm' and '545 pm' -> '5:45 pm'.
Reads operators.config.am_word and operators.config.pm_word. No-op when either is None.
Base class: TextStep
Normalize numeric time formats (05:45pm -> 5:45 pm, 5.45 p.m. -> 5:45 pm).
Delegates to operators.normalize_numeric_time_formats().
Base class: TextStep
Replace commas, dots, hyphens between number words with a single space.
Handles: 'seven, zero' -> 'seven zero', 'two-one-three' -> 'two one three'. Reads operators.config.number_words. No-op when None.
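A minimal sketch of the separator cleanup, assuming number_words is an iterable of words. The function name is hypothetical.

```python
import re


def normalize_number_word_separators(text, number_words):
    """Hypothetical helper: 'seven, zero' -> 'seven zero'."""
    if not number_words:
        return text  # no-op without configured number words
    word = "(?:" + "|".join(
        re.escape(w) for w in sorted(number_words, key=len, reverse=True)
    ) + ")"
    # A number word, then a comma, dot, or hyphen, but only when another
    # number word follows (checked by lookahead, so chains are handled).
    pattern = re.compile(r"\b(" + word + r")\s*[,.\-]\s*(?=" + word + r"\b)")
    return pattern.sub(lambda m: m.group(1) + " ", text)
```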
Base class: ProtectStep
Protect the decimal separator from being removed by RemoveSymbolsStep.
Base class: TextStep
Replace @ and . inside email addresses with placeholders.
Uses TextStep directly: a single email match requires protecting two different symbols (@ → EMAIL_AT, . → EMAIL_DOT) in one pass, whereas ProtectStep handles exactly one placeholder per substitution.
Base class: TextStep
Mark single-letter-hyphen sequences to prevent false conversions.
Uses TextStep directly: the replacement is a per-match function that splits on '-' and suffixes each individual letter. Example: "b-o-b" → "bxltrx oxltrx bxltrx".
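A minimal sketch of the per-match replacement function. The regex and the function name are assumptions; only the xltrx suffix and the "b-o-b" example come from the description above.

```python
import re

LETTER_SUFFIX = "xltrx"  # suffix shown in the example above

# Single letters joined by hyphens, e.g. 'b-o-b'; words like 'x-ray' do not match.
_SPELLED = re.compile(r"\b[A-Za-z](?:-[A-Za-z])+\b")


def protect_hyphenated_letters(text: str) -> str:
    """Hypothetical helper: mark each spelled-out letter with the suffix."""
    def mark(match: re.Match) -> str:
        letters = match.group(0).split("-")
        return " ".join(letter + LETTER_SUFFIX for letter in letters)

    return _SPELLED.sub(mark, text)
```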
Base class: TextStep
Replace comma/dot+space between digits with ¤ marker. (1, 2, 3 -> 1 ¤ 2 ¤ 3)
Uses TextStep directly: two independent patterns (comma-separated and dot-separated) both collapse to the same marker, whereas ProtectStep handles a single pattern mapping to a single placeholder.
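A minimal sketch of two patterns feeding one marker. The regexes and the function name are assumptions for illustration.

```python
import re

MARKER = "¤"
# The separator must be followed by whitespace, so '1,234' and '3.14' are left alone.
_COMMA_SEP = re.compile(r"(?<=\d),\s+(?=\d)")
_DOT_SEP = re.compile(r"(?<=\d)\.\s+(?=\d)")


def protect_number_separators(text: str) -> str:
    """Hypothetical helper: '1, 2, 3' -> '1 ¤ 2 ¤ 3'."""
    text = _COMMA_SEP.sub(f" {MARKER} ", text)
    return _DOT_SEP.sub(f" {MARKER} ", text)
```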
Base class: TextStep
Replace "+" when it appears right before a digit (like in phone numbers) with a special placeholder.
The regex pattern "\+(?=\d)" means:
- Match a literal "+" character (escaped as "\+", since a bare "+" is a regex quantifier)
- But only if it is immediately followed by a number (0-9)
- The digit is NOT included in the match (it stays untouched)
For example: "+123456" → the "+" is replaced, but "123456" stays the same.
Note: The pattern uses a lookahead (?=\d), which checks what comes next without consuming it as part of the match.
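A minimal sketch of the lookahead in use. The placeholder name XPLUSX is an assumption borrowed from the related plus-word and restore steps, and the function name is hypothetical.

```python
import re

PLUS_PLACEHOLDER = "XPLUSX"  # assumed placeholder name


def protect_plus_before_digit(text: str) -> str:
    """Hypothetical helper: replace '+' only when a digit follows."""
    # \+ matches a literal plus; (?=\d) requires a following digit
    # without making it part of the match.
    return re.sub(r"\+(?=\d)", PLUS_PLACEHOLDER, text)
```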
Base class: TextStep
Convert the plus word to XPLUSX before digit words (phone number context).
Reads operators.config.plus_word and operators.config.digit_words. No-op when plus_word is None or digit_words is empty/None.
Base class: TextStep
Mark sequences of 3+ single letters to prevent false conversions.
Uses TextStep directly: same reason as ProtectHyphenatedLetterSpellingStep — the replacement is a per-match function that suffixes each individual letter.
Base class: ProtectStep
Protect the colon used in time expressions like HH:MM. Matches times written with one or two digits for the hour and exactly two digits for the minutes (e.g., "9:30", "12:05"). The colon between them is temporarily replaced with a placeholder so it is not modified or removed by later text-processing steps.
Example: "9:30" → "9§30" (colon replaced with placeholder)
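A minimal sketch of the protect/restore round trip for the time colon. The regex and function names are assumptions; the § placeholder comes from the example above.

```python
import re

TIME_PLACEHOLDER = "§"
# One or two hour digits, a colon, exactly two minute digits.
_TIME = re.compile(r"\b(\d{1,2}):(\d{2})\b")


def protect_time_colon(text: str) -> str:
    """Hypothetical helper: '9:30' -> '9§30'."""
    return _TIME.sub(r"\1" + TIME_PLACEHOLDER + r"\2", text)


def restore_time_colon(text: str) -> str:
    """Hypothetical helper: put the colon back after later steps have run."""
    return text.replace(TIME_PLACEHOLDER, ":")
```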
Base class: ProtectStep
Replace . in decimal unit expressions with ‡ placeholder (e.g. 9.8 m/s → 9‡8 m/s).
Base class: ProtectStep
Replace / in unit expressions with † placeholder (e.g. km/h → km†h).
Base class: TextStep
Remove periods from acronyms (U.S.A. -> USA, U.S. -> US).
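A minimal sketch of acronym detection, assuming an acronym is two or more single letters each followed by a period. The regex and function name are hypothetical; note that this sketch also rewrites abbreviations such as 'e.g.'.

```python
import re

# Two or more 'letter + period' pairs starting at a word boundary.
_ACRONYM = re.compile(r"\b(?:[A-Za-z]\.){2,}")


def remove_acronym_periods(text: str) -> str:
    """Hypothetical helper: 'U.S.A.' -> 'USA'."""
    return _ACRONYM.sub(lambda m: m.group(0).replace(".", ""), text)
```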
Base class: TextStep
Normalize text by removing diacritics and converting special accented letters to their ASCII equivalents. (é -> e, ê -> e, etc.)
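A minimal sketch of diacritic removal using Unicode decomposition. The function name is an assumption. Letters that do not decompose into a base letter plus a combining mark (for example ø or ß) are left unchanged here, so the actual step presumably needs an extra mapping for those.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Hypothetical helper: 'é' -> 'e', 'ê' -> 'e'."""
    # NFD splits 'é' into 'e' plus a combining acute accent.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks and keep the base characters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```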
Base class: TextStep
Remove filler words defined in the language config (um, uh, euh, etc.).
Base class: TextStep
Remove # symbol before numbers (#1 -> 1).
Base class: TextStep
Strip xltrx suffix markers from letters.
Uses TextStep directly: the suffix is a token-embedded string (not a standalone token), so removal requires a regex word-boundary match. Example: "bxltrx oxltrx bxltrx" → "b o b"
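A minimal sketch of the word-boundary removal. The regex and function name are assumptions.

```python
import re


def strip_letter_markers(text: str) -> str:
    """Hypothetical helper: 'bxltrx oxltrx bxltrx' -> 'b o b'."""
    # The suffix is removed only when it follows a letter and ends the token.
    return re.sub(r"(?<=[A-Za-z])xltrx\b", "", text)
```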
Base class: TextStep
Remove dots that are not between digits (.X -> ' X', trailing .).
Base class: TextStep
Strip ¤ markers from the text.
Uses TextStep directly: the marker is deleted entirely (not restored to a character or word). RestoreStep replaces with a non-empty string; here the surrounding whitespace must also be collapsed to nothing.
Base class: TextStep
Collapse spaces between adjacent digits (1 2 3 -> 123).
Preserves spaces around 'point' (decimal word) and before ordinals. Handles ¤ markers by processing segments separately.
Base class: TextStep
Remove currency symbols that are not adjacent to numbers.
Base class: TextStep
Replace markers, symbols, and punctuation with spaces.
Preserves letters, digits, and all placeholder characters.
Base class: TextStep
Remove thousand separators based on the language config.
English uses a comma (1,234 -> 1234); many other European languages use a period (1.234 -> 1234).
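A minimal sketch with the separator passed as a parameter. The function name and the "exactly three digits follow" rule are assumptions; the real step takes the separator from the language config.

```python
import re


def remove_thousand_separators(text: str, separator: str = ",") -> str:
    """Hypothetical helper: '1,234' -> '1234'."""
    # The separator must sit between a digit and a group of exactly three digits.
    pattern = re.compile(
        r"(?<=\d)" + re.escape(separator) + r"(?=\d{3}(?!\d))"
    )
    return pattern.sub("", text)
```

With a period as separator, a value such as '1.234' is read as one thousand two hundred thirty-four, which is why the separator has to come from the language config rather than being guessed from the text.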
Base class: TextStep
Remove space before apostrophe (' s -> 's).
Base class: TextStep
Remove trailing 'dot' after email-like words at end of text.
Base class: TextStep
Remove trailing period from text.
Base class: TextStep
Remove :00 from time expressions (10:00 pm -> 10 pm).
Reads operators.config.am_word and operators.config.pm_word. No-op when either is None.
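A minimal sketch of the :00 removal, assuming English am/pm words as defaults. The regex and function name are hypothetical.

```python
import re


def remove_zero_minutes(text, am_word="am", pm_word="pm"):
    """Hypothetical helper: '10:00 pm' -> '10 pm'."""
    if am_word is None or pm_word is None:
        return text  # no-op when either word is missing
    suffix = "(?:" + re.escape(am_word) + "|" + re.escape(pm_word) + ")"
    # ':00' is dropped only when an am/pm word follows the time.
    pattern = re.compile(r"\b(\d{1,2}):00(?=\s*" + suffix + r"\b)")
    return pattern.sub(r"\1", text)
```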
Base class: TextStep
Replace currency symbols with their corresponding words.
Base class: RestoreStep
Restore XDECIMALX placeholder with the language-specific decimal word.
Base class: TextStep
Restore XATX placeholder with the language-specific 'at' word.
When no word is configured for '@', restores the original '@' character so that placeholders never leak into the final output.
Base class: TextStep
Restore XDOTX placeholder with the language-specific 'dot' word.
When no word is configured for '.', restores the original '.' character so that placeholders never leak into the final output.
Base class: TextStep
Restore XPLUSX placeholder back to + and collapse trailing space.
Uses TextStep directly: beyond the plain replacement, the step must also collapse "+ " → "+" (e.g. after casefold splits the token), whereas RestoreStep only does plain string replacement.
Base class: RestoreStep
Restore § placeholder back to colon.
Base class: RestoreStep
Restore ‡ placeholder with the language-specific decimal word.
Base class: RestoreStep
Restore † placeholder back to /.
Base class: WordStep
Apply single-word replacements from the language operators.
Skips email tokens. Uses a cached Replacer keyed on the language code.