
Add inline repetition to compound specs #5

Open
kendonB wants to merge 1 commit into master from feature/compoundrule-inline-repetition


@kendonB (Owner) commented Mar 25, 2026

Summary

  • add inline repetition support for compound specs using open-ended brace ranges
  • propagate repeated extras through Compound, CompoundRule, MappingRule, and backend compilers
  • add parser, compiler, rule, and doctest coverage for the new repetition behavior

Testing

  • .venv/bin/python -m pytest dragonfly/test/test_lark_parser.py dragonfly/test/test_compound_rule_repetition.py dragonfly/test/test_compiler_natlink.py dragonfly/test/test_compiler_kaldi.py dragonfly/test/test_compiler_sapi5.py dragonfly/test/test_compiler_sphinx.py
  • note: the legacy natlink binary snapshot assertion in test_compiler_natlink.py still mismatches locally

Note

Medium Risk
Touches core parsing and repetition decoding plus multiple engine compilers, so regressions could affect recognition matching/greediness or engine-specific grammar generation; changes are well-covered by new targeted tests.

Overview
Adds inline repetition syntax to compound specs via brace ranges (e.g. "<item>{1,}", "{0,}", "{2,4}"), with parser updates to distinguish repetition quantifiers from existing {weight=...}-style specifiers and to avoid mutating shared extras when applying specials.
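To make the quantifier-versus-specifier distinction concrete, here is a minimal sketch that classifies the text between a pair of braces. The function name `classify_brace` and the regular expression are hypothetical and are not taken from this PR's parser; only the sample inputs (`1,`, `0,`, `2,4`, `weight=...`) come from the description above. Whether other forms such as an exact count are accepted is not stated, so the sketch rejects them.

```python
import re

# Hypothetical pattern: "min," (open-ended) or "min,max", non-negative integers.
_QUANTIFIER = re.compile(r"^\s*(\d+)\s*,\s*(\d*)\s*$")


def classify_brace(body):
    """Classify the text between '{' and '}' in a compound spec.

    Returns ("repeat", min, max), where max is None for an open-ended
    range, or ("special", key, value) for a key=value specifier.
    """
    match = _QUANTIFIER.match(body)
    if match:
        minimum = int(match.group(1))
        maximum = int(match.group(2)) if match.group(2) else None
        if maximum is not None and maximum < minimum:
            raise ValueError("max %d is below min %d" % (maximum, minimum))
        return ("repeat", minimum, maximum)
    if "=" in body:
        # Anything with '=' is treated as an existing-style specifier.
        key, value = body.split("=", 1)
        return ("special", key.strip(), value.strip())
    raise ValueError("unrecognized brace body: %r" % body)
```

Trying the quantifier pattern first keeps `key=value` specifiers on their existing path, since a specifier can never match the digits-and-comma pattern.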

Extends Repetition with an unbounded mode (validated against empty-matchable children), updates its runtime decode()/get_repetitions() behavior, and propagates repeated extras as lists through Compound value functions and through CompoundRule/MappingRule via new shared extract_compound_extras() logic (including default/omission handling and per-branch shape inference).

Updates Kaldi/Natlink/SAPI5/Sphinx compilers to emit correct grammar structures for unbounded repetition (preserving min and optional skip behavior) and adds substantial unit/doctest coverage for the new parsing, repetition, and compiler semantics.
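One generic way to express "at least `min` repetitions" in a grammar notation is to emit `min` mandatory copies followed by a Kleene star. The sketch below illustrates that idea only; `expand_unbounded` is a hypothetical helper, the output is a plain BNF-like string, and none of it reflects the actual structures the Kaldi, Natlink, SAPI5, or Sphinx compilers in this PR produce.

```python
def expand_unbounded(symbol, minimum):
    """Render `symbol{minimum,}` as mandatory copies plus a Kleene star.

    Hypothetical illustration: the mandatory prefix preserves `minimum`,
    and the trailing starred symbol allows any number of extra matches.
    """
    if minimum < 0:
        raise ValueError("minimum must be non-negative")
    mandatory = [symbol] * minimum
    return " ".join(mandatory + [symbol + "*"])


print(expand_unbounded("item", 0))  # item*
print(expand_unbounded("item", 2))  # item item item*
```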

Written by Cursor Bugbot for commit 944e0b6. This will update automatically on new commits.


@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 944e0b6afd


Comment on lines +596 to +598
if isinstance(element, Sequence):
    return all(element_matches_empty(child, memo)
               for child in element.children)

P1: Handle unbounded Repetition in element_matches_empty

element_matches_empty() currently falls through to the generic Sequence branch for Repetition, which makes Repetition(..., min=0, unbounded=True) look non-empty whenever its child is non-empty. That lets nested grammars like Repetition(Repetition(Literal("a"), min=0, unbounded=True), min=0, unbounded=True) bypass the constructor guard, even though the inner repetition can match zero words; at runtime this can create non-terminating decode loops because the outer unbounded repetition can keep accepting zero-length child matches.
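A minimal sketch of the suggested fix follows, assuming `Repetition` subclasses `Sequence` and exposes `min` and `unbounded` attributes as the comment implies. The element classes here are simplified stand-ins written for this example, not dragonfly's real classes, and the `memo` handling is an assumption about how the PR's helper guards against cycles.

```python
class Literal:
    def __init__(self, text):
        self.text = text


class Optional:
    def __init__(self, child):
        self.children = (child,)


class Sequence:
    def __init__(self, *children):
        self.children = children


class Repetition(Sequence):
    # Stand-in only: the real class takes more parameters.
    def __init__(self, child, min=1, unbounded=False):
        Sequence.__init__(self, child)
        self.min = min
        self.unbounded = unbounded


def element_matches_empty(element, memo=None):
    """Return True if `element` can match zero words."""
    if memo is None:
        memo = {}
    key = id(element)
    if key in memo:
        return memo[key]
    memo[key] = False  # provisional value guards against cyclic graphs
    if isinstance(element, Optional):
        result = True
    elif isinstance(element, Repetition):
        # Checked before Sequence because Repetition subclasses it:
        # zero iterations are allowed when min == 0.
        result = element.min == 0 or element_matches_empty(
            element.children[0], memo)
    elif isinstance(element, Sequence):
        result = all(element_matches_empty(child, memo)
                     for child in element.children)
    elif isinstance(element, Literal):
        result = not element.text.split()
    else:
        result = False
    memo[key] = result
    return result
```

With the `Repetition` branch in place, the nested example from the comment is reported as empty-matchable, so a constructor guard built on this helper would reject it as the child of another unbounded repetition.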


@cursor (bot) left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


if isinstance(element, Sequence):
    return all(element_matches_empty(child, memo)
               for child in element.children)
return False

Empty-check misses zero-min Repetition

High Severity

element_matches_empty treats Repetition as a plain Sequence, so unbounded repetitions with min=0 are considered non-empty. This bypasses the constructor guard and allows unbounded repetition children that can accept empty, causing non-consuming repetition loops.
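To show why a child that accepts empty input is dangerous under unbounded repetition, here is a small greedy matcher with an explicit progress check. It is a hypothetical illustration and not dragonfly's `decode()` implementation; `match_repeated` and the `match_child` callback convention are invented for this sketch. Without the `new_pos == pos` check, a zero-length child match would keep the loop running forever.

```python
def match_repeated(match_child, words, start, minimum=0):
    """Greedily apply `match_child` from `start`.

    `match_child(words, pos)` returns the position after a match, or
    None when the child does not match at `pos`. Returns the list of
    end positions, one per iteration, or None if fewer than `minimum`
    iterations consumed input.
    """
    ends = []
    pos = start
    while True:
        new_pos = match_child(words, pos)
        if new_pos is None:
            break
        if new_pos == pos:
            # Zero-length match: repeating it would never terminate.
            break
        ends.append(new_pos)
        pos = new_pos
    if len(ends) < minimum:
        return None
    return ends


def match_a(words, pos):
    # Child that consumes exactly one "a".
    return pos + 1 if pos < len(words) and words[pos] == "a" else None


def match_nothing(words, pos):
    # Child that always succeeds without consuming input.
    return pos


print(match_repeated(match_a, ["a", "a", "b"], 0))   # [1, 2]
print(match_repeated(match_nothing, ["a"], 0))       # []
```

A runtime check like this is a defensive complement to, not a replacement for, rejecting empty-matchable children in the constructor as the review suggests.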

Additional Locations (2)
