-
Notifications
You must be signed in to change notification settings - Fork 550
Fix escaped hash in regex extended modefixed escaped hash in regex extended mode #4814
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: 2.1.x
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Fixes handling of escaped hash (\#) when stripping comments in PCRE extended (x) mode so that \# is treated as a literal # rather than the start of a comment.
Changes:
- Update the freespacing comment-stripping logic in
RegexGroupParserto account for escaped hashes. - Add an nsrt test case covering escaped hash behavior in extended mode.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
src/Type/Regex/RegexGroupParser.php |
Adjusts the regex used to strip #... comments in /x mode. |
tests/PHPStan/Analyser/nsrt/preg_match_shapes.php |
Adds a regression test to validate escaped-hash handling in extended mode parsing. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/Type/Regex/RegexGroupParser.php
Outdated
| // in freespacing mode the # character starts a comment and runs until the end of the line | ||
| $regex = preg_replace('/(?<!\?)#.*/', '', $regex) ?? ''; | ||
| // but \# is an escaped literal hash, not a comment | ||
| $regex = preg_replace('/(?<!\\\)#.*/', '', $regex) ?? ''; |
Copilot
AI
Jan 24, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new freespacing comment-stripping pattern will also treat the # in PCRE inline comment groups (?# ... ) as a line comment (because it’s not preceded by \). This is a regression from the previous (?<!\?) guard and can break valid patterns (e.g. tests/PHPStan/Analyser/nsrt/bug-11311.php contains (?# ...) inside an /x regex). Consider updating the stripping logic to exclude # that is part of (?#...) (and ideally avoid using a single lookbehind heuristic altogether).
| $regex = preg_replace('/(?<!\\\)#.*/', '', $regex) ?? ''; | |
| // also, (?# ... ) is an inline comment group and its # must not be treated as a line comment | |
| $regex = preg_replace('/(?<!\\\)(?<!\(\?)#.*/', '', $regex) ?? ''; |
src/Type/Regex/RegexGroupParser.php
Outdated
| // in freespacing mode the # character starts a comment and runs until the end of the line | ||
| $regex = preg_replace('/(?<!\?)#.*/', '', $regex) ?? ''; | ||
| // but \# is an escaped literal hash, not a comment | ||
| $regex = preg_replace('/(?<!\\\)#.*/', '', $regex) ?? ''; |
Copilot
AI
Jan 24, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/(?<!\\\)#.*/ only checks the single character immediately before #. In extended mode, \\# (two backslashes before #) should still start a comment because the last backslash is escaped, but this regex will incorrectly preserve it. To match PCRE semantics you need to treat # as escaped only when it’s preceded by an odd-length run of backslashes (and also ensure you’re not inside a character class).
| $regex = preg_replace('/(?<!\\\)#.*/', '', $regex) ?? ''; | |
| // and # inside a character class does not start a comment | |
| $length = strlen($regex); | |
| $processedRegex = ''; | |
| $inCharClass = false; | |
| $backslashRun = 0; | |
| for ($i = 0; $i < $length; $i++) { | |
| $char = $regex[$i]; | |
| $escaped = ($backslashRun % 2) === 1; | |
| // update character class state (unescaped [ and ]) | |
| if ($char === '[' && !$escaped) { | |
| $inCharClass = true; | |
| } elseif ($char === ']' && !$escaped) { | |
| $inCharClass = false; | |
| } | |
| // start of a comment: unescaped # outside of character class | |
| if ($char === '#' && !$escaped && !$inCharClass) { | |
| // skip until end of line or end of pattern | |
| while ($i + 1 < $length) { | |
| $nextChar = $regex[$i + 1]; | |
| if ($nextChar === "\n" || $nextChar === "\r") { | |
| break; | |
| } | |
| $i++; | |
| } | |
| $backslashRun = 0; | |
| continue; | |
| } | |
| $processedRegex .= $char; | |
| if ($char === '\\') { | |
| $backslashRun++; | |
| } else { | |
| $backslashRun = 0; | |
| } | |
| } | |
| $regex = $processedRegex; |
src/Type/Regex/RegexGroupParser.php
Outdated
| // in freespacing mode the # character starts a comment and runs until the end of the line | ||
| $regex = preg_replace('/(?<!\?)#.*/', '', $regex) ?? ''; | ||
| // but \# is an escaped literal hash, not a comment | ||
| $regex = preg_replace('/(?<!\\\)#.*/', '', $regex) ?? ''; |
Copilot
AI
Jan 24, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The literal /(?<!\\\)#.*/ relies on PHP single-quote escaping rules (the trailing \ is not an escape sequence), which is hard to read/maintain. Consider rewriting it in a clearer form (e.g. using \\\\ in the PHP string for a \\ regex escape) or adding a short comment explaining the escaping, to reduce the chance of accidental changes breaking the pattern.
| $regex = preg_replace('/(?<!\\\)#.*/', '', $regex) ?? ''; | |
| // (?<!\\\\) is a negative lookbehind for a backslash; '\\\\' in PHP becomes '\\' in the regex | |
| $regex = preg_replace('/(?<!\\\\)#.*/', '', $regex) ?? ''; |
\# was incorrectly treated as comment start in /x mode
|
/cc @staabm Please review 😊 |
staabm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was not aware of such a regex feature, but implementation wise it looks good to me.
thank you
When using the
x(extended) modifier,\#was incorrectly treated as the start of a comment. According to PCRE specification,\#is an escaped literal hash character and should not start a comment.Example
Before this fix, PHPStan would strip
\#.]) $/xas a comment, breaking the pattern parsing.Changes
src/Type/Regex/RegexGroupParser.php: Changed the comment-stripping regex from/(?<!\?)#.*/to/(?<!\\)#.*/to respect escaped hashestestExtendedSyntaxEscapedHashintests/PHPStan/Analyser/nsrt/preg_match_shapes.phpNote
This fix does not address
#inside character classes like[#.](which is also not a comment in PCRE). However, users can use the workaround[\#.]which now works correctly with this fix.