Skip to content

Conversation

@dg
Copy link
Contributor

@dg dg commented Jan 24, 2026

When using the x (extended) modifier, \# was incorrectly treated as the start of a comment. According to PCRE specification, \# is an escaped literal hash character and should not start a comment.

Example

preg_match('/^ ([\#.]) $/x', $s, $matches);

Before this fix, PHPStan would strip \#.]) $/x as a comment, breaking the pattern parsing.

Changes

  • src/Type/Regex/RegexGroupParser.php: Changed the comment-stripping regex from /(?<!\?)#.*/ to /(?<!\\)#.*/ to respect escaped hashes
  • Added test case testExtendedSyntaxEscapedHash in tests/PHPStan/Analyser/nsrt/preg_match_shapes.php

Note

This fix does not address # inside character classes like [#.] (which is also not a comment in PCRE). However, users can use the workaround [\#.] which now works correctly with this fix.

Copilot AI review requested due to automatic review settings January 24, 2026 07:22
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes handling of escaped hash (\#) when stripping comments in PCRE extended (x) mode so that \# is treated as a literal # rather than the start of a comment.

Changes:

  • Update the freespacing comment-stripping logic in RegexGroupParser to account for escaped hashes.
  • Add an nsrt test case covering escaped hash behavior in extended mode.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
src/Type/Regex/RegexGroupParser.php Adjusts the regex used to strip #... comments in /x mode.
tests/PHPStan/Analyser/nsrt/preg_match_shapes.php Adds a regression test to validate escaped-hash handling in extended mode parsing.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

// in freespacing mode the # character starts a comment and runs until the end of the line
$regex = preg_replace('/(?<!\?)#.*/', '', $regex) ?? '';
// but \# is an escaped literal hash, not a comment
$regex = preg_replace('/(?<!\\\)#.*/', '', $regex) ?? '';
Copy link

Copilot AI Jan 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new freespacing comment-stripping pattern will also treat the # in PCRE inline comment groups (?# ... ) as a line comment (because it’s not preceded by \). This is a regression from the previous (?<!\?) guard and can break valid patterns (e.g. tests/PHPStan/Analyser/nsrt/bug-11311.php contains (?# ...) inside an /x regex). Consider updating the stripping logic to exclude # that is part of (?#...) (and ideally avoid using a single lookbehind heuristic altogether).

Suggested change
$regex = preg_replace('/(?<!\\\)#.*/', '', $regex) ?? '';
// also, (?# ... ) is an inline comment group and its # must not be treated as a line comment
$regex = preg_replace('/(?<!\\\)(?<!\(\?)#.*/', '', $regex) ?? '';

Copilot uses AI. Check for mistakes.
// in freespacing mode the # character starts a comment and runs until the end of the line
$regex = preg_replace('/(?<!\?)#.*/', '', $regex) ?? '';
// but \# is an escaped literal hash, not a comment
$regex = preg_replace('/(?<!\\\)#.*/', '', $regex) ?? '';
Copy link

Copilot AI Jan 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/(?<!\\\)#.*/ only checks the single character immediately before #. In extended mode, \\# (two backslashes before #) should still start a comment because the last backslash is escaped, but this regex will incorrectly preserve it. To match PCRE semantics you need to treat # as escaped only when it’s preceded by an odd-length run of backslashes (and also ensure you’re not inside a character class).

Suggested change
$regex = preg_replace('/(?<!\\\)#.*/', '', $regex) ?? '';
// and # inside a character class does not start a comment
$length = strlen($regex);
$processedRegex = '';
$inCharClass = false;
$backslashRun = 0;
for ($i = 0; $i < $length; $i++) {
$char = $regex[$i];
$escaped = ($backslashRun % 2) === 1;
// update character class state (unescaped [ and ])
if ($char === '[' && !$escaped) {
$inCharClass = true;
} elseif ($char === ']' && !$escaped) {
$inCharClass = false;
}
// start of a comment: unescaped # outside of character class
if ($char === '#' && !$escaped && !$inCharClass) {
// skip until end of line or end of pattern
while ($i + 1 < $length) {
$nextChar = $regex[$i + 1];
if ($nextChar === "\n" || $nextChar === "\r") {
break;
}
$i++;
}
$backslashRun = 0;
continue;
}
$processedRegex .= $char;
if ($char === '\\') {
$backslashRun++;
} else {
$backslashRun = 0;
}
}
$regex = $processedRegex;

Copilot uses AI. Check for mistakes.
// in freespacing mode the # character starts a comment and runs until the end of the line
$regex = preg_replace('/(?<!\?)#.*/', '', $regex) ?? '';
// but \# is an escaped literal hash, not a comment
$regex = preg_replace('/(?<!\\\)#.*/', '', $regex) ?? '';
Copy link

Copilot AI Jan 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The literal /(?<!\\\)#.*/ relies on PHP single-quote escaping rules (the trailing \ is not an escape sequence), which is hard to read/maintain. Consider rewriting it in a clearer form (e.g. using \\\\ in the PHP string for a \\ regex escape) or adding a short comment explaining the escaping, to reduce the chance of accidental changes breaking the pattern.

Suggested change
$regex = preg_replace('/(?<!\\\)#.*/', '', $regex) ?? '';
// (?<!\\\\) is a negative lookbehind for a backslash; '\\\\' in PHP becomes '\\' in the regex
$regex = preg_replace('/(?<!\\\\)#.*/', '', $regex) ?? '';

Copilot uses AI. Check for mistakes.
\# was incorrectly treated as comment start in /x mode
@ondrejmirtes
Copy link
Member

/cc @staabm Please review 😊

Copy link
Contributor

@staabm staabm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was not aware of such a regex feature, but implementation wise it looks good to me.

thank you

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants