Add a built-in RE2 implementation based on re2js#290
Add a built-in RE2 implementation based on re2js#290jonbodner-buf wants to merge 16 commits intomainfrom
Conversation
…h broken imports.
srikrsna
left a comment
There was a problem hiding this comment.
Left some comments regarding the integration. I'll review the RE2 implementation in a second pass.
| /\\c[A-Z]/, // control character eg: /\cM\cJ/ | ||
| /\\u[0-9a-fA-F]{4}/, // UTF-16 code-unit | ||
| /\\0(?!\d)/, // NUL | ||
| /\[\\b.*\]/, // Backspace eg: [\b] |
| // can probably delete this since the RE2 engine will already reject them, but keep for now | ||
| for (const invalidPattern of invalidPatterns) { | ||
| if (invalidPattern.test(pattern)) { | ||
| throw new Error( | ||
| `Error evaluating pattern ${pattern}, invalid RE2 syntax`, | ||
| ); | ||
| } |
| } | ||
| const re = new RegExp(pattern, flags); | ||
| return re.test(this); | ||
| const re: RE2JS = RE2JS.compile(pattern, flagVal); |
There was a problem hiding this comment.
My understanding is that flags are part of the syntax in RE2? Can we add support for them in the library instead of trying to identify them here? Or am I missing something?
cel-go also passes them directly to regex engine without any preprocessing: https://github.com/google/cel-go/blob/646cdc1728643aec9499e3a00236ef1007a5d3fa/common/types/string.go#L156
There was a problem hiding this comment.
yeah, not needed. Removing the code.
| "@bufbuild/re2": "0.4.0" | ||
| }, | ||
| "devDependencies": { | ||
| "@unicode/unicode-16.0.0": "^1.6.16", |
There was a problem hiding this comment.
Is this a peer dependency that is required?
| "peggy": "^5.0.6", | ||
| "peggy-ts": "github:hudlow/peggy-ts#v0.0.9", | ||
| "expect-type": "^1.3.0" | ||
| "unicode-property-value-aliases": "^3.9.0" |
There was a problem hiding this comment.
Is this a peer dependency that is required?
| "@unicode/unicode-16.0.0": "^1.6.16", | ||
| "unicode-property-value-aliases": "^3.9.0" |
There was a problem hiding this comment.
I see that these are added as dev dependencies in the cel package as well, will users end up needing them?
There was a problem hiding this comment.
no, they are used for tests and to build a unicode lookup table.
| }, | ||
| "dependencies": { | ||
| "@bufbuild/re2": "^0.1.0" |
There was a problem hiding this comment.
I think it's needed? The cel package now depends on the re2 package to run.
There was a problem hiding this comment.
But this is the root workspace package.json not for the CEL package.
|
|
||
| // biome-ignore format: table | ||
| export default [ | ||
| // ! |
There was a problem hiding this comment.
We should revert the formatting changes here. It was deliberately formatted like this for readability
There was a problem hiding this comment.
yeah, this wasn't intentional. I'll revert it.
| SemanticAdorner, | ||
| toDebugString, | ||
| } from "@bufbuild/cel-spec/testdata/to-debug-string.js"; | ||
| } from "../../cel-spec/dist/cjs/testdata/to-debug-string.js"; |
There was a problem hiding this comment.
| } from "../../cel-spec/dist/cjs/testdata/to-debug-string.js"; | |
| } from "@bufbuild/cel-spec/testdata/to-debug-string.js"; |
|
|
||
| ## Credits | ||
| This code is a fork of the [RE2JS](https://re2js.leopard.in.ua) project. It has been converted to TypeScript and has a feature set tailored for | ||
| CEL and Protovalidate-es. No newline at end of file |
There was a problem hiding this comment.
I don't think it does anything special for protovalidate, does it?
| CEL and Protovalidate-es. | |
| CEL. |
There was a problem hiding this comment.
Not yet, but when native rules are added to protovalidate-es, we will want to use the re2 engine for string pattern rules. Agree that we should remove that for now.
| pattern = pattern.substring(flagMatches[0].length); | ||
| } | ||
| const re = new RegExp(pattern, flags); | ||
| const re: RE2JS = RE2JS.compile(pattern); |
There was a problem hiding this comment.
Seconding Krishna's comment - return RE2JS.compile(pattern).test(this); is idiomatic style.
| celMethod(olc.ENDS_WITH, STRING, [STRING], BOOL, String.prototype.endsWith), | ||
| celMethod(olc.STARTS_WITH, STRING, [STRING], BOOL, String.prototype.startsWith), | ||
| celMethod(olc.MATCHES, STRING, [STRING], BOOL, matches), | ||
| // ! |
There was a problem hiding this comment.
The previous formatting looks more readable to me, and there's a formatter ignore comment on line 86 to keep it. Any reason to change the formatting? If yes, let's drop the formatter ignore.
| SemanticAdorner, | ||
| toDebugString, | ||
| } from "@bufbuild/cel-spec/testdata/to-debug-string.js"; | ||
| } from "../../cel-spec/dist/cjs/testdata/to-debug-string.js"; |
There was a problem hiding this comment.
This can break in subtle and mysterious ways. Need to keep the previous import.
| "include": ["src/**/*.test.ts"], | ||
| "exclude": ["./src/__tests__", "./src/__fixtures__", "./src/__utils__"] |
There was a problem hiding this comment.
This is broken - the exclude means we aren't catching any compiler errors in tests, fixtures, or utils.
| dist/*/testing.js | ||
| dist/*/testing.d.ts |
There was a problem hiding this comment.
Those two lines aren't needed here. The picture will change with a fixed tsconfig.json that includes tests and fixtures.
c008b4c to
5530de2
Compare
The CEL spec says that its regular expressions meet the RE2 spec, but the existing ES implementation defaults to the regex implementation that's built into ES runtimes, which uses a backtracking implementation that has pathological cases susceptible to ReDOS attacks.
This PR adds a stripped-down version of RE2JS (https://github.com/le0pard/re2js) as a package, and integrates it into CEL as the default regex engine.