The Type-Safe Regex Builder for Java. If it compiles, it works.
You've seen this before. Someone writes a regex, it works, and six months later nobody — including the author — can read it:
// What does this even do?
Pattern p = Pattern.compile("^(?=[\\p{Lu}])[\\p{L}\\p{Nd}_]{3,15}+[0-9]?$");

You add a character class, break the balance of brackets, and find out at runtime. You copy a regex from Stack Overflow, miss an escape, and watch it fail silently in production. You duplicate the same validation pattern across DTOs and forget to update one of them.
There is a better way.
Sift is a fluent DSL that turns regex construction into readable, self-documenting Java code. Its state machine enforces grammatical correctness at compile time — if your pattern compiles, it is structurally valid.
// The same pattern, written with Sift:
String regex = Sift.fromStart()
    .exactly(1).upperCaseLettersUnicode() // Must start with an uppercase letter
    .then()
    .between(3, 15).wordCharactersUnicode().withoutBacktracking() // ReDoS-safe
    .then()
    .optional().digits() // May end with a digit
    .andNothingElse()
    .shake();
// Result: ^[\p{Lu}][\p{L}\p{Nd}_]{3,15}+[0-9]?$

Your IDE guides every step. Wrong transitions simply do not exist as methods.
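The compile-time guarantee rests on a technique often called a type-state (or step) builder: each call returns an interface that exposes only the methods that are legal next. The sketch below is a hypothetical miniature of that idea, not Sift's actual source. The names MiniSift, QuantifierStep, TypeStep, ConnectorStep, and Node are invented for illustration.

```java
// Hypothetical miniature of a type-state regex builder (not Sift's real code).
final class MiniSift {

    /** At the start (or after then()), only a quantifier may be chosen. */
    interface QuantifierStep {
        TypeStep exactly(int n);
        TypeStep oneOrMore();
    }

    /** After a quantifier, only a character type may follow. */
    interface TypeStep {
        ConnectorStep digits();
        ConnectorStep letters();
    }

    /** After a character type, the chain may continue or finish. */
    interface ConnectorStep {
        QuantifierStep then();
        String shake();
    }

    /** Single immutable implementation; every step returns a new instance. */
    private static final class Node implements QuantifierStep, TypeStep, ConnectorStep {
        private final String built;      // regex assembled so far
        private final String quantifier; // quantifier waiting for its character class

        Node(String built, String quantifier) {
            this.built = built;
            this.quantifier = quantifier;
        }

        @Override public TypeStep exactly(int n) { return new Node(built, "{" + n + "}"); }
        @Override public TypeStep oneOrMore() { return new Node(built, "+"); }
        @Override public ConnectorStep digits() { return new Node(built + "[0-9]" + quantifier, ""); }
        @Override public ConnectorStep letters() { return new Node(built + "[a-zA-Z]" + quantifier, ""); }
        @Override public QuantifierStep then() { return this; }
        @Override public String shake() { return built; }
    }

    static QuantifierStep fromStart() {
        return new Node("^", "");
    }

    public static void main(String[] args) {
        String regex = MiniSift.fromStart()
                .exactly(2).letters()
                .then()
                .oneOrMore().digits()
                .shake();
        System.out.println(regex); // prints ^[a-zA-Z]{2}[0-9]+

        // MiniSift.fromStart().digits();           // does not compile: QuantifierStep has no digits()
        // MiniSift.fromStart().exactly(2).shake(); // does not compile: TypeStep has no shake()
    }
}
```

Because the caller only ever sees the interface returned by the previous step, an out-of-order call is a missing method rather than a runtime error, which is also why IDE autocompletion lists only valid continuations.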
Gradle:
// Core engine — zero external dependencies
implementation 'com.mirkoddd:sift-core:<latest>'
// Optional: Jakarta Validation / Hibernate Validator integration
implementation 'com.mirkoddd:sift-annotations:<latest>'
// Optional: Engine RE2J
implementation 'com.mirkoddd:sift-engine-re2j:<latest>'
// Optional: Engine GraalVM
implementation 'com.mirkoddd:sift-engine-graalvm:<latest>'

Maven:
<dependency>
    <groupId>com.mirkoddd</groupId>
    <artifactId>sift-core</artifactId>
    <version>latest</version>
</dependency>
<!-- Optional: Jakarta Validation / Hibernate Validator integration -->
<dependency>
    <groupId>com.mirkoddd</groupId>
    <artifactId>sift-annotations</artifactId>
    <version>latest</version>
</dependency>
<!-- Optional: Engine GraalVM -->
<dependency>
    <groupId>com.mirkoddd</groupId>
    <artifactId>sift-engine-graalvm</artifactId>
    <version>latest</version>
</dependency>
<!-- Optional: Engine RE2J -->
<dependency>
    <groupId>com.mirkoddd</groupId>
    <artifactId>sift-engine-re2j</artifactId>
    <version>latest</version>
</dependency>

Sift targets Java 8 bytecode for maximum compatibility, including legacy Spring Boot 2.x and Android.
| Method | Generates | Use when |
|---|---|---|
| Sift.fromStart() | ^... | Validating an entire string |
| Sift.fromAnywhere() | ... | Building reusable fragments or searching within text |
| Sift.fromWordBoundary() | \b... | Matching whole words |
| Sift.fromPreviousMatchEnd() | \G... | Iterative parsing |
| Sift.filteringWith(flag) | (?i)... | Global flags (case-insensitive, multiline, dotall) |
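The prefixes in this table are standard java.util.regex constructs, so their effect can be checked with the JDK alone. The sketch below hard-codes the prefixes the table lists in front of hand-written bodies. It does not call Sift, and the class name, helper, and sample inputs are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AnchorDemo {

    /** Collects every match of regex in input using Matcher.find(). */
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // ^ : only matches at the start of the input
        System.out.println(findAll("^[0-9]+", "42 apples, 7 pears"));   // [42]
        // no anchor: matches anywhere
        System.out.println(findAll("[0-9]+", "42 apples, 7 pears"));    // [42, 7]
        // \b : the match must begin at a word boundary
        System.out.println(findAll("\\bcat", "cat concat catalog"));    // [cat, cat]
        // \G : each match must start exactly where the previous one ended
        System.out.println(findAll("\\G[0-9]{2}", "112233x44"));        // [11, 22, 33]
        // (?i) : case-insensitive flag applied to the whole pattern
        System.out.println(findAll("(?i)sift", "Sift SIFT sift"));      // [Sift, SIFT, sift]
    }
}
```

The \G case shows why it suits iterative parsing: scanning stops at the first position that breaks the chain (the x), so the trailing 44 is never picked up.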
| Method | Effect |
|---|---|
| .shake() | Returns the raw regex String |
| .sieve() | Compiles with the default JDK engine → SiftCompiledPattern |
| .sieveWith(engine) | Compiles with a custom engine → SiftCompiledPattern |
| .andNothingElse() | Appends $ and seals the pattern |
The real power of Sift is the ability to name your building blocks and compose them. Every Sift.fromAnywhere() call returns a reusable SiftPattern<Fragment> that can be embedded anywhere without carrying unwanted anchors.
// Define named building blocks
SiftPattern<Fragment> year = Sift.fromAnywhere().exactly(4).digits();
SiftPattern<Fragment> month = Sift.fromAnywhere().exactly(2).digits();
SiftPattern<Fragment> day = Sift.fromAnywhere().exactly(2).digits();
SiftPattern<Fragment> dash = Sift.fromAnywhere().character('-');
// Compose them into a date block
SiftPattern<Fragment> dateBlock = year.followedBy(dash, month, dash, day);
// Embed inside a larger pattern
String logRegex = Sift.fromStart()
    .of(dateBlock)
    .followedBy(' ')
    .then().oneOrMore().anyCharacter()
    .andNothingElse()
    .shake();
// Result: ^[0-9]{4}-[0-9]{2}-[0-9]{2} .+$

Root vs Fragment: Patterns built with fromStart() or closed with andNothingElse() become SiftPattern<Root>: they are sealed and cannot be embedded. Attempting to embed a Root pattern causes a compile-time error. Sift uses the Java type system itself as a safety net.
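One way to get this kind of compile-time sealing in Java is a phantom type parameter: a marker type that carries no data and exists only to steer the type checker. The sketch below is a hypothetical illustration of the idea, not Sift's implementation. PhantomDemo, Pat, fragment, anchored, and this simplified of are invented names.

```java
// Hypothetical phantom-type sketch (not Sift's real classes).
final class PhantomDemo {

    interface Fragment {} // marker: may be embedded in other patterns
    interface Root {}     // marker: anchored, must stand alone

    /** Immutable pattern tagged with a marker type K that is never instantiated. */
    static final class Pat<K> {
        final String regex;

        private Pat(String regex) {
            this.regex = regex;
        }

        /** Embedding only accepts Fragment-tagged patterns. */
        Pat<K> of(Pat<Fragment> part) {
            return new Pat<>(regex + part.regex);
        }
    }

    static Pat<Fragment> fragment(String regex) {
        return new Pat<>(regex);
    }

    /** Wraps a fragment in ^...$ and re-tags the result as Root. */
    static Pat<Root> anchored(Pat<Fragment> body) {
        return new Pat<>("^" + body.regex + "$");
    }

    public static void main(String[] args) {
        Pat<Fragment> year = fragment("[0-9]{4}");
        Pat<Fragment> dash = fragment("-");
        Pat<Fragment> month = fragment("[0-9]{2}");

        Pat<Fragment> yearMonth = year.of(dash).of(month);
        Pat<Root> sealed = anchored(yearMonth);
        System.out.println(sealed.regex); // prints ^[0-9]{4}-[0-9]{2}$

        // year.of(sealed); // does not compile: Pat<Root> is not a Pat<Fragment>
    }
}
```

Java generics are invariant, so Pat<Root> and Pat<Fragment> are unrelated types even though they share one class at runtime. That is what turns an accidental embed into a compiler error.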
Sift patterns are not just validators. They are fully equipped extraction tools.
// Define a structured pattern with named groups
NamedCapture yearGroup = SiftPatterns.capture("year", Sift.exactly(4).digits());
NamedCapture monthGroup = SiftPatterns.capture("month", Sift.exactly(2).digits());
NamedCapture dayGroup = SiftPatterns.capture("day", Sift.exactly(2).digits());
SiftPattern<?> datePattern = Sift.fromStart()
    .namedCapture(yearGroup)
    .followedBy('-')
    .then().namedCapture(monthGroup)
    .followedBy('-')
    .then().namedCapture(dayGroup)
    .andNothingElse();
// Extract structured data directly — no Matcher boilerplate
Map<String, String> fields = datePattern.extractGroups("2026-03-13");
// → { "year": "2026", "month": "03", "day": "13" }
// Extract all matches from a larger text
List<String> prices = Sift.fromAnywhere()
    .oneOrMore().digits()
    .sieve()
    .extractAll("Order: 3 items at 25 and 40 euros");
// → ["3", "25", "40"]
// Stream results lazily for large inputs
Sift.fromAnywhere().oneOrMore().lettersUnicode()
    .streamMatches(largeText)
    .filter(word -> word.length() > 5)
    .forEach(System.out::println);

Full extraction API:
| Method | Returns | Description |
|---|---|---|
| containsMatchIn(input) | boolean | Is there at least one match? |
| matchesEntire(input) | boolean | Does the entire string match? |
| extractFirst(input) | Optional<String> | First match, or empty |
| extractAll(input) | List<String> | All matches |
| extractGroups(input) | Map<String, String> | Named groups from first match |
| extractAllGroups(input) | List<Map<String, String>> | Named groups from all matches |
| replaceFirst(input, replacement) | String | Replace first match |
| replaceAll(input, replacement) | String | Replace all matches |
| splitBy(input) | List<String> | Split around matches |
| streamMatches(input) | Stream<String> | Lazy stream of all matches |
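For comparison, this is roughly the Matcher boilerplate that helpers such as extractGroups and extractAll save you from writing. The sketch uses only java.util.regex. The regex with named groups is an assumption about what the named captures above would generate, and the helper methods here are hypothetical stand-ins, not Sift's code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ExtractionDemo {

    // Assumed shape of the date pattern once named groups are in place.
    static final Pattern DATE = Pattern.compile(
            "^(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})$");

    /**
     * Named groups of the first match, or an empty map when nothing matches.
     * Group names are passed in because Java 8's Matcher cannot list them.
     */
    static Map<String, String> extractGroups(Pattern p, String input, String... names) {
        Map<String, String> out = new LinkedHashMap<>();
        Matcher m = p.matcher(input);
        if (m.find()) {
            for (String name : names) {
                out.put(name, m.group(name));
            }
        }
        return out;
    }

    /** Every match in the input, in order of appearance. */
    static List<String> extractAll(Pattern p, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(extractGroups(DATE, "2026-03-13", "year", "month", "day"));
        // prints {year=2026, month=03, day=13}
        System.out.println(extractAll(Pattern.compile("[0-9]+"),
                "Order: 3 items at 25 and 40 euros"));
        // prints [3, 25, 40]
    }
}
```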
Stop duplicating regex logic across your DTOs. Define a rule once, reuse it everywhere with @SiftMatch.
// 1. Define a reusable rule
public class PromoCodeRule implements SiftRegexProvider {
    @Override
    public String getRegex() {
        return Sift.fromStart()
            .atLeast(4).letters()
            .then()
            .exactly(3).digits()
            .andNothingElse()
            .shake();
    }
}
// 2. Apply it declaratively: the pattern is compiled once at bootstrap
public record ApplyPromoRequest(
    @SiftMatch(
        value = PromoCodeRule.class,
        flags = { SiftMatchFlag.CASE_INSENSITIVE },
        message = "Invalid promo code format"
    )
    String promoCode
) {}

Sift makes performance-safe patterns easy to express without memorizing obscure syntax.
// Possessive quantifier — prevents catastrophic backtracking
Sift.fromAnywhere()
    .oneOrMore().wordCharacters().withoutBacktracking(); // generates \w++
// Atomic group: locks a sub-pattern once matched
SiftPattern<Fragment> safe = Sift.fromAnywhere()
    .oneOrMore().digits()
    .preventBacktracking(); // wraps in (?>...)
// Lazy quantifier: matches as few characters as possible
Sift.fromAnywhere()
    .oneOrMore().anyCharacter().asFewAsPossible(); // generates .+?

By default, Sift compiles patterns using the standard java.util.regex engine via .sieve().
For use cases where the JDK engine is not suitable — such as environments requiring
linear-time guarantees or GraalVM native images — Sift exposes a SiftEngine SPI that
accepts any compatible backend.
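Backtracking is what separates the JDK engine from linear-time backends, and it is also what the possessive, atomic, and lazy forms shown earlier control. The sketch below uses only java.util.regex with hand-written regex strings (no Sift calls) to show how each form changes the result. The class name, helper, and inputs are invented for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BacktrackingDemo {

    /** Returns the first match of regex in input, or null if there is none. */
    static String first(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Greedy: \w+ first takes "abc1", then gives back "1" so that \d can match.
        System.out.println(first("\\w+\\d", "abc1"));      // abc1
        // Possessive: \w++ keeps "abc1" and never gives anything back, so \d fails.
        System.out.println(first("\\w++\\d", "abc1"));     // null
        // Atomic group: the same no-give-back rule applied to a whole sub-pattern.
        System.out.println(first("(?>[0-9]+)5", "12345")); // null
        System.out.println(first("[0-9]+5", "12345"));     // 12345
        // Lazy: .+? stops at the first position where the rest of the pattern can match.
        System.out.println(first("<.+>", "<a><b>"));       // <a><b>
        System.out.println(first("<.+?>", "<a><b>"));      // <a>
    }
}
```

The null results are the trade-off to keep in mind: possessive and atomic forms remove the retry paths that cause catastrophic backtracking, but they also reject inputs that a greedy pattern would accept by giving characters back.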
// Default — uses java.util.regex internally
SiftCompiledPattern jdkPattern = Sift.fromAnywhere()
    .oneOrMore().digits()
    .sieve();
// Custom engine, e.g. RE2J for linear-time, ReDoS-immune matching
SiftCompiledPattern re2jPattern = Sift.fromAnywhere()
    .oneOrMore().digits()
    .sieveWith(Re2jEngine.INSTANCE); // sift-engine-re2j module

Sift tracks the advanced features used during pattern construction as a Set<RegexFeature>
and passes it to the engine at compile time. If an engine doesn't support a requested
feature — for example, RE2J does not support lookarounds or backreferences — it throws
UnsupportedOperationException immediately, before any input is processed.
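This fail-fast check can be pictured as a set comparison performed before compilation. The following is a hypothetical sketch of that idea, not Sift's actual SPI: the Feature enum, the Engine interface, and both engine instances are invented for illustration, and the restricted engine merely stands in for a backend without lookarounds or backreferences.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.regex.Pattern;

final class EngineCheckDemo {

    /** Hypothetical feature flags recorded while a pattern is being built. */
    enum Feature { LOOKAHEAD, LOOKBEHIND, BACKREFERENCE }

    /** Hypothetical engine contract: declare capabilities, then compile. */
    interface Engine {
        Set<Feature> supported();

        /** Rejects the pattern up front if it uses anything this engine lacks. */
        default Pattern compile(String regex, Set<Feature> used) {
            Set<Feature> missing = EnumSet.noneOf(Feature.class);
            missing.addAll(used);
            missing.removeAll(supported());
            if (!missing.isEmpty()) {
                throw new UnsupportedOperationException("Unsupported features: " + missing);
            }
            // Both demo engines delegate to java.util.regex to stay self-contained.
            return Pattern.compile(regex);
        }
    }

    static final Engine FULL = () -> EnumSet.allOf(Feature.class);
    static final Engine RESTRICTED = () -> EnumSet.noneOf(Feature.class);

    /** True if the engine refuses the pattern before any input is processed. */
    static boolean rejects(Engine engine, String regex, Set<Feature> used) {
        try {
            engine.compile(regex, used);
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Set<Feature> used = EnumSet.of(Feature.LOOKAHEAD);
        String regex = "[0-9]+(?!%)";
        System.out.println(rejects(FULL, regex, used));       // false
        System.out.println(rejects(RESTRICTED, regex, used)); // true
    }
}
```

Checking declared features rather than parsing the regex string keeps the rejection cheap and independent of each backend's own error messages.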
Available engine modules:
- sift-core: includes JdkEngine (default, zero dependencies)
- sift-engine-re2j: RE2J backend (coming soon)
- sift-engine-graalvm: GraalVM Regex backend (coming soon)
Lookarounds let you match based on what surrounds a position without consuming those characters. Sift exposes them as readable methods directly on Connector, so they flow naturally in the chain.
// Positive lookahead — match "file" only if followed by ".pdf"
SiftPattern<Fragment> pdfFile = Sift.fromAnywhere()
    .oneOrMore().wordCharacters()
    .mustBeFollowedBy(SiftPatterns.literal(".pdf"));
// Negative lookahead: match a number NOT followed by "%"
SiftPattern<Fragment> absoluteValue = Sift.fromAnywhere()
    .oneOrMore().digits()
    .notFollowedBy(SiftPatterns.literal("%"));
// Positive lookbehind: match digits only if preceded by "$"
SiftPattern<Fragment> dollarAmount = Sift.fromAnywhere()
    .oneOrMore().digits()
    .mustBePrecededBy(SiftPatterns.literal("$"));
// Negative lookbehind: match "port" NOT preceded by "pass"
SiftPattern<Fragment> networkPort = Sift.fromAnywhere()
    .of(SiftPatterns.literal("port"))
    .notPrecededBy(SiftPatterns.literal("pass"));

All four lookaround types are available both on Connector, for inline chaining, and as standalone factories in SiftPatterns, for composition with followedByAssertion() and precededByAssertion().
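In raw regex terms, the four assertions correspond to (?=...), (?!...), (?<=...), and (?<!...). The sketch below runs hand-written equivalents through java.util.regex. That Sift emits exactly these strings is an assumption, and the class name, helper, and inputs are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LookaroundDemo {

    /** Collects every match of regex in input using Matcher.find(). */
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Positive lookahead: word characters directly followed by ".pdf"
        System.out.println(findAll("\\w+(?=\\.pdf)", "report.pdf and notes.txt")); // [report]
        // Positive lookbehind: digits directly preceded by "$"
        System.out.println(findAll("(?<=\\$)[0-9]+", "cost $45 or 30"));           // [45]
        // Negative lookbehind: only the standalone "port" qualifies
        System.out.println(findAll("(?<!pass)port", "passport port"));             // [port]
        // Negative lookahead with a greedy quantifier: the engine backtracks from
        // "50" to "5", which is not followed by "%", so a partial number matches.
        System.out.println(findAll("[0-9]+(?!%)", "50% of 200"));                  // [5, 200]
        // Making the quantifier possessive removes that partial match.
        System.out.println(findAll("[0-9]++(?!%)", "50% of 200"));                 // [200]
    }
}
```

The last two lines show a general regex pitfall rather than anything specific to Sift: a negative lookahead after a greedy quantifier can be satisfied by giving characters back, so pairing it with a possessive quantifier is often what the author intended.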
SiftCatalog provides a curated set of production-ready, ReDoS-safe patterns for common formats. All patterns are Fragment-typed — they compose cleanly with your own Sift chains.
// Use standalone
boolean valid = SiftCatalog.email().matchesEntire("user@example.com");
// Or embed inside a larger pattern
String regex = Sift.fromStart()
    .of(SiftCatalog.uuid())
    .followedBy('/')
    .then().of(SiftCatalog.isoDate())
    .andNothingElse()
    .shake();

Available patterns: uuid(), ipv4(), macAddress(), email(), webUrl(), isoDate().
Parse arbitrarily deep balanced structures with SiftPatterns.nesting().
// Match nested parentheses: ((a)(b))
SiftPattern<Fragment> nested = SiftPatterns.nesting(5)
    .using(Delimiter.PARENTHESES)
    .containing(Sift.fromAnywhere().oneOrMore().lettersUnicode());
nested.containsMatchIn("((hello)(world))"); // true

Sift exposes regex conditional logic (if/then/else branches) as a fully type-safe fluent API via SiftPatterns.
// Match a price: if preceded by "USD" consume digits only,
// otherwise consume digits followed by a currency symbol
SiftPattern<Fragment> price = SiftPatterns
    .ifPrecededBy(SiftPatterns.literal("USD"))
    .thenUse(Sift.fromAnywhere().oneOrMore().digits())
    .otherwiseUse(
        Sift.fromAnywhere().oneOrMore().digits()
            .followedBy(Sift.fromAnywhere().character('€'))
    );
// Else-if chaining is also supported
SiftPattern<Fragment> format = SiftPatterns
    .ifFollowedBy(SiftPatterns.literal("px"))
    .thenUse(Sift.fromAnywhere().oneOrMore().digits())
    .otherwiseIfFollowedBy(SiftPatterns.literal("%"))
    .thenUse(Sift.fromAnywhere().between(1, 3).digits())
    .otherwiseNothing(); // No else branch: the engine moves forward silently

The state machine enforces the correct declaration order at compile time: ifXxx → thenUse → otherwiseUse / otherwiseIfFollowedBy / otherwiseNothing. An incomplete conditional is not expressible.
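java.util.regex has no native conditional construct, so on the JDK engine an if/then/else has to be expressed some other way. One common emulation pairs a lookaround with its negation inside an alternation. Whether Sift generates this exact form is an assumption; the sketch below only shows that the technique reproduces the price example's logic. The class name, PRICE, and the helper are invented for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ConditionalDemo {

    // if preceded by "USD": digits only; otherwise: digits followed by the euro sign.
    // The euro sign is written as \u20AC so the source compiles under any file encoding.
    static final Pattern PRICE = Pattern.compile(
            "(?:(?<=USD)[0-9]+|(?<!USD)[0-9]+\u20AC)");

    /** Returns the first match in input, or null if there is none. */
    static String first(String input) {
        Matcher m = PRICE.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(first("USD100"));                  // 100
        System.out.println("25\u20AC".equals(first("25\u20AC"))); // true
        System.out.println(first("25"));                      // null
    }
}
```

The two branches are mutually exclusive because the second starts with the negation of the first branch's condition, which is what makes the alternation behave like an else.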
| | Raw Java Regex | Sift |
|---|---|---|
| Syntax errors | Discovered at runtime | Impossible to express |
| Readability | Cryptic symbols | Self-documenting method names |
| Reusability | Copy-paste | Named SiftPattern fragments |
| Thread safety | Manual | Guaranteed: all patterns are immutable |
| ReDoS protection | Requires expert knowledge | Built-in API |
| Jakarta Validation | Manual @Pattern duplication | @SiftMatch + SiftRegexProvider |
| Regex engine | JDK only | Pluggable (JDK, RE2J, GraalVM, etc.) |
| Dependencies | None (part of the JDK) | Zero (sift-core) |
- Sift Cookbook — Advanced recipes: TSV log parsing, UUID validation, lookarounds, data extraction with named captures, conditional patterns, and more.
- Javadoc — sift-core
- Javadoc — sift-annotations
- Changelog
- Contributing
