A regular expression (regex) is a pattern that describes a set of strings. Regexes are used to search, manipulate, and edit text. Regex is a small language of its own, and Java supports it through a built-in API designed for heavy-duty text processing.
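Java's regex support lives in the java.util.regex package and revolves around two main classes and one exception class. Unlike a simple String.contains() check, the regex API follows a "compile once, match many" philosophy: you turn the regex into a Pattern once, then reuse it against as many inputs as you like.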
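Pattern: A compiled representation of a regular expression. It has no public constructor, so you don't "new" a Pattern; you create one with Pattern.compile(regex). Pattern instances are immutable and safe to share between threads.
Matcher: The engine that performs match operations on an input string by interpreting a Pattern. You obtain one with pattern.matcher(input). It holds the state of the search (such as the current position and the last match), so a Matcher should not be shared between threads.
PatternSyntaxException: An unchecked exception that indicates a syntax error in a regular expression pattern.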
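Because PatternSyntaxException is unchecked, nothing forces you to catch it. When a regex comes from user input or a configuration file, catching it is usually worthwhile. Here is a minimal sketch; the class PatternCompileDemo and its tryCompile helper are hypothetical names chosen for illustration, while getPattern(), getDescription(), and getIndex() are real methods of PatternSyntaxException.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class PatternCompileDemo {
    // Compiles a regex, or reports the syntax problem and returns null if it is invalid
    static Pattern tryCompile(String regex) {
        try {
            return Pattern.compile(regex);
        } catch (PatternSyntaxException e) {
            // getDescription() explains the error; getIndex() is its approximate position (-1 if unknown)
            System.err.println("Invalid regex '" + e.getPattern() + "': "
                    + e.getDescription() + " (index " + e.getIndex() + ")");
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(tryCompile("\\d+") != null);  // a valid pattern
        System.out.println(tryCompile("(abc") != null);  // unclosed group, so compilation fails
    }
}
```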
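To build patterns, you combine literal characters with special characters (metacharacters) that stand for classes of characters, positions, or quantities. In a Java string literal the backslash is itself an escape character, so every backslash in a regex must be doubled: the regex \d is written as "\\d" in Java source.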
| Symbol | Description | Example |
|---|---|---|
| . | Any single character (except a line terminator, unless Pattern.DOTALL is set) | a.b matches "acb", "a1b" |
| \\d | Any digit [0-9] | \\d\\d matches "10", "99" |
| \\w | Word character [a-zA-Z_0-9] | \\w+ matches "Java_123" |
| \\s | Whitespace character (space, tab, newline, etc.) | \\d\\s\\d matches "1 2" |
| ^ / $ | Start / end of the input (of each line with Pattern.MULTILINE) | ^Hello finds "Hello" at the start of "Hello world" |
| [abc] | Any one of a, b, or c | [Hh]ello matches "Hello" or "hello" |
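To check the table rows yourself, a small sketch like the following runs each example. The helper names fullMatch and occursIn are hypothetical: fullMatch uses Matcher.matches() (the whole input must match) and occursIn uses Matcher.find() (a match anywhere is enough), which is why ^Hello is tested with find(). For brevity, each call compiles its pattern afresh.

```java
import java.util.regex.Pattern;

class MetacharacterDemo {
    // matches(): the whole input must fit the pattern
    static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // find(): the pattern only needs to occur somewhere in the input
    static boolean occursIn(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("a.b", "a1b"));            // . matches any one character
        System.out.println(fullMatch("\\d\\d", "99"));          // exactly two digits
        System.out.println(fullMatch("\\w+", "Java_123"));      // letters, digits and underscore
        System.out.println(fullMatch("\\d\\s\\d", "1 2"));      // digit, whitespace, digit
        System.out.println(fullMatch("[Hh]ello", "hello"));     // character class
        System.out.println(occursIn("^Hello", "Hello world"));  // anchored at the start
        System.out.println(occursIn("^Hello", "Say Hello"));    // "Hello" is not at the start
    }
}
```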
Quantifiers specify how many times a character or group should occur. This is where Regex becomes truly flexible.
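X* : 0 or more times.
X+ : 1 or more times (at least one occurrence is required).
X? : 0 or 1 time (optional).
X{n} : Exactly n times.
X{n,} : At least n times.
X{n,m} : Between n and m times, inclusive (no space after the comma; Java rejects {n, m}).
By default these quantifiers are "greedy" and match as much as possible. Appending ? to a quantifier (e.g., .+?) makes it "lazy," so it matches as little as possible.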
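The greedy/lazy difference is easiest to see on text with several possible end points. This sketch assumes a hypothetical firstMatch helper that returns the first match found by Matcher.find():

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Returns the first substring matched by regex, or null if there is no match
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String html = "<b>bold</b>";
        System.out.println(firstMatch("<.+>", html));           // greedy: extends to the LAST '>'
        System.out.println(firstMatch("<.+?>", html));          // lazy: stops at the FIRST '>'
        System.out.println(firstMatch("\\d{2,3}", "abc12345")); // greedy {2,3} takes 3 digits when it can
    }
}
```

On "<b>bold</b>", the greedy <.+> captures the whole string, while the lazy <.+?> captures only "<b>".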
The standard way to use Regex in Java involves three steps: compile the regex into a Pattern, create a Matcher for the input, and iterate over the matches.
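The three steps map onto Pattern.compile(), Pattern.matcher(), and a while (matcher.find()) loop. A minimal sketch; FindAllDemo and findNumbers are illustrative names, not part of any API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindAllDemo {
    // Step 1 (compile): done once and reused for every input
    private static final Pattern NUMBER = Pattern.compile("\\d+");

    static List<String> findNumbers(String text) {
        // Step 2 (match): a Matcher holds the search state for this one input
        Matcher matcher = NUMBER.matcher(text);
        List<String> found = new ArrayList<>();
        // Step 3 (iterate): each find() resumes where the previous match ended
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findNumbers("Order 66 shipped in 3 boxes on day 120"));
    }
}
```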
Parentheses () are used to create "groups." This allows you to treat a part of the pattern as a single unit or extract just that specific part later.
Example: (\\d{3})-(\\d{3}-\\d{4}).
Group 1: Area Code (first 3 digits).
Group 2: The rest of the phone number.
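Applied to the phone-number pattern above, the groups can be read back with Matcher.group(int) after a successful match. A sketch under the assumption that the input must be exactly in NNN-NNN-NNNN form; PhoneGroupsDemo and parsePhone are hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PhoneGroupsDemo {
    // Group 1 = area code, group 2 = the rest of the number
    private static final Pattern PHONE = Pattern.compile("(\\d{3})-(\\d{3}-\\d{4})");

    // Returns {areaCode, rest}, or null if the input is not in NNN-NNN-NNNN form
    static String[] parsePhone(String input) {
        Matcher m = PHONE.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = parsePhone("212-555-0187");
        System.out.println("Area code: " + parts[0]);
        System.out.println("Rest:      " + parts[1]);
    }
}
```

group(0) always refers to the entire match. Calling group() before a successful matches() or find() throws an IllegalStateException, which is why the sketch checks matches() first.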
For simple tasks, you don't always need to use the Pattern class. The java.lang.String class has built-in support for regex:
str.matches(regex): Returns true only if the entire string matches the pattern.
str.split(regex): Splits the string into an array around matches of the pattern (e.g., splitting on commas or whitespace).
str.replaceAll(regex, replacement): Replaces every match with the replacement string.
Compiling a pattern is relatively expensive, and String.matches() and String.replaceAll() compile their regex again on every call. If you use the same regex inside a loop or a high-traffic method, avoid calling String.matches() or Pattern.compile() there. Instead, compile the Pattern once and store it in a static final constant:
```java
// Compiled once when the class is loaded, then reused for every validation call
private static final Pattern EMAIL_PATTERN = Pattern.compile("^[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,6}$", Pattern.CASE_INSENSITIVE);
```
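A minimal sketch that puts the pieces together: the three String helpers for one-off work, and the precompiled EMAIL_PATTERN for a repeated check. The class name StringRegexDemo and the isValidEmail helper are hypothetical. The email regex is the illustrative one above, not a complete RFC 5322 validator.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class StringRegexDemo {
    // Same pattern as the constant above, compiled once and reused for every call
    private static final Pattern EMAIL_PATTERN = Pattern.compile(
            "^[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,6}$", Pattern.CASE_INSENSITIVE);

    static boolean isValidEmail(String s) {
        return EMAIL_PATTERN.matcher(s).matches();
    }

    public static void main(String[] args) {
        // String helpers: fine for one-off use; matches() and replaceAll() recompile on every call
        System.out.println("12345".matches("\\d+"));                      // whole string is digits
        System.out.println(Arrays.toString("a, b,c".split(",\\s*")));      // comma plus optional spaces
        System.out.println("too   many    spaces".replaceAll("\\s+", " ")); // collapse whitespace runs

        // Repeated check: reuse the precompiled constant instead of String.matches()
        for (String candidate : new String[] { "dev@example.com", "not-an-email" }) {
            System.out.println(candidate + " -> " + isValidEmail(candidate));
        }
    }
}
```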
Q: What is the difference between Matcher.find() and Matcher.matches()?
A: matches() attempts to match the entire input string against the pattern. find() scans the input string looking for the next subsequence that matches the pattern (it works like a "find next" button).
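The difference is easy to demonstrate with one pattern and one input. In this sketch (class and method names are hypothetical), the same \\d+ pattern fails matches() on "Room 101" but succeeds with find():

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindVsMatchesDemo {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // matches(): true only if the ENTIRE input is a run of digits
    static boolean isAllDigits(String input) {
        return DIGITS.matcher(input).matches();
    }

    // find(): returns the first run of digits anywhere in the input, or null
    static String firstDigits(String input) {
        Matcher m = DIGITS.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "Room 101";
        System.out.println(isAllDigits(input));  // "Room " is not digits, so matches() fails
        System.out.println(firstDigits(input));  // find() locates the digit run inside the text
    }
}
```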
Q: Why do we use double backslashes in Java regex?
A: In Java source code, the backslash is an escape character in string literals (as in \n). To pass a backslash through to the regex engine (as in \d), you have to escape the backslash itself and write "\\d".
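The same rule applies one level further: to match a single literal backslash, the regex itself needs \\, so the Java literal becomes "\\\\". The sketch below (hypothetical class and method names) shows both cases, plus Pattern.quote(), which escapes an entire string so that every character in it is matched literally.

```java
import java.util.regex.Pattern;

class BackslashDemo {
    // The regex \\ matches one backslash; as a Java literal it is written "\\\\"
    private static final Pattern BACKSLASH = Pattern.compile("\\\\");

    static boolean containsBackslash(String text) {
        return BACKSLASH.matcher(text).find();
    }

    // Pattern.quote() makes metacharacters such as . and \ lose their special meaning
    static boolean matchesLiterally(String literal, String input) {
        return Pattern.compile(Pattern.quote(literal)).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println("\\d".length());                 // the two characters '\' and 'd'
        System.out.println(containsBackslash("C:\\temp"));  // the string C:\temp contains a backslash
        System.out.println(matchesLiterally("a.b", "axb")); // a quoted '.' matches only a real dot
    }
}
```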
Q: What is Backtracking?
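A: Backtracking occurs when the engine tries one path, fails, and goes back to try another. Poorly written regexes with nested quantifiers (like (a+)+b) can lead to catastrophic backtracking: on an input that almost matches but ultimately fails (such as a long run of a's with no b), the number of paths to try grows exponentially with the input length, and a single match call can keep a CPU core at 100% for a very long time.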
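Two standard remedies are to remove the nested quantifier (a+b accepts exactly the same strings as (a+)+b) or to use a possessive quantifier such as a++, which never gives back characters it has consumed and here also accepts the same strings. This sketch compares all three forms on short inputs, where even the nested pattern is harmless, and then runs only the flat and possessive patterns on a long failing input. The class name is hypothetical.

```java
import java.util.Collections;
import java.util.regex.Pattern;

class BacktrackingDemo {
    // Nested quantifier: a run of n a's can be split between the inner and outer +
    // in 2^(n-1) ways, and a failing match may explore many of them
    static final Pattern NESTED = Pattern.compile("(a+)+b");

    // Same set of strings, but only one way to consume the run of a's
    static final Pattern FLAT = Pattern.compile("a+b");

    // Possessive a++ never gives back characters, so it offers nothing to backtrack into
    static final Pattern POSSESSIVE = Pattern.compile("a++b");

    public static void main(String[] args) {
        // Short inputs only: here all three patterns agree and NESTED is still fast
        for (String s : new String[] { "ab", "aaab", "b", "aaac" }) {
            System.out.println(s + ": " + NESTED.matcher(s).matches() + " "
                    + FLAT.matcher(s).matches() + " " + POSSESSIVE.matcher(s).matches());
        }

        // A long failing input: fine for FLAT and POSSESSIVE (do not try this with NESTED)
        String longInput = String.join("", Collections.nCopies(10_000, "a")) + "c";
        System.out.println(FLAT.matcher(longInput).matches());
        System.out.println(POSSESSIVE.matcher(longInput).matches());
    }
}
```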
Java Regex is a high-utility skill. It lets you write concise, powerful code for complex text validation and parsing. While the syntax can look like "gibberish" at first, mastering the basic metacharacters and the Pattern/Matcher workflow lets you solve, in a few lines, many text-processing problems that would otherwise need far more hand-written parsing code.