Regular Expressions (Regex)
Regular expressions (regex or regexp) are powerful patterns used for searching, matching, and manipulating text.
Basic Characters
- . (dot): Matches any single character except a newline.
- \d: Matches any digit (0-9).
- \D: Matches any character that is not a digit.
- \w: Matches any word character (alphanumeric and underscore).
- \W: Matches any non-word character.
- \s: Matches any whitespace character (space, tab, newline).
- \S: Matches any non-whitespace character.
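The shorthand classes above can be tried out with a small sketch. It assumes Python's built-in re module; the sample string and variable names are made up for illustration.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "Order 66: ship 3 boxes\tby Friday"

digits = re.findall(r"\d", text)      # each single digit
words = re.findall(r"\w+", text)      # runs of word characters
chunks = re.findall(r"\S+", text)     # runs of non-whitespace characters
dot = re.search(r"b.x", text)         # "b", any one character, then "x"

# The dot does not match a newline by default.
dot_newline = re.search(r"a.b", "a\nb")

print(digits, words, chunks, dot.group(), dot_newline)
```

Note that \S+ keeps the colon attached to "66:", while \w+ drops it, because a colon is neither whitespace nor a word character.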
Quantifiers
- *: Matches 0 or more occurrences of the preceding element.
- +: Matches 1 or more occurrences of the preceding element.
- ?: Matches 0 or 1 occurrence of the preceding element.
- {n}: Matches exactly n occurrences of the preceding element.
- {n,}: Matches n or more occurrences of the preceding element.
- {n,m}: Matches between n and m occurrences of the preceding element.
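As a rough check of the quantifiers listed above, the following sketch tests each one against strings of repeated "a". It assumes Python's re module; the helper full_matches and the sample list are hypothetical names introduced here.

```python
import re

# Strings of zero to four "a" characters.
samples = ["", "a", "aa", "aaa", "aaaa"]

def full_matches(pattern):
    """Return the samples that the pattern matches in their entirety."""
    return [s for s in samples if re.fullmatch(pattern, s) is not None]

star = full_matches(r"a*")            # 0 or more
plus = full_matches(r"a+")            # 1 or more
optional = full_matches(r"a?")        # 0 or 1
exactly_two = full_matches(r"a{2}")   # exactly 2
two_or_more = full_matches(r"a{2,}")  # 2 or more
two_to_three = full_matches(r"a{2,3}")  # between 2 and 3

print(star, plus, optional, exactly_two, two_or_more, two_to_three)
```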
Anchors
- ^: Matches the start of the input, or the start of each line in multi-line mode.
- $: Matches the end of the input, or the end of each line in multi-line mode.
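How the anchors behave depends on the engine and its flags. The sketch below assumes Python's re module, where ^ and $ apply to the whole string by default and to every line only when re.MULTILINE is set; the sample text is made up.

```python
import re

text = "first line\nsecond line"

# Default mode: ^ only matches at the very start of the string.
start_default = re.findall(r"^\w+", text)
# Multi-line mode: ^ also matches right after each newline.
start_multiline = re.findall(r"^\w+", text, flags=re.MULTILINE)

# Default mode: $ matches at the end of the string.
end_default = re.findall(r"\w+$", text)
# Multi-line mode: $ also matches just before each newline.
end_multiline = re.findall(r"\w+$", text, flags=re.MULTILINE)

print(start_default, start_multiline, end_default, end_multiline)
```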
Character Classes
- [abc]: Matches any single character 'a', 'b', or 'c'.
- [a-z]: Matches any lowercase letter from 'a' to 'z'.
- [A-Z]: Matches any uppercase letter from 'A' to 'Z'.
- [0-9]: Matches any digit from 0 to 9.
Predefined Character Classes
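- \d is equivalent to [0-9] (in Unicode-aware engines it may also match non-ASCII digits).
- \w is equivalent to [a-zA-Z0-9_] (in Unicode-aware engines it may also match non-ASCII letters).
- \s includes various whitespace characters (space, tab, newline, etc.).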
Negation
- [^abc]: Matches any character that is not 'a', 'b', or 'c'.
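A minimal sketch of the bracket classes and negation described above, assuming Python's re module. The sample strings are invented for the example.

```python
import re

text = "Room B12, floor 3"

# [abc] matches one character at a time from the listed set.
abc_chars = re.findall(r"[abc]", "cabbage")
# Ranges: lowercase runs, single uppercase letters, digit runs.
lower_runs = re.findall(r"[a-z]+", text)
upper_chars = re.findall(r"[A-Z]", text)
digit_runs = re.findall(r"[0-9]+", text)

# [^abc] matches anything that is NOT a, b, or c; here those characters are removed.
only_abc = re.sub(r"[^abc]", "", "cabbage")

print(abc_chars, lower_runs, upper_chars, digit_runs, only_abc)
```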
Grouping and Alternation
- (abc): Groups characters into a single unit, so a quantifier applies to the whole group, and captures the matched text.
- a|b: Matches 'a' or 'b'.
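The difference a group makes shows up most clearly next to a quantifier. This sketch assumes Python's re module; the patterns and inputs are illustrative only.

```python
import re

# With a group, + repeats the whole "ab" unit.
grouped = re.fullmatch(r"(ab)+", "ababab")
# Without a group, + applies only to the "b".
ungrouped = re.fullmatch(r"ab+", "ababab")

# Alternation picks whichever alternative matches.
pet = re.search(r"cat|dog", "hot dog")

# A group limits the scope of | and captures what was matched.
spelling = re.fullmatch(r"gr(a|e)y", "grey")

print(grouped is not None, ungrouped, pet.group(), spelling.group(1))
```

In Python, a repeated capturing group keeps only the text from its last repetition, so grouped.group(1) is "ab" here.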
Escaping
- Some characters have special meanings in regex (e.g., .). To match them literally, escape them with a backslash (e.g., \.).
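To see why escaping matters, the sketch below compares an unescaped and an escaped dot. It assumes Python's re module; re.escape is Python's helper for escaping a whole literal string, and other languages may offer something different or nothing at all.

```python
import re

text = "3.14 and 3x14"

# Unescaped dot: matches any character, so "3x14" is also found.
unescaped = re.findall(r"3.14", text)
# Escaped dot: matches only a literal ".".
escaped = re.findall(r"3\.14", text)

# re.escape builds a pattern that matches the given text literally.
literal = "a.b*c"
literal_pattern = re.escape(literal)
matches_itself = re.fullmatch(literal_pattern, literal) is not None
# Without escaping, "." and "*" keep their special meaning.
loose_match = re.fullmatch(literal, "aXbbc") is not None
strict_match = re.fullmatch(literal_pattern, "aXbbc") is not None

print(unescaped, escaped, matches_itself, loose_match, strict_match)
```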
Modifiers
- i: Case-insensitive matching.
- m: Multi-line matching; ^ and $ match at the start and end of each line.
- s (dotall): Allows . to match newlines.
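How modifiers are written varies by language. As one example, the sketch below assumes Python's re module, where i, m, and s are spelled re.IGNORECASE, re.MULTILINE, and re.DOTALL, or given inline as (?i), (?m), and (?s) at the start of the pattern.

```python
import re

# i: case-insensitive matching.
greetings = re.findall(r"hello", "Hello HELLO hello", flags=re.IGNORECASE)

# m: ^ matches at the start of every line.
line_numbers = re.findall(r"^\d+", "10 apples\n20 pears", flags=re.MULTILINE)

# s: the dot also matches newlines.
without_dotall = re.search(r"start.*end", "start\nend")
with_dotall = re.search(r"start.*end", "start\nend", flags=re.DOTALL)

# The same flag written inline at the start of the pattern.
inline = re.findall(r"(?i)hello", "Hello")

print(greetings, line_numbers, without_dotall, with_dotall.group(), inline)
```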
Quantifier Greediness
- By default, quantifiers are greedy (match as much as possible). Use *?, +?, and ?? for non-greedy matching.
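A small sketch of greedy versus non-greedy matching on quoted text, assuming Python's re module. The sample sentence is made up for the illustration.

```python
import re

text = 'say "one" and "two"'

# Greedy: .+ runs to the LAST closing quote.
greedy = re.findall(r'".+"', text)
# Non-greedy: .+? stops at the FIRST closing quote.
lazy = re.findall(r'".+?"', text)

# Non-greedy forms take the fewest characters that still allow a match.
lazy_star = re.match(r"a*?", "aaa").group()
lazy_plus = re.match(r"a+?", "aaa").group()
lazy_optional = re.match(r"a??", "aaa").group()

print(greedy, lazy, repr(lazy_star), repr(lazy_plus), repr(lazy_optional))
```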
A greedy quantifier consumes as much as possible. Suppose you have the following:
<em>Hello World</em>
The pattern <.+> matches the whole string, <em>Hello World</em>.
To match only <em> or </em>, use <.+?>, which makes the regex match non-greedily.
Anchors
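- ^ matches the start of the input, or the start of each line in multi-line mode.
- $ matches the end of the input, or the end of each line in multi-line mode.
- \b matches a word boundary.
\bword\b  # Matches "word" as a whole word
Lookahead and Lookbehind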
- (?=...): Positive lookahead assertion. Matches only if the text that follows matches ... (the lookahead itself is not part of the match). For example, with the regex apple(?=pie):
  - applepie ✅ (only "apple" is matched)
  - apple ❌
- (?!...): Negative lookahead assertion. Matches only if the text that follows does not match ....
- (?<=...): Positive lookbehind assertion. Matches only if the text immediately before matches ....
- (?<!...): Negative lookbehind assertion. Matches only if the text immediately before does not match ....
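The four assertions can be compared side by side in a short sketch. It assumes Python's re module, which requires lookbehind patterns to have a fixed width; the price and quantity string is a made-up example.

```python
import re

# Positive lookahead: "apple" only when followed by "pie".
ahead_hit = re.search(r"apple(?=pie)", "applepie")
ahead_miss = re.search(r"apple(?=pie)", "apple")

# Negative lookahead: "apple" only when NOT followed by "pie".
not_pie = re.findall(r"apple(?!pie)", "applepie applesauce")

text = "cost $40, qty 12"
# Positive lookbehind: digits directly preceded by "$".
prices = re.findall(r"(?<=\$)\d+", text)
# Negative lookbehind: whole numbers NOT preceded by "$".
# The \b keeps the match from starting in the middle of "40".
quantities = re.findall(r"(?<!\$)\b\d+", text)

print(ahead_hit.group(), ahead_miss, not_pie, prices, quantities)
```

The lookahead and lookbehind parts check the surrounding text without consuming it, which is why ahead_hit.group() is "apple" rather than "applepie".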
 
Comments
- (?#...): Add comments within your regular expression.
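A minimal sketch of inline comments, assuming Python's re module. The second pattern uses re.VERBOSE, a Python-specific alternative not covered above, which ignores whitespace in the pattern and allows # comments. The year-month format is a hypothetical example.

```python
import re

# Inline (?#...) comments are ignored by the engine.
inline_pattern = r"\d{4}(?#year)-\d{2}(?#month)"
inline_match = re.fullmatch(inline_pattern, "2024-05")

# Python-specific alternative: verbose mode with # comments.
verbose_pattern = re.compile(
    r"""
    \d{4}   # year
    -
    \d{2}   # month
    """,
    re.VERBOSE,
)
verbose_match = verbose_pattern.fullmatch("2024-05")

print(inline_match.group(), verbose_match.group())
```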