Regular Expression Concepts
A regular expression (regex) is a sequence of characters that defines a search pattern. It is commonly used for string matching and manipulation in various programming languages and tools. Regular expressions allow you to search for specific patterns in text, such as email addresses, phone numbers, or any other string format.
Example:
# A regex pattern to match email addresses
^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
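As a minimal sketch of how the pattern above might be applied, the snippet below uses Python's standard `re` module. The helper name `is_email` and the sample addresses are illustrative assumptions, and the pattern is a simplified check rather than a complete address validator.

```python
import re

# Simplified email pattern from the example above; not a full validator.
EMAIL_RE = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

def is_email(text: str) -> bool:
    """Return True if the whole string looks like an email address."""
    return EMAIL_RE.match(text) is not None

print(is_email("alice@example.com"))  # True
print(is_email("alice@example"))      # False: no dot-separated suffix
```

The pattern is written as a raw string (`r"..."`) so that backslashes reach the regex engine unchanged.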
Regular Expression Syntax
Basic Symbols
\: Escape character
Example: \n (newline), \\ (matches \), \( (matches a literal parenthesis)
^: Matches the beginning of a string (or line in multiline mode)
$: Matches the end of a string (or line in multiline mode)
.: Matches any single character except newline
Quantifiers (Repetition)
*: Matches 0 or more times (e.g., zo* → z, zo, zoo)
+: Matches 1 or more times (e.g., zo+ → zo, zoo)
?: Matches 0 or 1 time (e.g., do(es)? → do, does)
{n}: Matches exactly n times
{n,}: Matches at least n times
{n,m}: Matches between n and m times
Note:
- Default is greedy matching (matches as much as possible)
- Add ? after a quantifier to make it non-greedy (match as little as possible)
Example: o+ vs o+? (against oooo, o+ matches oooo while o+? matches only the first o)
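The greedy and non-greedy behaviour described in the note can be sketched in Python as follows. The sample strings and variable names are made up for illustration.

```python
import re

text = "fooooo"

# Greedy: the quantifier consumes as many characters as it can.
greedy = re.search(r"o+", text).group()
# Non-greedy: the trailing ? makes it stop as early as possible.
lazy = re.search(r"o+?", text).group()
print(greedy)  # ooooo
print(lazy)    # o

# The difference is most visible when the pattern has a closing delimiter.
html = "<b>one</b><b>two</b>"
greedy_tags = re.findall(r"<b>.*</b>", html)
lazy_tags = re.findall(r"<b>.*?</b>", html)
print(greedy_tags)  # ['<b>one</b><b>two</b>']
print(lazy_tags)    # ['<b>one</b>', '<b>two</b>']

# Counted repetition: {2,3} accepts two or three 'o' characters, no more.
candidates = ["zo", "zoo", "zooo", "zoooo"]
counted = [s for s in candidates if re.fullmatch(r"zo{2,3}", s)]
print(counted)  # ['zoo', 'zooo']
```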
Grouping and Logic
(pattern): Group and capture
(?:pattern): Non-capturing group
x|y: Match x or y
(?=pattern): Positive lookahead (match position followed by pattern)
(?!pattern): Negative lookahead (match position NOT followed by pattern)
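To make the grouping constructs above concrete, here is a small Python sketch. The input strings are invented for illustration, and the start offsets are collected only to show which occurrences each lookahead accepts.

```python
import re

# Capturing groups: each parenthesized part is available by number.
m = re.search(r"(\d{4})-(\d{2})", "released 2024-05")
year, month = m.group(1), m.group(2)
print(year, month)  # 2024 05

# Non-capturing group with alternation: groups a|e without creating a capture,
# so findall returns the full matches.
colors = re.findall(r"gr(?:a|e)y", "gray or grey")
print(colors)  # ['gray', 'grey']

text = "Windows 10, Windows XP, Windows 7"
# Positive lookahead: 'Windows' only when followed by a space and a digit.
pos = [m.start() for m in re.finditer(r"Windows(?= \d)", text)]
# Negative lookahead: 'Windows' only when NOT followed by a space and a digit.
neg = [m.start() for m in re.finditer(r"Windows(?! \d)", text)]
print(pos)  # [0, 24]
print(neg)  # [12]
```

Lookaheads check what follows without consuming it, which is why every match here is just the seven characters of `Windows`.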
Character Sets
[abc]: Match a, b, or c
[^abc]: Match any character except a, b, c
[a-z]: Match lowercase letters
[^a-z]: Match any character not in a–z
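The four character-set forms above can be compared side by side on one made-up sample string; this is only an illustrative sketch in Python.

```python
import re

text = "cab-42_Zed"

in_set = re.findall(r"[abc]", text)       # only a, b, or c
not_in_set = re.findall(r"[^abc]", text)  # everything except a, b, c
lower_runs = re.findall(r"[a-z]+", text)  # runs of lowercase letters
other_runs = re.findall(r"[^a-z]+", text) # runs of anything else

print(in_set)      # ['c', 'a', 'b']
print(not_in_set)  # ['-', '4', '2', '_', 'Z', 'e', 'd']
print(lower_runs)  # ['cab', 'ed']
print(other_runs)  # ['-42_Z']
```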
Boundaries
\b: Word boundary
\B: Non-word boundary
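A short Python sketch of how the two boundary assertions behave, using an invented sentence. Note that in Python the pattern should be a raw string, since `"\b"` in an ordinary string literal means a backspace character.

```python
import re

text = "cat concat category"

# \b on both sides: 'cat' only as a standalone word.
whole = re.findall(r"\bcat\b", text)
# \b on the left: 'cat' at the start of a word.
word_start = [m.start() for m in re.finditer(r"\bcat", text)]
# \B on the left: 'cat' only when it is NOT at the start of a word.
inside = [m.start() for m in re.finditer(r"\Bcat", text)]

print(whole)       # ['cat']
print(word_start)  # [0, 11]
print(inside)      # [7]
```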
Common Escape Sequences
\d: Digit (same as [0-9])
\D: Non-digit
\w: Word character (A–Z, a–z, 0–9, _)
\W: Non-word character
\s: Whitespace (space, tab, newline, etc.)
\S: Non-whitespace
\n: Newline
\r: Carriage return
\t: Tab
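The escape sequences above might be used as in the following Python sketch. The sample line and phone number are invented. One caveat specific to Python 3: for `str` patterns, `\d` and `\w` also match non-ASCII digits and letters by default, and the `re.ASCII` flag restricts them to the ASCII sets listed above.

```python
import re

line = "order 66\tshipped:\tyes\n"

digits = re.findall(r"\d+", line)        # runs of digits
words = re.findall(r"\w+", line)         # runs of word characters
non_ws = re.findall(r"\S+", line)        # runs of non-whitespace
fields = re.split(r"\s+", line.strip())  # split on any whitespace

print(digits)  # ['66']
print(words)   # ['order', '66', 'shipped', 'yes']
print(non_ws)  # ['order', '66', 'shipped:', 'yes']
print(fields)  # ['order', '66', 'shipped:', 'yes']

# \D removes everything that is not a digit.
phone = re.sub(r"\D", "", "+1 (555) 010-9999")
print(phone)  # 15550109999

# Unicode-aware by default; re.ASCII limits \w to A-Z, a-z, 0-9 and _.
print(re.findall(r"\w+", "café"))            # ['café']
print(re.findall(r"\w+", "café", re.ASCII))  # ['caf']
```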
Advanced
\xNN: Hex character (e.g., \x41 = A)
\uNNNN: Unicode character (e.g., \u00A9 = ©)
\num: Backreference (e.g., (.)\1 matches repeated characters)
Examples
(.)\1 # Matches repeated characters (e.g., aa, 11)
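As a rough sketch, the backreference pattern above and the hex and Unicode escapes could be tried in Python like this. The sample strings are assumptions chosen for illustration.

```python
import re

text = "aa b11 cd"
# With one capturing group, findall would return only the group,
# so finditer is used here to get the full two-character matches.
pairs = [m.group() for m in re.finditer(r"(.)\1", text)]
print(pairs)  # ['aa', '11']

# The same idea applied to whole words: spot an accidentally doubled word.
doubled = re.search(r"\b(\w+) \1\b", "this is is a test")
print(doubled.group())  # is is

# \xNN and \uNNNN name a character by its code point.
hex_match = re.fullmatch(r"\x41", "A") is not None
copyright_sign = re.search(r"\u00A9", "© 2024").group()
print(hex_match)       # True
print(copyright_sign)  # ©
```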
All articles on this blog are licensed under CC BY-NC-SA 4.0 unless otherwise stated.
