Regular Expression Concept

A regular expression (regex) is a sequence of characters that defines a search pattern. It is commonly used for string matching and manipulation in various programming languages and tools. Regular expressions allow you to search for specific patterns in text, such as email addresses, phone numbers, or any other string format.
Example:

# A regex pattern to match email addresses
^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
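
As a rough sketch of how this pattern might be applied, the snippet below uses Python's re module. Python is an assumption here, since the text does not name a language, and the helper name looks_like_email is hypothetical. The pattern is a simplification and does not cover every valid address.

```python
import re

# Same pattern as above, written as a raw string so that backslashes
# reach the regex engine unchanged.
EMAIL_RE = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")


def looks_like_email(text: str) -> bool:
    """Return True if the string fits the simplified email pattern."""
    return EMAIL_RE.match(text) is not None


print(looks_like_email("alice@example.com"))  # True
print(looks_like_email("alice@example"))      # False (no dot and top-level part)
```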

Regular Expression Syntax

Basic Symbols

  • \ : Escape character
    Example: \n (newline), \\ (matches a literal backslash), \( (matches a literal opening parenthesis)
  • ^ : Matches the beginning of a string (or line in multiline mode)
  • $ : Matches the end of a string (or line in multiline mode)
  • . : Matches any single character except newline
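
The basic symbols above can be tried out with a short sketch. It assumes Python's re module, where multiline mode is enabled with the re.MULTILINE flag; other engines use a different switch.

```python
import re

text = "first line\nsecond line"

# Without re.MULTILINE, ^ only matches at the very start of the string.
starts_default = re.findall(r"^\w+", text)

# With re.MULTILINE, ^ also matches right after each newline.
starts_multiline = re.findall(r"^\w+", text, flags=re.MULTILINE)

# . matches any character except a newline, so it cannot cross the line break.
dot_match = re.search(r"line.second", text)

# A backslash turns a metacharacter into a literal: \( matches "(" itself.
paren_match = re.search(r"\(", "f(x)")

print(starts_default)       # ['first']
print(starts_multiline)     # ['first', 'second']
print(dot_match)            # None
print(paren_match.group())  # (
```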

Quantifiers (Repetition)

  • * : Matches 0 or more times (e.g., zo* → z, zo, zoo)
  • + : Matches 1 or more times (e.g., zo+ → zo, zoo)
  • ? : Matches 0 or 1 time (e.g., do(es)? → do, does)
  • {n} : Matches exactly n times
  • {n,} : Matches at least n times
  • {n,m} : Matches between n and m times

Note:

  • Default is greedy matching (matches as much as possible)
  • Add ? after a quantifier to make it non-greedy (match as little as possible)
    Example: on the input oooo, o+ matches all four characters, while o+? matches only the first o
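
The quantifiers and the greedy versus non-greedy distinction can be checked with a small sketch, again assuming Python's re module. re.fullmatch is used so that each candidate string must match in its entirety.

```python
import re

# Greedy: o+ takes as many "o" characters as it can.
greedy = re.search(r"o+", "oooo").group()

# Non-greedy: o+? stops as soon as the minimum (one "o") is satisfied.
lazy = re.search(r"o+?", "oooo").group()

# Each quantifier tested against whole strings.
star = [s for s in ["z", "zo", "zoo"] if re.fullmatch(r"zo*", s)]
plus = [s for s in ["z", "zo", "zoo"] if re.fullmatch(r"zo+", s)]
optional = [s for s in ["do", "does", "doing"] if re.fullmatch(r"do(es)?", s)]
counted = [s for s in ["o", "oo", "ooo", "oooo"] if re.fullmatch(r"o{2,3}", s)]

print(greedy)    # oooo
print(lazy)      # o
print(star)      # ['z', 'zo', 'zoo']
print(plus)      # ['zo', 'zoo']
print(optional)  # ['do', 'does']
print(counted)   # ['oo', 'ooo']
```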

Grouping and Logic

  • (pattern) : Group and capture
  • (?:pattern) : Non-capturing group
  • x|y : Match x or y
  • (?=pattern) : Positive lookahead (matches a position that is followed by pattern, without consuming it)
  • (?!pattern) : Negative lookahead (matches a position that is NOT followed by pattern)
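
A brief sketch of grouping, alternation, and lookahead follows, assuming Python's re module. The sample strings are made up for illustration.

```python
import re

# Capturing groups are numbered from 1; the non-capturing group (?:...)
# is matched but not reported.
groups = re.search(r"(\d+)(?:px|em)-(\w+)", "12px-solid").groups()

# x|y matches either alternative.
pets = re.findall(r"cat|dog", "cat, dog, bird")

# Positive lookahead: digits only when followed by "px".
# The "px" itself is not part of the match.
pixel_values = re.findall(r"\d+(?=px)", "10px 20em 30px")

# Negative lookahead: "foo" only when NOT followed by "bar".
# The first "foo" is rejected, so the match starts at index 7.
foo_start = re.search(r"foo(?!bar)", "foobar foobaz").start()

print(groups)        # ('12', 'solid')
print(pets)          # ['cat', 'dog']
print(pixel_values)  # ['10', '30']
print(foo_start)     # 7
```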

Character Sets

  • [abc] : Match a, b, or c
  • [^abc] : Match any character except a, b, or c
  • [a-z] : Match any lowercase letter from a to z
  • [^a-z] : Match any character not in a–z
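
The four character sets can be compared side by side in a short sketch, assuming Python's re module and a made-up sample string.

```python
import re

text = "abc-XYZ-123"

listed = re.findall(r"[abc]", text)       # any one of a, b, c
negated = re.findall(r"[^abc]", "abcd")   # anything except a, b, c
lower_runs = re.findall(r"[a-z]+", text)  # runs of lowercase letters
other_runs = re.findall(r"[^a-z]+", text) # runs of everything else

print(listed)      # ['a', 'b', 'c']
print(negated)     # ['d']
print(lower_runs)  # ['abc']
print(other_runs)  # ['-XYZ-123']
```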

Boundaries

  • \b : Word boundary
  • \B : Non-word boundary
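
To make the difference between \b and \B concrete, the sketch below reports where each pattern matches. It assumes Python's re module; the helper match_starts is hypothetical.

```python
import re


def match_starts(pattern: str, text: str) -> list[int]:
    """Return the start index of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text)]


text = "cat concat category"

whole_word = match_starts(r"\bcat\b", text)  # "cat" as a standalone word
word_start = match_starts(r"\bcat", text)    # "cat" at the start of a word
inside_word = match_starts(r"\Bcat", text)   # "cat" preceded by a word character

print(whole_word)   # [0]
print(word_start)   # [0, 11]
print(inside_word)  # [7]
```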

Common Escape Sequences

  • \d : Digit (same as [0-9])
  • \D : Non-digit
  • \w : Word character (A–Z, a–z, 0–9, _)
  • \W : Non-word character
  • \s : Whitespace (space, tab, newline, etc.)
  • \S : Non-whitespace
  • \n : Newline
  • \r : Carriage return
  • \t : Tab
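
The shorthand classes can be exercised as follows. This is a sketch assuming Python's re module, where \d and \w also match non-ASCII digits and letters by default; the re.ASCII flag is passed so that they behave exactly as listed above.

```python
import re

text = "Order 42:\tready_now\n"

digits = re.findall(r"\d", text, flags=re.ASCII)
words = re.findall(r"\w+", text, flags=re.ASCII)
spaces = re.findall(r"\s", text, flags=re.ASCII)
non_digits = re.findall(r"\D+", "a1b22c", flags=re.ASCII)

print(digits)      # ['4', '2']
print(words)       # ['Order', '42', 'ready_now']
print(spaces)      # [' ', '\t', '\n']
print(non_digits)  # ['a', 'b', 'c']
```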

Advanced

  • \xNN : Hex character (e.g., \x41 = A)
  • \uNNNN : Unicode character (e.g., \u00A9 = ©)
  • \num : Backreference to capture group number num (e.g., (.)\1 matches the same character twice in a row)
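
A short sketch of these escapes, assuming Python's re module, which accepts \xNN and \uNNNN inside string patterns. Support and exact syntax vary between regex engines.

```python
import re

# \x41 is the hex code for "A".
hex_match = re.fullmatch(r"\x41", "A") is not None

# \u00A9 is the Unicode code point for the copyright sign.
copyright_sign = re.search(r"\u00A9", "Copyright © Example").group()

# (.)\1 captures any character, then requires the same character again.
doubled = [m.group(0) for m in re.finditer(r"(.)\1", "aa b 11 abc")]

print(hex_match)       # True
print(copyright_sign)  # ©
print(doubled)         # ['aa', '11']
```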

Examples

(.)\1 # Matches the same character twice in a row (e.g., aa, 11)
\d{3}-\d{4} # Matches a phone number format (e.g., 123-4567)
^Hello # Matches strings that start with Hello
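
The three patterns above can be run as a sketch in Python's re module (an assumption, as before). It also shows one caveat: \d{3}-\d{4} matches inside a longer run of digits unless word boundaries are added, which is an optional tightening and not part of the original pattern.

```python
import re

# (.)\1 finds the first doubled character.
double_char = re.search(r"(.)\1", "hello").group()

# \d{3}-\d{4} finds the phone-like fragment anywhere in the text.
phone = re.search(r"\d{3}-\d{4}", "call 123-4567 now").group()

# Without boundaries, the pattern also matches inside a longer number.
loose = re.search(r"\d{3}-\d{4}", "9123-45678").group()

# With \b on both sides, the longer number is rejected.
strict = re.search(r"\b\d{3}-\d{4}\b", "9123-45678")

# ^Hello only matches when the string starts with "Hello".
starts_with_hello = re.search(r"^Hello", "Hello, world") is not None
hello_later = re.search(r"^Hello", "Say Hello") is not None

print(double_char)        # ll
print(phone)              # 123-4567
print(loose)              # 123-4567
print(strict)             # None
print(starts_with_hello)  # True
print(hello_later)        # False
```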