Regular expressions look like someone sat on a keyboard. A string like ^[\w.-]+@[\w-]+\.[\w.]+$ isn't exactly welcoming, but regex is one of the most useful skills you can pick up as a developer, data analyst, or anyone who works with text. Once the syntax clicks, you'll wonder how you ever got by without it.
The trick? Don't try to learn regex from a reference table. Learn it by doing. Open a regex tester, paste in some sample text, and start building patterns one piece at a time. That's exactly what we'll do here.
What is a regular expression?
A regular expression (regex) is a pattern that describes a set of strings. You write a pattern, and the regex engine finds every match in your text. Think of it as a super-powered find-and-replace.
Regex is supported in virtually every programming language (JavaScript, Python, Java, Go, PHP, Ruby, and more), plus text editors like VS Code and Sublime Text. The core syntax covered here is largely the same everywhere, though flavors differ in details such as how flags are written and which advanced features are available.
The building blocks
Before jumping to full patterns, you need about ten concepts. That's it. Everything else is a combination of these.
Literal characters
The letter a matches the letter "a." The string cat matches "cat." Nothing fancy here.
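Here's a minimal sketch of literal matching in Python, using the standard re module. The sample sentence is made up for illustration. Note that a literal pattern matches wherever those characters appear, even inside a longer word.

```python
import re

text = "The cat will concatenate files."

# A literal pattern matches the same characters wherever they appear,
# including inside the longer word "concatenate".
found = re.findall(r"cat", text)

# Find-and-replace: swap every literal "cat" for "dog".
replaced = re.sub(r"cat", "dog", text)

print(found)     # ['cat', 'cat']
print(replaced)  # The dog will condogenate files.
```

The second result is a good reminder that a bare literal doesn't know about word edges, which is what the anchors section is for.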
Character classes
Square brackets define a set of characters to match:
- `[aeiou]`: any vowel
- `[0-9]`: any digit
- `[a-zA-Z]`: any letter, upper or lowercase
- `[^0-9]`: anything that is NOT a digit (the caret negates inside brackets)
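To see the four classes above in action, here's a small Python sketch. The sample string is invented for the example; re.findall returns every non-overlapping match as a list.

```python
import re

sample = "Agent 007 reports in"

vowels = re.findall(r"[aeiou]", sample)       # lowercase vowels only
digits = re.findall(r"[0-9]", sample)         # each digit is its own match
letters = re.findall(r"[a-zA-Z]", sample)     # upper and lowercase letters
non_digits = re.findall(r"[^0-9]", sample)    # everything except digits

print(vowels)               # ['e', 'e', 'o', 'i']
print(digits)               # ['0', '0', '7']
print("".join(letters))     # Agentreportsin
print("".join(non_digits))  # Agent  reports in
```

The capital "A" is missing from the vowel list because character classes are case-sensitive unless you say otherwise.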
Shorthand classes
These save you from writing out common character classes:
- `\d`: any digit (same as `[0-9]`)
- `\w`: any "word character" (letters, digits, underscore)
- `\s`: any whitespace (space, tab, newline)
- `.`: any character except newline
Capital versions (\D, \W, \S) match the opposite.
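A quick Python sketch of the shorthand classes, run against a made-up string that contains a tab and a trailing newline. One flavor caveat: with Python 3 text patterns, \d and \w are Unicode-aware by default, so they can match digits and letters beyond ASCII unless you pass re.ASCII.

```python
import re

sample = "id_7\tok 42\n"

digits = re.findall(r"\d", sample)       # ['7', '4', '2']
word_chars = re.findall(r"\w", sample)   # letters, digits, underscore
spaces = re.findall(r"\s", sample)       # tab, space, newline
non_spaces = re.findall(r"\S", sample)   # the opposite of \s
any_char = re.findall(r".", sample)      # everything except the newline

print("".join(word_chars))  # id_7ok42
print(spaces)               # ['\t', ' ', '\n']
print(len(any_char))        # 10 (the string has 11 characters)
```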
Quantifiers
Quantifiers say "how many" of the preceding element to match:
- `*`: zero or more
- `+`: one or more
- `?`: zero or one (makes something optional)
- `{3}`: exactly three
- `{2,5}`: between two and five
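The quantifiers above are easiest to compare side by side. This Python sketch uses a small helper, fully_matches (a name chosen just for this example), which wraps re.fullmatch so that the whole string has to match, not just part of it.

```python
import re

def fully_matches(pattern: str, candidate: str) -> bool:
    """Return True when the entire candidate string matches the pattern."""
    return re.fullmatch(pattern, candidate) is not None

print(fully_matches(r"ab*", "a"))            # True: zero b's is fine
print(fully_matches(r"ab+", "a"))            # False: needs at least one b
print(fully_matches(r"ab+", "abbb"))         # True
print(fully_matches(r"colou?r", "color"))    # True: the u is optional
print(fully_matches(r"colou?r", "colour"))   # True
print(fully_matches(r"\d{3}", "12"))         # False: exactly three digits
print(fully_matches(r"\d{2,5}", "12345"))    # True: two to five digits
print(fully_matches(r"\d{2,5}", "123456"))   # False: six is too many
```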
Anchors
- `^`: start of string (or start of line with the multiline flag)
- `$`: end of string (or end of line with the multiline flag)
- `\b`: word boundary
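Anchors match positions rather than characters, which is easier to see than to describe. Here's a Python sketch on a made-up two-line string; re.MULTILINE is Python's spelling of the multiline flag.

```python
import re

text = "cat scatter\ncat"

anywhere = re.findall(r"cat", text)                             # includes the "cat" inside "scatter"
at_string_start = re.findall(r"^cat", text)                     # only the very first "cat"
at_line_starts = re.findall(r"^cat", text, flags=re.MULTILINE)  # one per line
whole_words = re.findall(r"\bcat\b", text)                      # skips "scatter"
at_string_end = re.findall(r"cat$", text)                       # only the final "cat"

print(len(anywhere))         # 3
print(len(at_string_start))  # 1
print(len(at_line_starts))   # 2
print(len(whole_words))      # 2
print(len(at_string_end))    # 1
```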
That's the core. Seriously. Let's put it to work.
Example 1: Matching an email address
Here's a practical regex for validating email addresses:
^[\w.-]+@[\w-]+\.[\w.]+$
Breaking it down:
- `^`: start of string
- `[\w.-]+`: one or more word characters, dots, or hyphens (the local part)
- `@`: the literal @ symbol
- `[\w-]+`: one or more word characters or hyphens (the domain)
- `\.`: a literal dot (escaped because `.` is a wildcard otherwise)
- `[\w.]+`: one or more word characters or dots (the TLD, like `com` or `co.uk`)
- `$`: end of string
Paste this into a regex tester along with a few email addresses. Try valid ones, invalid ones, edge cases. Watch what matches and what doesn't. That immediate feedback is worth more than reading ten pages of theory.
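Is this regex perfect for all valid email addresses per RFC 5322? No. It rejects some valid addresses (plus-addressing like user+tag@example.com, for instance) and accepts some invalid ones (such as a domain ending in consecutive dots). But it handles the common real-world cases, and it's readable.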
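If you'd rather poke at the email pattern from code than from a tester, here's a minimal Python sketch. The helper name looks_like_email and the sample addresses are assumptions made up for this example; the pattern is the one from this section, unchanged.

```python
import re

# The pattern from this section, compiled once for reuse.
EMAIL_RE = re.compile(r"^[\w.-]+@[\w-]+\.[\w.]+$")

def looks_like_email(candidate: str) -> bool:
    """Rough shape check only; this does not prove the address exists."""
    return EMAIL_RE.search(candidate) is not None

print(looks_like_email("jane.doe@example.com"))       # True
print(looks_like_email("user-1@mail.example.co.uk"))  # True
print(looks_like_email("no-at-sign.example.com"))     # False
print(looks_like_email("two@@example.com"))           # False
print(looks_like_email("user+tag@example.com"))       # False, although the address is valid
print(looks_like_email("a@b.."))                      # True, although the address is invalid
```

The last two lines show the trade-off: a readable pattern gets the common cases right and the unusual ones wrong in both directions.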
Example 2: Matching a URL
https?:\/\/[\w.-]+(?:\/[\w./?%&=-]*)?
Here's what's happening:
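- `https?`: "http" with an optional "s"
- `:\/\/`: the literal `://` (slashes escaped, which is only required where `/` delimits the pattern, as in JavaScript regex literals)
- `[\w.-]+`: the domain name
- `(?:\/[\w./?%&=-]*)?`: an optional path. The `(?:...)` is a non-capturing group: it groups without creating a capture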
This matches URLs like https://example.com, http://example.com/page?q=hello, and https://sub.domain.co.uk/path. Try it with your own sample URLs and see where it breaks.
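Here's a Python sketch that pulls URLs out of a sentence with this pattern. The sentence is invented for the example. Because the group is non-capturing, re.findall returns the full matches.

```python
import re

# The pattern from this section. Python accepts the escaped slashes as-is.
URL_RE = re.compile(r"https?:\/\/[\w.-]+(?:\/[\w./?%&=-]*)?")

text = (
    "Docs: https://example.com, http://example.com/page?q=hello, "
    "and https://sub.domain.co.uk/path."
)

urls = URL_RE.findall(text)
for url in urls:
    print(url)
# https://example.com
# http://example.com/page?q=hello
# https://sub.domain.co.uk/path.
```

Notice the trailing period on the last match: the path's character class includes the dot, so sentence punctuation gets swallowed. That's one of the places this pattern breaks.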
Example 3: Matching a phone number
Phone numbers are messy because formatting varies wildly. Here's a pattern for US numbers:
\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}
This matches:
- (555) 123-4567
- 555-123-4567
- 555.123.4567
- 5551234567
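The \(? and \)? handle optional parentheses. The [-.\s]? handles optional separators (dash, dot, or space). The \d{3} and \d{4} match the digit groups. Because each parenthesis is optional on its own, the pattern also accepts unbalanced input like (555 123-4567.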
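A short Python sketch to check the phone pattern against the formats listed above. The helper name is_us_phone is an assumption for this example, and it uses fullmatch so that extra characters around the number cause a rejection.

```python
import re

# The pattern from this section.
PHONE_RE = re.compile(r"\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}")

def is_us_phone(candidate: str) -> bool:
    """True when the whole string has the shape of a 10-digit US number."""
    return PHONE_RE.fullmatch(candidate) is not None

for number in ["(555) 123-4567", "555-123-4567", "555.123.4567", "5551234567"]:
    print(number, is_us_phone(number))   # all four print True

print(is_us_phone("555-1234"))       # False: only seven digits
print(is_us_phone("(555 123-4567"))  # True: unbalanced parenthesis slips through
```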
Example 4: Extracting dates
Want to find dates in MM/DD/YYYY format?
\b(0[1-9]|1[0-2])\/(0[1-9]|[12]\d|3[01])\/\d{4}\b
This is more specific than a naive \d{2}\/\d{2}\/\d{4} because it restricts the month to 01-12 and the day to 01-31. It doesn't check the day against the month, so 02/31/2024 still matches. Word boundaries (\b) prevent partial matches inside longer numbers.
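Here's a Python sketch that extracts dates with this pattern and then double-checks them against the calendar. The sample text and the helper is_real_date are made up for the example. Since the pattern contains capture groups, the sketch uses finditer and group(0) to get the whole match rather than the groups.

```python
import re
from datetime import datetime

# The pattern from this section.
DATE_RE = re.compile(r"\b(0[1-9]|1[0-2])\/(0[1-9]|[12]\d|3[01])\/\d{4}\b")

text = (
    "Due 02/28/2024, not 13/01/2024 or 12/32/2024; "
    "typo 02/31/2024; ref 102/28/20245."
)

# group(0) is the full match; findall would return the captured groups instead.
dates = [m.group(0) for m in DATE_RE.finditer(text)]
print(dates)  # ['02/28/2024', '02/31/2024']

def is_real_date(candidate: str) -> bool:
    """Second pass: let the calendar reject dates like February 31."""
    try:
        datetime.strptime(candidate, "%m/%d/%Y")
        return True
    except ValueError:
        return False

real_dates = [d for d in dates if is_real_date(d)]
print(real_dates)  # ['02/28/2024']
```

The regex handles the shape and the ranges; the calendar check handles what a pattern can't reasonably express.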
Flags you should know
Most regex engines support flags that change how the pattern behaves:
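- `g` (global): find all matches, not just the first one (some languages expose this as a separate function rather than a flag, such as Python's re.findall)
- `i` (case-insensitive): `hello` matches "Hello", "HELLO", etc.
- `m` (multiline): `^` and `$` match the start and end of each line, not just the whole string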
In a regex tester, you'll usually see toggle buttons for these. Turn on g and i for most of your experimenting.
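Flags are spelled differently from language to language. As a sketch of how they look in Python: there is no g flag (re.search finds the first match, re.findall finds them all), while i and m are re.IGNORECASE and re.MULTILINE. The three-line sample string is invented for the example.

```python
import re

text = "Hello\nhello\nHELLO"

# No flags: case-sensitive, and search stops at the first match.
first_only = re.search(r"hello", text).group(0)

# "Global" plus case-insensitive: findall with re.IGNORECASE.
all_cases = re.findall(r"hello", text, flags=re.IGNORECASE)

# Without multiline, ^ and $ refer to the whole string, so nothing matches.
whole_string = re.findall(r"^hello$", text, flags=re.IGNORECASE)

# With multiline, ^ and $ match at every line start and end.
per_line = re.findall(r"^hello$", text, flags=re.IGNORECASE | re.MULTILINE)

print(first_only)    # hello
print(all_cases)     # ['Hello', 'hello', 'HELLO']
print(whole_string)  # []
print(per_line)      # ['Hello', 'hello', 'HELLO']
```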
Common regex mistakes
Forgetting to escape special characters. The dot (.) matches anything. If you want a literal dot, you need \. — this trips up almost everyone at first.
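A small Python sketch of what the unescaped dot costs you. The sample string is made up. re.escape is the standard-library helper that escapes special characters for you when the text you're searching for comes from somewhere else.

```python
import re

text = "3.14 3514 3-14"

# Unescaped: the dot matches any character, so all three tokens match.
unescaped = re.findall(r"3.14", text)

# Escaped: only the literal "3.14" matches.
escaped = re.findall(r"3\.14", text)

# re.escape builds the escaped pattern from a plain string.
auto_escaped = re.findall(re.escape("3.14"), text)

print(unescaped)     # ['3.14', '3514', '3-14']
print(escaped)       # ['3.14']
print(auto_escaped)  # ['3.14']
```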
Being too greedy. By default, .* matches as much as possible. If you're parsing HTML and write <.*>, it'll match from the first < to the last > on the line (or in the whole string, if there are no line breaks), not individual tags. Use .*? for lazy (non-greedy) matching.
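Greedy versus lazy is easiest to believe once you see it. Here's a Python sketch on a made-up HTML fragment. The third pattern, a negated character class, is a common alternative to lazy matching and is included as an aside.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

greedy = re.findall(r"<.*>", html)         # one match spanning the whole string
lazy = re.findall(r"<.*?>", html)          # stops at the first > each time
negated = re.findall(r"<[^>]*>", html)     # "anything that isn't >" gives the same tags

print(greedy)   # ['<b>bold</b> and <i>italic</i>']
print(lazy)     # ['<b>', '</b>', '<i>', '</i>']
print(negated)  # ['<b>', '</b>', '<i>', '</i>']
```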
Overcomplicating the pattern. If your regex is 200 characters long, step back. Can you break the problem into smaller pieces? Can you do some preprocessing first? Sometimes two simple patterns beat one monster pattern.
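As a sketch of the "smaller pieces" idea, here's one way to parse a made-up key=value record in Python: split on the separator first with a plain string method, then apply one simple pattern to each piece. The record format and the names PAIR_RE and fields are assumptions for this example.

```python
import re

record = "name=Ada; age=36; city=London"

# One simple pattern for a single key=value pair, with optional surrounding spaces.
PAIR_RE = re.compile(r"\s*(\w+)=(\w+)\s*")

fields = {}
# Step 1: preprocessing with a plain split, no regex needed.
for chunk in record.split(";"):
    # Step 2: a small pattern applied to each chunk on its own.
    match = PAIR_RE.fullmatch(chunk)
    if match:
        fields[match.group(1)] = match.group(2)

print(fields)  # {'name': 'Ada', 'age': '36', 'city': 'London'}
```

Each step is easy to test on its own, which is harder to say about a single pattern that tries to describe the whole record.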
Not testing edge cases. Your pattern works on "normal" input. But what about empty strings, strings with only whitespace, Unicode characters, or extremely long input? A regex tester lets you throw all of these at your pattern in seconds.
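You can run the same kind of edge-case check from code. This Python sketch uses a hypothetical username rule (3 to 16 lowercase letters, digits, or underscores) invented for the example, and compares each input against the result you expect. It also shows a Python-specific surprise: $ matches just before a trailing newline, so fullmatch is the stricter choice.

```python
import re

# Hypothetical rule for illustration: 3 to 16 lowercase letters, digits, or underscores.
USERNAME_RE = re.compile(r"[a-z0-9_]{3,16}")

def is_valid_username(candidate: str) -> bool:
    """True only when the entire string fits the rule."""
    return USERNAME_RE.fullmatch(candidate) is not None

# Each edge case paired with the answer we expect.
edge_cases = {
    "ok_user1": True,
    "": False,             # empty string
    "   ": False,          # whitespace only
    "josé": False,         # non-ASCII letter is outside [a-z]
    "a" * 10_000: False,   # far too long
    "ok_user1\n": False,   # trailing newline
}

results = {case: is_valid_username(case) for case in edge_cases}
print(results == edge_cases)  # True

# The anchored version with search accepts the trailing newline in Python.
anchored_accepts_newline = (
    re.search(r"^[a-z0-9_]{3,16}$", "ok_user1\n") is not None
)
print(anchored_accepts_newline)  # True
```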
A quick regex cheat sheet
| Pattern | Matches |
|-----------|--------------------------------|
| . | Any character (except newline) |
| \d | Digit |
| \w | Word character |
| \s | Whitespace |
| [abc] | a, b, or c |
| [^abc] | Not a, b, or c |
| a* | Zero or more a's |
| a+ | One or more a's |
| a? | Zero or one a |
| a{3} | Exactly three a's |
| (abc) | Capture group |
| (?:abc) | Non-capturing group |
| a\|b | a or b |
| ^ | Start of string/line |
| $ | End of string/line |
| \b | Word boundary |
How to actually get better at regex
Reading about regex only gets you so far. The real learning happens when you type a pattern, see it fail, tweak it, and watch it match. That feedback loop is everything.
Start with a real problem you have — maybe you need to extract all email addresses from a document, or validate phone numbers in a form, or find every TODO comment in a codebase. Open the regex tester, paste your real data, and iterate.
Keep a personal collection of patterns that work for you. After a few weeks, you'll have a handful of tested patterns you reach for automatically, and the syntax won't look like keyboard gibberish anymore.
Regex isn't something you master in a day. But you can get genuinely productive with it in an afternoon — especially when you've got a live tester that shows you exactly what your pattern does as you type it.