Regular Expressions - Example 1

The following regular expression matches most E-mail addresses:

^[_a-zA-Z0-9]+(\.[_a-zA-Z0-9]+)*@[a-zA-Z0-9]+(\.[a-zA-Z0-9]+)*(\.[a-zA-Z]{2,4})$

Let's break that apart:

  • The ^ at the start means that the pattern must be at the start of the string, i.e. can't be preceded by anything (not even spaces), and the $ at the end means that the pattern must be at the end of the string, i.e. not followed by any characters, not even spaces. This means that the E-mail address can't be embedded within anything else.

  • Inside square brackets, a-zA-Z0-9 matches any upper or lower case letter or any digit. Similarly, the [a-zA-Z]{2,4} at the end matches any string of 2 to 4 upper or lower case letters (no more, no fewer).

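  • The [ ] mean "any one of these characters", so [_a-zA-Z0-9] means "a single upper or lower case letter, digit or underscore here". On its own it matches exactly one character; the + that follows it in the pattern is what allows one or more.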

  • The \. means "a dot". The dot has to be preceded by \ as . is a special character in regular expressions.

  • The ( )* means "zero or more of these here", and [ ]+ means "one or more of these here", so (\.[a-zA-Z0-9]+)* means "a dot followed by one or more letters or digits, and the whole group may be repeated or left out entirely". This matches strings like hotmail.co.uk, which has an extra section compared with something like hotmail.com. The same construct with an underscore added, (\.[_a-zA-Z0-9]+)*, is used before the @.
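
The pieces described above can be tried out one at a time. The following is a minimal sketch in Python, assuming the standard re module; the helper name whole is made up for this illustration and is not part of the original example.

```python
import re

# The pattern from the text, written as a raw string so the backslashes
# reach the regex engine unchanged.
EMAIL_RE = re.compile(
    r"^[_a-zA-Z0-9]+(\.[_a-zA-Z0-9]+)*@[a-zA-Z0-9]+(\.[a-zA-Z0-9]+)*(\.[a-zA-Z]{2,4})$"
)


def whole(pattern, text):
    """Return True if the pattern matches the entire text (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None


# [ ] alone matches exactly one character; adding + allows one or more.
print(whole(r"[_a-zA-Z0-9]", "a"))     # True
print(whole(r"[_a-zA-Z0-9]", "ab"))    # False
print(whole(r"[_a-zA-Z0-9]+", "ab"))   # True

# An unescaped . matches any character; \. matches only a literal dot.
print(whole(r"a.c", "abc"))    # True
print(whole(r"a\.c", "abc"))   # False

# ( )* lets the dotted section appear zero or more times.
print(whole(r"[a-zA-Z0-9]+(\.[a-zA-Z0-9]+)*", "hotmail"))      # True
print(whole(r"[a-zA-Z0-9]+(\.[a-zA-Z0-9]+)*", "hotmail.co"))   # True

# {2,4} accepts 2 to 4 letters, no more, no fewer.
print(whole(r"[a-zA-Z]{2,4}", "uk"))       # True
print(whole(r"[a-zA-Z]{2,4}", "museum"))   # False

# ^ and $ stop the address from being embedded in other text.
print(EMAIL_RE.search("someone@hotmail.co.uk") is not None)       # True
print(EMAIL_RE.search("mail someone@hotmail.co.uk") is not None)  # False
```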

Phew! By considering a few legal E-mail addresses with different structures, you should be able to match the different parts of the E-mail address to the different parts of the regular expression. Test a few E-mail addresses against the regular expression above.
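
One way to try addresses against the expression is a short script. This is only a sketch, assuming Python's standard re module; the function name is_email and the sample addresses are invented for the illustration.

```python
import re

# The regular expression from the top of this page.
EMAIL_RE = re.compile(
    r"^[_a-zA-Z0-9]+(\.[_a-zA-Z0-9]+)*@[a-zA-Z0-9]+(\.[a-zA-Z0-9]+)*(\.[a-zA-Z]{2,4})$"
)


def is_email(text):
    """Return True if text matches the pattern (hypothetical helper)."""
    return EMAIL_RE.match(text) is not None


# Made-up sample addresses with different structures.
samples = [
    "someone@hotmail.com",       # plain name and domain
    "first.last@hotmail.co.uk",  # dotted name, extra domain section
    "some_one@example.org",      # underscore in the name
    "someone@hotmail",           # no final .xx section
    "someone@@hotmail.com",      # two @ signs
    ".someone@hotmail.com",      # name starts with a dot
    "someone@hotmail.com ",      # trailing space
]

for address in samples:
    verdict = "match" if is_email(address) else "no match"
    print(f"{address!r}: {verdict}")
```

The first three samples match and the last four do not, which lines up with the breakdown above: the final section needs a dot and 2 to 4 letters, each dotted part needs at least one character, and the $ rules out anything after the address.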


Can you find any legal E-mail addresses that the regular expression does not match? How would you adapt the regular expression so that it recognised those addresses as well?
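
As a starting point, here is one possible adaptation, offered as a sketch and not as a complete answer. It assumes Python's re module, and the name ADAPTED_RE and the sample addresses are invented for this illustration. It allows hyphens and plus signs before the @, hyphens in the domain sections, and a final section longer than four letters. It is still far from accepting every legal address.

```python
import re

# The original expression from this page.
ORIGINAL_RE = re.compile(
    r"^[_a-zA-Z0-9]+(\.[_a-zA-Z0-9]+)*@[a-zA-Z0-9]+(\.[a-zA-Z0-9]+)*(\.[a-zA-Z]{2,4})$"
)

# One possible adaptation (an assumption, not the page's official answer):
#   - "+" and "-" added to the character classes before the @
#   - "-" added to the domain character classes
#   - {2,4} relaxed to {2,} so longer final sections are accepted
# A "-" placed last inside [ ] is a literal hyphen, not a range.
ADAPTED_RE = re.compile(
    r"^[_a-zA-Z0-9+-]+(\.[_a-zA-Z0-9+-]+)*@[a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)*(\.[a-zA-Z]{2,})$"
)

# Made-up addresses that the original expression rejects.
samples = [
    "first-last@example.com",
    "user+tag@example.com",
    "someone@my-domain.com",
    "someone@example.museum",
]

for address in samples:
    before = ORIGINAL_RE.match(address) is not None
    after = ADAPTED_RE.match(address) is not None
    print(f"{address}: original={before}, adapted={after}")
```

Each sample prints original=False and adapted=True. Loosening the pattern like this also lets through some strings that are not valid addresses (for example a domain section that starts with a hyphen), so there is a trade-off between rejecting legal addresses and accepting illegal ones.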