This page is where I'm going to keep any regexes I've written that I think might come in handy. It's basically here because I wrote a regular expression to check for valid IPv4 addresses, and I thought it might be useful to other people.
This morning I needed a regex to use with an ASP.NET
RegularExpressionValidator. I wanted to check that an IP (v4) address was
in a valid format. After googling, I still hadn't found anything
that seemed great and was explained particularly well, so I thought I'd
write my own, and attempt to put a bit of an explanation up too. It turns
out that I hadn't seen anything particularly elegant because it's
actually not a simple regex. You can easily check for 1-3 digits
separated by dots (with this: ^(\d{1,3}\.){3}\d{1,3}$). The
problem is checking that each octet is 255 or below. One approach I saw
was to check only the format (as the regex I've just given does),
and then have your application validate the numeric value of each octet.
This would be much easier, as you could simply split the string on .
characters, parse each octet to an integer, and check its value. It's not
as neat as doing it all in the regex, though, and it would mean a custom
validator or some other code to make it work in ASP.NET. So I came up with
this beast:
^(((0?0?\d)|((0|1)?\d\d)|(2[0-4]\d)|(25[0-5]))\.){3}((0?0?\d)|((0|1)?\d\d)|(2[0-4]\d)|(25[0-5]))$
This regex matches a valid IP address, with no surrounding characters. As far as I know, it all works correctly; however, there are bound to be problems, so if anybody spots anything, please drop me a comment and let me know. Likewise, it can probably be trimmed down, which is something I'm going to do, but if anybody would care to comment, they're welcome to.
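The post is about ASP.NET, but the regex itself is fairly portable. Here is a minimal sketch in Python, which I've picked purely for illustration because it isn't what the validator uses. It runs the regex and, alongside it, the split-and-parse approach described above, so the two can be compared. The function names are made up for this example. Using re.fullmatch and re.ASCII is my own assumption about sensible Python behaviour. fullmatch is there because Python's $ also matches just before a trailing newline, and re.ASCII keeps \d to the digits 0-9.

```python
import re

# The full regex from this post. re.ASCII is an addition for this sketch:
# without it, Python 3's \d also matches non-ASCII decimal digits.
IPV4_RE = re.compile(
    r"^(((0?0?\d)|((0|1)?\d\d)|(2[0-4]\d)|(25[0-5]))\.){3}"
    r"((0?0?\d)|((0|1)?\d\d)|(2[0-4]\d)|(25[0-5]))$",
    re.ASCII,
)


def is_ipv4_regex(text: str) -> bool:
    """Validate with the single regex.

    fullmatch is used because Python's $ also matches just before a
    trailing newline, which match() would otherwise let through.
    """
    return IPV4_RE.fullmatch(text) is not None


def is_ipv4_split(text: str) -> bool:
    """The split-and-parse alternative: check the format, then the values."""
    parts = text.split(".")
    if len(parts) != 4:
        return False
    for part in parts:
        # 1-3 ASCII digits per octet, the same shape the regex allows
        if not (1 <= len(part) <= 3 and all(c in "0123456789" for c in part)):
            return False
        # the numeric check that is awkward to express in a regex
        if int(part) > 255:
            return False
    return True


for candidate in ["192.168.0.1", "010.0.0.255", "256.1.1.1", "192.168.1", "1.2.3.4.5"]:
    print(candidate, is_ipv4_regex(candidate), is_ipv4_split(candidate))
```

As far as I can tell the two functions agree on every input. The regex keeps everything in one pattern, which is what the validator wants. The split version is easier to read and to change.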
I'm going to explain what each bit does, since it's pretty illegible as a full string (plus it'll help me remember what it does in the future ;) ). Note that I'm going to assume you already understand regex syntax; I'm just explaining the sections of my expression.
Firstly, the ^ and $ characters are used at
either end of the expression to ensure that there are no unwanted
characters either side of the IP.
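To see why the anchors matter, here's a small Python sketch that compares the pattern with and without ^ and $. Python's re module stands in for the .NET engine here, which is an assumption made purely for illustration. re.search looks for a match anywhere in the string, so only the anchors stop it from accepting an IP with extra text around it.

```python
import re

# The octet piece from the post, and the IP pattern with and without anchors.
OCTET = r"((0?0?\d)|((0|1)?\d\d)|(2[0-4]\d)|(25[0-5]))"
UNANCHORED = re.compile(r"(" + OCTET + r"\.){3}" + OCTET)
ANCHORED = re.compile(r"^(" + OCTET + r"\.){3}" + OCTET + r"$")

# search() scans the whole string, so without anchors an IP surrounded by
# other text is still found, and extra octets are silently ignored.
print(UNANCHORED.search("ip=192.168.0.1;"))
print(UNANCHORED.search("1.2.3.4.5").group())

# With ^ and $ the surrounding characters make the match fail.
print(ANCHORED.search("ip=192.168.0.1;"))
print(ANCHORED.search("1.2.3.4.5"))
```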
Now, we'll break it down a bit more. The
(((0?0?\d)|((0|1)?\d\d)|(2[0-4][0-9])|(25[0-5]))\.)
represents a single octet of an IP (with the trailing dot). The
{3} immediately after it specifies that it must be repeated
exactly 3 times. This is then followed by
((0?0?\d)|((0|1)?\d\d)|(2[0-4][0-9])|(25[0-5])), which is
the same thing, but without a trailing dot. This specifies that after the
first 3 octets (and dots), the last octet must exist, and must not have a
following dot (since the last octet is immediately followed by
$, meaning end of string).
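Here's a Python sketch that builds the full expression out of the octet piece, which makes the {3} structure easier to see. The names OCTET and IPV4 are just made up for this example. The assert checks that the assembled string is the same as the full regex given above.

```python
import re

# One octet, as explained above (using \d in place of [0-9]).
OCTET = r"((0?0?\d)|((0|1)?\d\d)|(2[0-4]\d)|(25[0-5]))"

# Three "octet followed by a literal dot" groups, then a final octet with no dot.
IPV4 = r"^(" + OCTET + r"\.){3}" + OCTET + r"$"

# Building it from the pieces gives back exactly the full regex from the post.
assert IPV4 == (r"^(((0?0?\d)|((0|1)?\d\d)|(2[0-4]\d)|(25[0-5]))\.){3}"
                r"((0?0?\d)|((0|1)?\d\d)|(2[0-4]\d)|(25[0-5]))$")

# Too few octets, a trailing dot, or a fifth octet all fail.
for candidate in ["10.0.0.1", "10.0.0", "10.0.0.1.", "10.0.0.1.2"]:
    print(candidate, re.fullmatch(IPV4, candidate) is not None)
```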
So, that's the structure of the IP checked. Let's now look at the part
which checks the individual octets. The code is
((0?0?\d)|((0|1)?\d\d)|(2[0-4][0-9])|(25[0-5])). This
section basically checks for the four different structures that an octet
can have (as it must be a number between 0 and 255), by specifying each
case, separated by | (OR) symbols.
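The octet expression only has to cope with strings of up to three digits, so it can be checked exhaustively. This Python sketch tries every one-, two- and three-digit string. It confirms that the octet expression accepts exactly those whose value is 255 or below, with leading zeros allowed. The helper name octet_ok is mine, purely for illustration.

```python
import re
from itertools import product

# The octet alternation on its own, as written above.
OCTET = re.compile(r"((0?0?\d)|((0|1)?\d\d)|(2[0-4][0-9])|(25[0-5]))")


def octet_ok(text: str) -> bool:
    # fullmatch: the whole string must be a single octet, nothing more
    return OCTET.fullmatch(text) is not None


# Every string of one to three digits: "0".."9", "00".."99", "000".."999".
strings = ["".join(p) for n in (1, 2, 3) for p in product("0123456789", repeat=n)]

# Accepted exactly when the numeric value is 255 or less.
assert all(octet_ok(s) == (int(s) <= 255) for s in strings)

print(octet_ok("007"), octet_ok("255"), octet_ok("256"), octet_ok("0001"))
```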
The first case is 0?0?\d. This case says the octet must
contain a single digit (the \d), optionally preceded by one or two
zeroes (each 0? specifies an optional zero). This allows octets
like 1, 03 or 003.
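A quick way to see exactly what this first case allows is to run it against every string of one to three digits. In this Python sketch, CASE_1 is just a name I've given the piece for illustration. Only the forms d, 0d and 00d get through.

```python
import re
from itertools import product

CASE_1 = re.compile(r"0?0?\d")

# All strings of one to three digits, filtered down to what case 1 accepts.
candidates = ["".join(p) for n in (1, 2, 3) for p in product("0123456789", repeat=n)]
matches = [s for s in candidates if CASE_1.fullmatch(s)]

print(len(matches))                       # 30: "d", "0d" and "00d"
print(sorted({int(s) for s in matches}))  # values 0 to 9 only
```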
The next case is (0|1)?\d\d. This one matches any octets
which are two digits, with an optional leading 0 or 1. I know, the
leading digit of an octet can be 2 as well, but then not every pair of
digits can follow it, so we'll deal with that situation in the next
case. This case will match things like 12, 012 and 112.
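The examples above can be checked directly. This Python sketch tests the second case on its own, with CASE_2 as an illustrative name. It includes a value with a leading 2, which this case deliberately leaves to the next one.

```python
import re

CASE_2 = re.compile(r"(0|1)?\d\d")

# 12, 012 and 112 are the examples from the text; the rest should be rejected.
for text in ["12", "012", "112", "212", "5", "1000"]:
    print(text, CASE_2.fullmatch(text) is not None)
```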
So, on to the penultimate case, 2[0-4][0-9]. This matches any
octets which start with a leading 2, followed by any two-digit number
from 00 to 49 (so 200 to 249). This is accomplished by checking that the second digit
is between 0 and 4, and the last digit is between 0 and 9.
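Checking every zero-padded three-digit string against the third case confirms that it covers exactly 200 to 249. This is again a Python sketch, with CASE_3 as an illustrative name.

```python
import re

CASE_3 = re.compile(r"2[0-4][0-9]")

# Zero-pad every value from 0 to 999 to three digits and keep those case 3 accepts.
accepted = [n for n in range(1000) if CASE_3.fullmatch(f"{n:03d}")]

print(accepted[0], accepted[-1], len(accepted))
```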
Finally, the last case. This uses the following expression:
25[0-5]. This case checks for any octets of the form 25X,
where X is a number between zero and five. This ensures that octets starting
with 25 can only be from 250 to 255, and no greater.
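This Python sketch puts the four cases side by side. For ordinary numbers written without leading zeros, it checks that every value from 0 to 255 is matched by exactly one case, and nothing from 256 to 999 is matched at all. With leading zeros, a string like 07 can match more than one case. That's harmless, since the alternation only needs one of them to succeed. CASES and cases_matching are names made up for this example.

```python
import re

# The four alternatives from the octet expression, in the order explained above.
CASES = [r"0?0?\d", r"(0|1)?\d\d", r"2[0-4][0-9]", r"25[0-5]"]


def cases_matching(text: str) -> list:
    """Return every case that matches the whole of text."""
    return [case for case in CASES if re.fullmatch(case, text)]


# Unpadded values: 0-255 each hit exactly one case, 256-999 hit none.
for n in range(1000):
    hits = cases_matching(str(n))
    assert len(hits) == (1 if n <= 255 else 0), (n, hits)

print(cases_matching("7"), cases_matching("07"), cases_matching("252"))
```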
So, I hope this has helped anyone reading this to understand how my regex works. It is by no means the most efficient expression around, but I hope you find it helpful. If anyone finds any problems with it (IPs missed, ways it could be more efficient/concise etc.), then please let me know. I'd be very appreciative.
Update: I've realised that the 2[0-4][0-9] section of the
regex could be replaced by 2[0-4]\d, since \d
is basically just a shorthand for [0-9]. This makes the new
regex the following (the updated sections are the two 2[0-4]\d groups):
^(((0?0?\d)|((0|1)?\d\d)|(2[0-4]\d)|(25[0-5]))\.){3}((0?0?\d)|((0|1)?\d\d)|(2[0-4]\d)|(25[0-5]))$
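One caveat on \d being shorthand for [0-9]: that holds for plain ASCII digits, but some engines let \d match other Unicode decimal digits too. Python 3 does this for str patterns. As far as I know, .NET does the same unless RegexOptions.ECMAScript is set. This Python sketch shows that the two forms agree on ordinary digits but differ on an Arabic-Indic digit. re.ASCII is Python's way of restricting \d back to [0-9].

```python
import re

OLD = r"2[0-4][0-9]"
NEW = r"2[0-4]\d"

# On ordinary ASCII digits the two forms agree for every three-digit string.
for n in range(1000):
    s = f"{n:03d}"
    assert (re.fullmatch(OLD, s) is None) == (re.fullmatch(NEW, s) is None)

# "24" followed by U+0663 ARABIC-INDIC DIGIT THREE, a Unicode decimal digit.
unusual = "24\u0663"
print(re.fullmatch(OLD, unusual) is not None)            # [0-9] rejects it
print(re.fullmatch(NEW, unusual) is not None)            # \d accepts it in Python 3
print(re.fullmatch(NEW, unusual, re.ASCII) is not None)  # re.ASCII rejects it again
```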
Another Update: The expression was matching IP strings with missing
last octets. This seems to have been because the dot separator was
specified as . (which matches any character) instead of escaped (as
\.). I imagine this was matching the first digit of the
first octet as the entire first octet, then the next digit matched the
., and so on. The regex has now been updated, and all seems
ok. The entire string in the introduction above has been updated too.
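The missing-octet bug is easy to reproduce. This Python sketch builds both the earlier version, with an unescaped dot, and the current one from the same octet piece. The BUGGY and FIXED names are just for illustration. With an unescaped dot, 192.168.1 can be read as 192 then ".", 1 then "6", 8 then ".", and a final 1. In other words, the digit 6 gets used as a separator, much as guessed above.

```python
import re

OCTET = r"((0?0?\d)|((0|1)?\d\d)|(2[0-4]\d)|(25[0-5]))"

# Earlier version: an unescaped "." between octets, which matches ANY character.
BUGGY = r"^(" + OCTET + r".){3}" + OCTET + r"$"
# Current version: "\." only matches a literal dot.
FIXED = r"^(" + OCTET + r"\.){3}" + OCTET + r"$"

for text in ["192.168.1.1", "192.168.1"]:
    print(text, re.fullmatch(BUGGY, text) is not None, re.fullmatch(FIXED, text) is not None)
```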