I will start off by saying I am absolutely, positively, by no means a guru, or even of intermediate ability, when it comes to regular expressions. What I do know is this: regular expressions are an incredibly powerful and relatively simple (depending on your definition of the word) way of validating strings of data for basic formatting and contents.
However, it's one of those things I don't use often enough to become fluent in, so I wind up relearning it every time I need it. That said, over the years I have picked up a little bit of knowledge and a decent little library of regular expressions that I use on a regular basis.
For a deeper understanding of regular expressions, otherwise known as RegEx, a great resource is regular-expressions.info; it's where I go for refreshers when I need to figure out something new. Through this little library of samples you can pick up a basic, very high-level understanding of the language, and I will explain each one a little as I go.
Basics
For starters, ^ and $ are "beginning" and "end". They anchor the pattern to the start and the end of the string, so the whole string has to fit the regular expression for the test to pass. What is inside of the { and } is the range of how many times the preceding piece of the pattern is allowed to repeat.
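To make the anchors and the { } range concrete, here is a minimal sketch in Python (my choice of language for illustration; the post itself does not name one). The helper name `matches` and the sample strings are made up for the example.

```python
import re

# Unanchored: the pattern may match anywhere inside the string.
unanchored = re.compile(r"\d{2,4}")
# Anchored: ^ and $ force the pattern to cover the whole string.
anchored = re.compile(r"^\d{2,4}$")


def matches(pattern, text):
    """Return True if the pattern is found somewhere in text."""
    return pattern.search(text) is not None


print(matches(unanchored, "abc123"))  # True: "123" is found inside the string
print(matches(anchored, "abc123"))    # False: the string does not start with a digit
print(matches(anchored, "123"))       # True: three digits, within the 2 to 4 range
print(matches(anchored, "1"))         # False: fewer than 2 digits
print(matches(anchored, "12345"))     # False: more than 4 digits
```

Without the anchors, the test only asks whether the pattern appears somewhere in the string, which is rarely what you want for validation.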
Validating Integers
^\d{1,10}$
The first item after the starting ^ is \d, which represents a digit. That is followed by {1,10}, which means from 1 to 10 characters are allowed, and every one of them must be a digit. It ends with the $. That tells the RegEx that between the beginning and the end there can only be from 1 to 10 digits. Anything else (a minus sign, a letter, an 11th digit) and the regular expression match test fails.
Word Characters
^[\w-\.]{1,100}$
Again, with the ^ and $ this RegEx indicates it is testing the entire string. I use this one for text-based querystring validation. The [ and ] define a set of allowed characters: each character in the string must be one of the things listed inside the brackets. Within those brackets is \w, which means "word characters": numbers, letters and _. I also add the - and the period. Outside of brackets the period means "any character", so, as in many languages, it must be escaped as \. to mean a literal period. (Inside brackets the escape is optional, and the - is safest placed last, as in [\w.-], because some engines reject \w-\. as an invalid range.) The net result is allowing numbers, letters, _, - and period to pass the RegEx test. It allows between 1 and 100 of those characters. (Long for a querystring, I know, it's just there for example's sake.)
Dates
^(\d{1,2})(\/|-)(\d{1,2})(\/|-)(\d{2}|\d{4})$
This will validate mm/dd/yyyy or mm-dd-yyyy (with a 2 or 4 digit year). It will not check whether it is an actual date, only that it follows the format, so 99/99/9999 passes, and so does a mix of separators like 12/25-2020. Each chunk to be examined is in parentheses, matching from beginning to end. The first piece, (\d{1,2}), allows 1 to 2 digits. The next, (\/|-), allows either / or - (the / is escaped by using \/). Then comes another 1 to 2 digit check, then another / or - check, then a 2 or 4 digit check. Pretty simple, but a cool example of RegEx.
Email Addresses
^[\w-\.]+@([\w-]+\.)+[\w-]{2,6}$
There are MUCH more complicated email validation expressions out there, but this one is simple and easy to explain. Looking at it in pieces, it makes sense. Before the @ it accepts one or more word characters (the \w), hyphens and periods. Then comes an @ sign, then one or more groups of word characters or hyphens that each end in a period, and finally 2 to 6 characters as the top-level domain extension on the end. Note that a top-level domain longer than 6 characters will fail this test.
These are simple RegEx to give you an idea of how they work. They are simple, but they work great for basic validation of input and format.
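If you want to try the four patterns yourself, here is a rough sketch in Python (again my choice for illustration, not necessarily the language these were written for). The names `INTEGER`, `WORD`, `DATE`, `EMAIL` and `is_valid` are made up for the example. One deliberate change: Python's `re` module rejects `[\w-\.]` as a bad character range, so the sketch writes that set as `[\w.-]`, which allows the same characters. The `re.ASCII` flag is my addition as well, so that \d and \w mean plain ASCII digits and letters.

```python
import re

# re.ASCII limits \d and \w to ASCII digits, letters and underscore.
INTEGER = re.compile(r"^\d{1,10}$", re.ASCII)
# Written as [\w.-] because Python rejects [\w-\.] as a bad range.
WORD = re.compile(r"^[\w.-]{1,100}$", re.ASCII)
DATE = re.compile(r"^(\d{1,2})(\/|-)(\d{1,2})(\/|-)(\d{2}|\d{4})$", re.ASCII)
EMAIL = re.compile(r"^[\w.-]+@([\w-]+\.)+[\w-]{2,6}$", re.ASCII)


def is_valid(pattern, text):
    """Return True if the whole string fits the pattern.

    fullmatch is used because, in Python, $ on its own also
    tolerates a single trailing newline.
    """
    return pattern.fullmatch(text) is not None


print(is_valid(INTEGER, "12345"))              # True
print(is_valid(INTEGER, "12345678901"))        # False: 11 digits
print(is_valid(INTEGER, "-5"))                 # False: minus sign is not a digit
print(is_valid(WORD, "my-file_v1.2"))          # True
print(is_valid(WORD, "bad value"))             # False: space is not allowed
print(is_valid(DATE, "12/25/2020"))            # True
print(is_valid(DATE, "99/99/9999"))            # True: format only, not a real date
print(is_valid(DATE, "12/25/202"))             # False: year must be 2 or 4 digits
print(is_valid(EMAIL, "jane.doe@example.com")) # True
print(is_valid(EMAIL, "jane@example"))         # False: no period after the @ part
```

Since the date pattern only checks the format, a real application would still want to hand the value to a proper date parser after it passes.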

