Regular Expression
The following is an elegant summary of regular expressions from The AWK Programming Language.
1. The regular expression metacharacters are:
\ ^ $ . [ ] | ( ) * + ?
2. A basic regular expression is one of the following:
- a nonmetacharacter, such as A, that matches itself.
- an escape sequence that matches a special symbol: \t matches a tab.
- a quoted metacharacter, such as \*, that matches the metacharacter literally.
- ^, which matches the beginning of a string.
- $, which matches the end of a string.
- ., which matches any single character.
- a character class: [ABC] matches any of the characters A, B, or C. Character classes may include abbreviations: [A-Za-z] matches any single letter.
- a complemented character class: [^0-9] matches any character except a digit.
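The basic forms listed above can be tried out with a short script. This is only a sketch: it uses Python's `re` module as a stand-in for AWK's `text ~ /pattern/` test, on the assumption that the two agree for the simple patterns shown here. The helper name `matches` and the sample strings are made up for illustration.

```python
import re

# Each tuple: (pattern, text, whether the pattern should match somewhere in text).
# re.search plays the role of AWK's `text ~ /pattern/` (an assumption of this sketch).
cases = [
    (r"A", "BAT", True),          # a nonmetacharacter matches itself
    (r"\t", "a\tb", True),        # escape sequence: \t matches a tab
    (r"\*", "2*3", True),         # quoted metacharacter matches a literal *
    (r"^ab", "abc", True),        # ^ matches at the beginning of the string
    (r"^ab", "cab", False),
    (r"ab$", "cab", True),        # $ matches at the end of the string
    (r"ab$", "abc", False),
    (r"a.c", "abc", True),        # . matches any single character
    (r"a.c", "ac", False),
    (r"[ABC]", "xBz", True),      # character class
    (r"[A-Za-z]", "123", False),  # range abbreviation: any single letter
    (r"[^0-9]", "123", False),    # complemented class: anything but a digit
    (r"[^0-9]", "12a", True),
]

def matches(pattern, text):
    """Return True if pattern matches anywhere in text."""
    return re.search(pattern, text) is not None

results = [matches(p, t) for p, t, _ in cases]
for (p, t, expected), got in zip(cases, results):
    print(f"{p!r:12} on {t!r:8} -> {got} (expected {expected})")
```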
3. These operators combine regular expressions into larger ones:
- alternation: A | B matches A or B.
- concatenation: AB matches A immediately followed by B.
- closure: A* matches zero or more A's.
- positive closure: A+ matches one or more A's.
- zero or one: A? matches the null string or A.
- parentheses: (r) matches the same strings as r does.
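A small sketch of the combining operators, again using Python's `re` as an assumed stand-in for AWK. The hypothetical helper `full` checks whether a pattern matches an entire string, which makes the effect of each operator easier to see. Alternation is written `A|B` without spaces, because a space inside a pattern would itself be matched literally.

```python
import re

def full(pattern, text):
    """Return True when pattern matches the whole of text, not just a part."""
    return re.fullmatch(pattern, text) is not None

# alternation: A|B matches A or B
alternation = [full(r"A|B", s) for s in ("A", "B", "C")]

# concatenation: AB matches A immediately followed by B
concatenation = [full(r"AB", s) for s in ("AB", "A", "BA")]

# closure: A* matches zero or more A's
closure = [full(r"A*", s) for s in ("", "A", "AAA", "AB")]

# positive closure: A+ matches one or more A's
positive = [full(r"A+", s) for s in ("", "A", "AAA")]

# zero or one: A? matches the null string or A
optional = [full(r"A?", s) for s in ("", "A", "AA")]

# parentheses: (A) matches the same strings as A
grouped = [full(r"(A)", s) == full(r"A", s) for s in ("", "A", "AA")]

print(alternation, concatenation, closure, positive, optional, grouped)
```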
| Expression | Matches |
| --- | --- |
| c | the nonmetacharacter c |
| \c | escape sequence or literal character c |
| ^ | beginning of string |
| $ | end of string |
| . | any character |
| [$c_1$$c_2$...] | any character in $c_1$$c_2$... |
| [^$c_1$$c_2$...] | any character not in $c_1$$c_2$... |
| [$c_1$-$c_2$] | any character in the range beginning with $c_1$ and ending with $c_2$ |
| [^$c_1$-$c_2$] | any character not in the range $c_1$ to $c_2$ |
| $r_1$\|$r_2$ | any string matched by $r_1$ or $r_2$ |
| ($r_1$)($r_2$) | any string xy where $r_1$ matches x and $r_2$ matches y; parentheses not needed around arguments with no alternations |
| (r)* | zero or more consecutive strings matched by r |
| (r)+ | one or more consecutive strings matched by r |
| (r)? | zero or one string matched by r; parentheses not needed around basic regular expressions |
| (r) | any string matched by r |
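The table's remarks about when parentheses can be omitted imply an order of precedence: repetition binds tighter than concatenation, which binds tighter than alternation. The sketch below illustrates that reading with Python's `re`, used here as an assumed stand-in for AWK. The helper `full` and the sample patterns are made up for illustration.

```python
import re

def full(pattern, text):
    """Return True when pattern matches the whole of text."""
    return re.fullmatch(pattern, text) is not None

# Repetition binds tighter than concatenation: ab* means a(b)*, not (ab)*.
star_binds_to_b = full(r"ab*", "abbb")        # a followed by three b's
star_not_on_ab = full(r"ab*", "abab")         # would need (ab)*
star_on_group = full(r"(ab)*", "abab")        # parentheses change the grouping

# Concatenation binds tighter than alternation: ab|cd means (ab)|(cd).
alt_left = full(r"ab|cd", "ab")
alt_not_inner = full(r"ab|cd", "abd")         # would need a(b|c)d
alt_inner_group = full(r"a(b|c)d", "abd")     # parentheses are required here

print(star_binds_to_b, star_not_on_ab, star_on_group)
print(alt_left, alt_not_inner, alt_inner_group)
```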