regex
This page provides an overall cheat sheet of all the capabilities of RegExp
syntax by aggregating the content of the articles in the RegExp
guide. If you need more information on a specific topic, please follow the link on the corresponding heading to access the full article or head to the guide.
Character classes
Characters | Meaning |
---|---|
. |
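Has one of the following meanings:
- Matches any single character except line terminators: \n, \r, \u2028 or \u2029. For example, /.y/ matches "my" and "ay", but not "yes", in "yes make my day".
- Inside a character set, the dot loses its special meaning and matches a literal dot.
Note that the m (multiline) flag does not change the dot behavior. ES2018 added the s (dotAll) flag, which allows the dot to also match line terminators. |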
\d |
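Matches any digit (Arabic numeral). Equivalent to [0-9]. For example, /\d/ or /[0-9]/ matches "2" in "B2 is the suite number". |
\D |
Matches any character that is not a digit (Arabic numeral). Equivalent to [^0-9]. For example, /\D/ or /[^0-9]/ matches "B" in "B2 is the suite number". |
\w |
Matches any alphanumeric character from the basic Latin alphabet, including the underscore. Equivalent to [A-Za-z0-9_]. For example, /\w/ matches "a" in "apple", "5" in "$5.28", and "3" in "3D". |
\W |
Matches any character that is not a word character from the basic Latin alphabet. Equivalent to [^A-Za-z0-9_]. For example, /\W/ or /[^A-Za-z0-9_]/ matches "%" in "50%". |
\s |
Matches a single white space character, including space, tab, form feed, line feed, and other Unicode spaces. Equivalent to [\f\n\r\t\v\u0020\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]. For example, /\s\w*/ matches " bar" in "foo bar". |
\S |
Matches a single character other than white space. Equivalent to [^\f\n\r\t\v\u0020\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]. For example, /\S\w*/ matches "foo" in "foo bar". |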
\t |
Matches a horizontal tab. |
\r |
Matches a carriage return. |
\n |
Matches a line feed. |
\v |
Matches a vertical tab. |
\f |
Matches a form feed. |
[\b] |
Matches a backspace. If you're looking for the word-boundary character (\b), see Boundaries. |
\0 |
Matches a NUL character. Do not follow this with another digit. |
\cX |
Matches a control character using caret notation, where "X" is a letter from A–Z (corresponding to code points U+0001–U+001A). For example, /\cM\cJ/ matches "\r\n". |
\xhh |
Matches the character with the code hh (two hexadecimal digits). |
\uhhhh |
Matches a UTF-16 code unit with the value hhhh (four hexadecimal digits). |
\u{hhhh} or \u{hhhhh} |
(Only when the u flag is set.) Matches the character with the Unicode value U+hhhh or U+hhhhh (hexadecimal digits). |
\ |
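Indicates that the following character should be treated specially, or "escaped". It behaves in one of two ways.
- For characters that are usually treated literally, indicates that the next character is special and not to be interpreted literally. For example, /b/ matches the character "b". Placing a backslash in front of "b", that is, using /\b/, makes it special: it now matches a word boundary.
- For characters that are usually treated specially, indicates that the next character is not special and should be interpreted literally. For example, "*" is a special character that means 0 or more occurrences of the preceding character, so /a*/ matches 0 or more "a"s. To match * literally, precede it with a backslash: /a\*/ matches "a*".
Note that some characters like : and @ have no special meaning either escaped or unescaped. Without the u flag, escape sequences like \: and \@ are equivalent to the literal characters. To match a backslash literally, escape it with itself. In other words, to search for \ use /\\/. |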
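The following is a small sketch of a few of the character classes above. The sample strings and variable names are made up for illustration, and the s flag assumes an ES2018-capable engine.

```javascript
// Illustrative sample input; not taken from any real data set.
const suite = "B2 is the suite number";

const firstDigit = suite.match(/\d/)[0];           // "2"
const firstNonDigit = suite.match(/\D/)[0];        // "B"
const words = "foo bar".match(/\w+/g);             // ["foo", "bar"]
const spaceThenWord = "foo bar".match(/\s\w*/)[0]; // " bar"

// Without the s (dotAll) flag, the dot does not match a line terminator.
const dotDefault = /a.b/.test("a\nb");             // false
const dotAll = /a.b/s.test("a\nb");                // true

// \u{...} denotes a code point only when the u flag is set.
const codePoint = /\u{1F600}/u.test("\u{1F600}");  // true

// A backslash is matched by escaping it with itself.
const backslash = /\\/.test("C:\\temp");           // true

console.log(firstDigit, firstNonDigit, words, dotDefault, dotAll);
```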
Assertions
Boundary-type assertions
Characters | Meaning |
---|---|
^ |
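Matches the beginning of input. If the multiline flag is set to true, also matches immediately after a line break character. For example, /^A/ does not match the "A" in "an A", but does match the first "A" in "An A". This character has a different meaning when it appears at the start of a character set (see Groups and ranges). |
$ |
Matches the end of input. If the multiline flag is set to true, also matches immediately before a line break character. For example, /t$/ does not match the "t" in "eater", but does match it in "eat". |
\b |
Matches a word boundary. This is the position where a word character is not followed or preceded by another word character, such as between a letter and a space. Note that a matched word boundary is not included in the match. In other words, the length of a matched word boundary is zero. Examples:
- /\bm/ matches the "m" in "moon".
- /oo\b/ does not match the "oo" in "moon", because "oo" is followed by "n", which is a word character.
- /oon\b/ matches the "oon" in "moon", because "oon" is the end of the string and thus not followed by a word character.
- /\w\b\w/ will never match anything, because a word character can never be followed by both a non-word and a word character.
To match a backspace character ([\b]), see Character classes. |
\B |
Matches a non-word boundary. This is a position where the previous and next characters are of the same type: either both are word characters or both are non-word characters, for example between two letters or between two spaces. The beginning and end of a string are considered non-word characters. Like the matched word boundary, the matched non-word boundary is not included in the match. For example, /\Bon/ matches "on" in "at noon", and /ye\B/ matches "ye" in "possibly yesterday". |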
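As a rough illustration of the boundary assertions above, the sketch below checks a few of them and shows how the m flag changes ^ and $. The multi-line sample string is an assumption made for the example.

```javascript
// Word boundaries are zero-width: they assert a position, not a character.
const startsWithM = /\bm/.test("moon");        // true
const ooAtEnd = /oo\b/.test("moon");           // false: "oo" is followed by "n"
const oonAtEnd = /oon\b/.test("moon");         // true
const innerOn = "at noon".match(/\Bon/).index; // 5: the "on" inside "noon"

// Hypothetical three-line input used to contrast ^ and $ with and without m.
const lines = "one\ntwo\nthree";
const inputStart = lines.match(/^\w/g);        // ["o"]
const lineStarts = lines.match(/^\w/gm);       // ["o", "t", "t"]
const lineEnds = lines.match(/\w$/gm);         // ["e", "o", "e"]

console.log(startsWithM, ooAtEnd, oonAtEnd, innerOn, lineStarts, lineEnds);
```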
Other assertions
Note: The ? character may also be used as a quantifier.
Characters | Meaning |
---|---|
x(?=y) |
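Lookahead assertion: Matches "x" only if "x" is followed by "y". For example, /Jack(?=Sprat)/ matches "Jack" only if it is followed by "Sprat". /Jack(?=Sprat|Frost)/ matches "Jack" only if it is followed by "Sprat" or "Frost". However, neither "Sprat" nor "Frost" is part of the match results. |
x(?!y) |
Negative lookahead assertion: Matches "x" only if "x" is not followed by "y". For example, /\d+(?!\.)/ matches a number only if it is not followed by a decimal point. /\d+(?!\.)/.exec('3.141') matches "141" but not "3". |
(?<=y)x |
Lookbehind assertion: Matches "x" only if "x" is preceded by "y". For example, /(?<=Jack)Sprat/ matches "Sprat" only if it is preceded by "Jack". /(?<=Jack|Tom)Sprat/ matches "Sprat" only if it is preceded by "Jack" or "Tom". However, neither "Jack" nor "Tom" is part of the match results. |
(?<!y)x |
Negative lookbehind assertion: Matches "x" only if "x" is not preceded by "y". For example, /(?<!-)\d+/ matches a number only if it is not preceded by a minus sign. /(?<!-)\d+/.exec('3') matches "3". /(?<!-)\d+/.exec('-3') finds no match because the number is preceded by the minus sign. |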
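The four lookaround forms above can be sketched as follows. The input strings are illustrative, and lookbehind assumes an engine with ES2018 support.

```javascript
// Lookahead: the asserted text is checked but not consumed.
const ahead = "JackSprat".match(/Jack(?=Sprat)/)[0];   // "Jack"
const noAhead = /Jack(?=Sprat)/.test("JackFrost");     // false

// Negative lookahead: "3" is rejected because "." follows it.
const negAhead = /\d+(?!\.)/.exec("3.141")[0];         // "141"

// Lookbehind: "Jack" must precede, but is not part of the match.
const behind = "JackSprat".match(/(?<=Jack)Sprat/)[0]; // "Sprat"

// Negative lookbehind: a number not preceded by a minus sign.
const unsigned = /(?<!-)\d+/.exec("3")[0];             // "3"
const signed = /(?<!-)\d+/.exec("-3");                 // null

console.log(ahead, noAhead, negAhead, behind, unsigned, signed);
```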
Groups and ranges
Characters | Meaning |
---|---|
x|y |
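Matches either "x" or "y". For example, /green|red/ matches "green" in "green apple" and "red" in "red apple". |
[xyz] |
A character set. Matches any one of the enclosed characters. You can specify a range of characters by using a hyphen, but if the hyphen appears as the first or last character enclosed in the square brackets, it is taken as a literal hyphen to be included in the character set as a normal character. It is also possible to include a character class in a character set. For example, [abcd] is the same as [a-d]. They match the "b" in "brisket" and the "c" in "chop". For example, [abcd-] and [-abcd] match the "b" in "brisket", the "c" in "chop", and the "-" (hyphen) in "non-profit". For example, [\w-] is the same as [A-Za-z0-9_-]. They both match the "b" in "brisket", the "c" in "chop", and the "n" in "non-profit". |
[^xyz] |
A negated or complemented character set. That is, it matches anything that is not enclosed in the brackets. You can specify a range of characters by using a hyphen, but if the hyphen appears as the first character after the ^ or as the last character enclosed in the square brackets, it is taken as a literal hyphen to be included in the character set as a normal character. For example, [^abc] is the same as [^a-c]. They initially match "o" in "bacon" and "h" in "chop". The ^ character may also indicate the beginning of input. |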
(x) |
Capturing group: Matches "x" and remembers the match. For example, /(foo)/ matches and remembers "foo" in "foo bar". A regular expression may have multiple capturing groups. In results, matches to capturing groups are typically in an array whose members are in the same order as the left parentheses in the capturing group. This is usually just the order of the capturing groups themselves. This becomes important when capturing groups are nested. Matches are accessed using the index of the result's elements ([1], ..., [n]) or from the predefined RegExp object's properties ($1, ..., $9). Capturing groups have a performance penalty. If you don't need the matched substring to be recalled, prefer non-capturing parentheses (see below). |
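\n |
Where "n" is a positive integer. A back reference to the last substring matching the nth parenthetical in the regular expression (counting left parentheses). For example, /apple(,)\sorange\1/ matches "apple, orange," in "apple, orange, cherry, peach". |
\k<Name> |
A back reference to the last substring matching the named capture group specified by <Name>. For example, /(?<title>\w+), yes \k<title>/ matches "Sir, yes Sir" in "Do you copy? Sir, yes Sir!". |
(?<Name>x) |
Named capturing group: Matches "x" and stores it on the groups property of the returned matches under the name specified by <Name>. The angle brackets (< and >) are required for the group name. For example, to extract the United States area code from a phone number, we could use /\((?<area>\d\d\d)\)/. The resulting number would appear under matches.groups.area. |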
(?:x) |
Non-capturing group: Matches "x" but does not remember the match. The matched substring cannot be recalled from the resulting array's elements ([1], ..., [n]) or from the predefined RegExp object's properties ($1, ..., $9). |
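To make the difference between the group types concrete, here is a minimal sketch. The phone-number string and the variable names are hypothetical; the patterns follow the forms listed in the table.

```javascript
// Hypothetical input containing a US-style area code in parentheses.
const phone = "Call (555) 123-4567 today";
const m = phone.match(/\((?<area>\d\d\d)\)/);
const area = m.groups.area; // "555", read by name
const byIndex = m[1];       // "555", named groups are also numbered

// \1 refers back to whatever the first capturing group matched (a comma here).
const backref = "apple, orange, cherry, peach".match(/apple(,)\sorange\1/)[0];
// "apple, orange,"

// \k<title> refers back to the named group "title".
const named = "Do you copy? Sir, yes Sir!".match(/(?<title>\w+), yes \k<title>/)[0];
// "Sir, yes Sir"

// (?:foo) groups without capturing, so "bar" is the only captured element.
const nonCapturing = "foo bar".match(/(?:foo) (bar)/);
// nonCapturing[0] is "foo bar", nonCapturing[1] is "bar"

// A negated set matches the first character that is not a, b, or c.
const negated = "bacon".match(/[^a-c]/)[0]; // "o"

console.log(area, byIndex, backref, named, nonCapturing.length, negated);
```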
Quantifiers
Note: In the following, item refers not only to singular characters, but also includes character classes, Unicode property escapes, groups and ranges.
Characters | Meaning |
---|---|
x* |
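Matches the preceding item "x" 0 or more times. For example, /bo*/ matches "boooo" in "A ghost booooed" and "b" in "A bird warbled", but nothing in "A goat grunted". |
x+ |
Matches the preceding item "x" 1 or more times. Equivalent to {1,}. For example, /a+/ matches the "a" in "candy" and all the "a"s in "caaaaaaandy". |
x? |
Matches the preceding item "x" 0 or 1 times. For example, /e?le?/ matches the "el" in "angel" and the "le" in "angle". If used immediately after any of the quantifiers *, +, ?, or {}, makes the quantifier non-greedy (matching the minimum number of times), as opposed to the default, which is greedy (matching the maximum number of times). |
x{n} |
Where "n" is a positive integer, matches exactly "n" occurrences of the preceding item "x". For example, /a{2}/ doesn't match the "a" in "candy", but it matches all of the "a"s in "caandy" and the first two "a"s in "caaandy". |
x{n,} |
Where "n" is a positive integer, matches at least "n" occurrences of the preceding item "x". For example, /a{2,}/ doesn't match the "a" in "candy", but matches all of the "a"s in "caandy" and in "caaaaaaandy". |
x{n,m} |
Where "n" is 0 or a positive integer, "m" is a positive integer, and m >= n, matches at least "n" and at most "m" occurrences of the preceding item "x". For example, /a{1,3}/ matches nothing in "cndy", the "a" in "candy", the two "a"s in "caandy", and the first three "a"s in "caaaaaaandy". Notice that when matching "caaaaaaandy", the match is "aaa", even though the original string had more "a"s in it. |
x*? x+? x?? x{n}? x{n,}? x{n,m}? |
By default quantifiers like * and + are "greedy", meaning that they try to match as much of the string as possible. The ? character after the quantifier makes the quantifier "non-greedy", meaning that it stops as soon as it finds a match. For example, given a string like "some <foo> <bar> new </bar> </foo> thing", /<.*>/ matches "<foo> <bar> new </bar> </foo>" and /<.*?>/ matches "<foo>". |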
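The greedy and non-greedy behavior described above can be sketched as follows. The strings come from the table's own examples, and the variable names are illustrative.

```javascript
const html = "some <foo> <bar> new </bar> </foo> thing";

// Greedy: .* runs to the last ">" it can reach.
const greedy = html.match(/<.*>/)[0]; // "<foo> <bar> new </bar> </foo>"

// Non-greedy: .*? stops at the first ">" that completes a match.
const lazy = html.match(/<.*?>/)[0];  // "<foo>"

const star = "A ghost booooed".match(/bo*/)[0];  // "boooo"
const none = "A goat grunted".match(/bo*/);      // null
const exact = "caaandy".match(/a{2}/)[0];        // "aa"
const range = "caaaaaaandy".match(/a{1,3}/)[0];  // "aaa"
const optional = "angel".match(/e?le?/)[0];      // "el"

console.log(greedy, lazy, star, none, exact, range, optional);
```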
Unicode property escapes
// Non-binary values
\p{UnicodePropertyValue}
\p{UnicodePropertyName=UnicodePropertyValue}

// Binary and non-binary values
\p{UnicodeBinaryPropertyName}

// Negation: \P is negated \p
\P{UnicodePropertyValue}
\P{UnicodeBinaryPropertyName}
- UnicodeBinaryPropertyName
  The name of a binary property. E.g.: ASCII, Alpha, Math, Diacritic, Emoji, Hex_Digit, White_Space, etc. See Unicode Data PropList.txt for more info.
- UnicodePropertyName
  The name of a non-binary property:
  - General_Category (gc)
  - Script (sc)
  - Script_Extensions (scx)
  See also PropertyValueAliases.txt
- UnicodePropertyValue
  One of the tokens listed in the Values section, below. Many values have aliases or shorthand (e.g. the value Decimal_Number for the General_Category property may be written Nd, digit, or Decimal_Number). For most values, the UnicodePropertyName part and equals sign may be omitted. If a UnicodePropertyName is specified, the value must correspond to the property type given.
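As a small sketch of the syntax forms above: \p{...} and \P{...} are only recognized when the u flag is set. The sample text and variable names are assumptions chosen to mix scripts and digit systems.

```javascript
// Latin "A", digit "1", a space, Greek small alpha (U+03B1),
// and Arabic-Indic digit three (U+0663).
const text = "A1 \u03B1\u0663";

// Shorthand value, and the same property written out in full.
const digits = text.match(/\p{Nd}/gu);
const digitsLong = text.match(/\p{General_Category=Decimal_Number}/gu);
// both: ["1", "\u0663"]

// Non-binary property given as name=value.
const greek = text.match(/\p{Script=Greek}/gu); // ["\u03B1"]

// \P negates a binary property: everything outside ASCII.
const nonAscii = text.match(/\P{ASCII}/gu);     // ["\u03B1", "\u0663"]

console.log(digits, digitsLong, greek, nonAscii);
```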
Note: As there are many properties and values available, we will not describe them exhaustively here but rather provide various examples.
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Cheatsheet