JavaScript Regular Expressions

http://www.advanced-javascript-tutorial.com/RegularExpressions.cfm#Regular-Expression-Syntax

Regular expressions are a powerful tool for pattern matching. JavaScript developers often use them for sanitizing user input and for advanced form validation.

 

Lesson Goals

  • To use regular expressions for advanced form validation.
  • To use regular expressions and backreferences to clean up form entries.

     

     

Getting Started

Regular expressions are used to do sophisticated pattern matching, which can often be helpful in form validation. For example, a regular expression can be used to check whether an email address entered into a form field is syntactically correct. JavaScript's regular expression syntax is modeled on Perl's.

 

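There are two ways to create a regular expression in JavaScript:

  • Using a regular expression literal, with the pattern written between slashes.
  • Using the RegExp() constructor, which creates a regular expression object from a string.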

 

Assuming you know the regular expression pattern you are going to use, there is no real difference between the two; however, if you don't know the pattern ahead of time (e.g., you're retrieving it from a form), it can be easier to use the RegExp() constructor.
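The two forms can be sketched side by side as follows. The five-digit pattern and the userPattern variable are illustrative assumptions, standing in for a value that would be read from a form at run time.

```javascript
// Literal syntax: the pattern is fixed when the script is written.
const literalRe = /^\d{5}$/;

// Constructor syntax: the pattern is built from a string at run time.
// Inside a string literal each backslash has to be doubled.
const userPattern = "^\\d{5}$"; // hypothetical value, e.g. read from a form field
const constructedRe = new RegExp(userPattern);

console.log(literalRe.test("12345"));     // true
console.log(constructedRe.test("12345")); // true
console.log(constructedRe.test("1234a")); // false
```

Both objects behave identically once created; the only practical difference is where the pattern text comes from.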

 

JavaScript's Regular Expression Methods

The regular expression object in JavaScript has two main methods for testing strings: test() and exec().

 

The exec() Method

The exec() method takes one argument, a string, and checks whether that string contains a match of the pattern specified by the regular expression. If a match is found, the method returns a result array: the first element is the matched text, the remaining elements are the substrings captured by any parenthesized subpatterns, and the array's index property gives the position where the match starts. If no match is found, the method returns null.
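A minimal sketch of what exec() returns, using a made-up phone-number pattern and sample string that are not part of the original lesson:

```javascript
const re = /(\d{3})-(\d{4})/;
const result = re.exec("Call 555-1234 today");

console.log(result[0]);    // "555-1234" (the full match)
console.log(result[1]);    // "555"      (first captured subpattern)
console.log(result[2]);    // "1234"     (second captured subpattern)
console.log(result.index); // 5          (position where the match starts)

// When nothing matches, exec() returns null rather than an empty array.
console.log(re.exec("no digits here")); // null
```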

 

The test() Method

The test() method also takes one argument, a string, and checks whether that string contains a match of the pattern specified by the regular expression. It returns true if it does contain a match and false if it does not. This method is very useful in form validation scripts. The code sample below shows how it can be used for checking a U.S. social security number. Don't worry about the syntax of the regular expression itself. We'll cover that shortly.

<!DOCTYPE HTML>
<html>
<head>
<meta charset="UTF-8">
<title>ssn Checker</title>
<script type="text/javascript">
var reSSN = /^[0-9]{3}[\- ]?[0-9]{2}[\- ]?[0-9]{4}$/;

function checkSsn(ssn){
    if (reSSN.test(ssn)) {
        alert("VALID SSN");
    } else {
        alert("INVALID SSN");
    }
}
</script>
</head>
<body>
    <form onsubmit="return false;">
        <input type="text" name="ssn" size="20">
        <input type="button" value="Check"
            onclick="checkSsn(this.form.ssn.value);">
    </form>
</body>
</html>

Let's examine the code more closely:

  1. First, a variable containing a regular expression object for a social security number is declared.
  2. Next, a function called checkSsn() is created. This function takes one argument: ssn, which is a string. The function then tests to see if the string matches the regular expression pattern by passing it to the regular expression object's test() method. If it does match, the function alerts "VALID SSN". Otherwise, it alerts "INVALID SSN".
  3. Last, a form in the body of the page provides a text field for inserting a social security number and a button that passes the user-entered social security number to the checkSsn() function.
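The same check can be tried outside a browser. The sketch below reuses the reSSN pattern from the page; describeSsn() is a hypothetical stand-in for checkSsn() that returns the message instead of calling alert().

```javascript
const reSSN = /^[0-9]{3}[\- ]?[0-9]{2}[\- ]?[0-9]{4}$/;

// Hypothetical variant of checkSsn(): returns the message so it can run anywhere.
function describeSsn(ssn) {
    return reSSN.test(ssn) ? "VALID SSN" : "INVALID SSN";
}

console.log(describeSsn("123-45-6789")); // VALID SSN
console.log(describeSsn("123456789"));   // VALID SSN (delimiters are optional)
console.log(describeSsn("123-45-678"));  // INVALID SSN (last group too short)
```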

 

Flags

Flags appearing after the end slash modify how a regular expression works.

  • The i flag makes a regular expression case insensitive. For example, /[aeiou]/i matches any lowercase or uppercase vowel.
  • The g flag specifies a global match, meaning that all matches of the specified pattern are found rather than only the first.
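As a rough illustration of both flags, assuming an arbitrary sample sentence:

```javascript
const text = "Regular Expressions Are Useful";

// Without i, the character class only matches lowercase vowels.
console.log(/[aeiou]/.test("SKY TV"));  // false
console.log(/[aeiou]/i.test("EAR"));    // true

// Without g, match() stops at the first occurrence.
console.log(text.match(/e/).length);    // 1
// With g, every occurrence is returned.
console.log(text.match(/e/g).length);   // 4
// Flags can be combined: the uppercase "E" is now counted as well.
console.log(text.match(/e/gi).length);  // 5
```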

     

String Methods

There are several String methods that take regular expressions as arguments.

The search() Method

The search() method takes one argument: a regular expression. It returns the index of the first character of the first substring matching the regular expression. If no match is found, the method returns -1.

"Webucator".search(/cat/); //returns 4

 

The split() Method

The split() method takes one argument: a delimiter, which can be a regular expression. It uses the delimiter to split the string into an array of strings.
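A small sketch of splitting on a regular expression, assuming a made-up list in which items are separated by commas or semicolons with inconsistent spacing:

```javascript
const csv = "red, green;blue ,  yellow";

// The delimiter is a comma or semicolon plus any surrounding whitespace.
const colors = csv.split(/\s*[,;]\s*/);

console.log(colors); // ["red", "green", "blue", "yellow"]
```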

The replace() Method

The replace() method takes two arguments: a regular expression and a replacement string. It replaces the first regular expression match with the string. If the g flag is used in the regular expression, it replaces all matches with the string.
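The effect of the g flag on replace() can be sketched as below. The last line goes slightly beyond the text above: it assumes the standard $1, $2 ... placeholders, which insert captured subpatterns into the replacement string. The sample strings are invented.

```javascript
const phrase = "one fish, two fish";

console.log(phrase.replace(/fish/, "cat"));  // "one cat, two fish"
console.log(phrase.replace(/fish/g, "cat")); // "one cat, two cat"

// $1, $2 and $3 refer to the first, second and third parenthesized subpatterns.
const isoDate = "2012-11-24";
console.log(isoDate.replace(/(\d{4})-(\d{2})-(\d{2})/, "$2/$3/$1")); // "11/24/2012"
```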

 

The match() Method

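The match() method takes one argument: a regular expression. With the g flag, it returns an array of every substring that matches the regular expression pattern. Without the g flag, it returns the same result as exec(): the first match and its captured subpatterns. If no match is found, it returns null.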
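A short sketch of the two behaviors, using an invented sample sentence:

```javascript
const sentence = "cat bat rat";

// Without g: first match only, plus captured subpatterns and an index property.
const first = sentence.match(/(.)at/);
console.log(first[0], first[1], first.index); // "cat" "c" 0

// With g: every matching substring, without capture details.
const all = sentence.match(/.at/g);
console.log(all); // ["cat", "bat", "rat"]

// No match at all gives null.
console.log(sentence.match(/dog/)); // null
```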

 

 

 

Regular Expression Syntax

A regular expression is a pattern built from a sequence of characters, some of which have special meanings. In this section, we will look at how those characters are specified.

 

Start and End ( ^ $ )

A caret (^) at the beginning of a regular expression indicates that the string being searched must start with this pattern.

  • The pattern ^foo can be found in "food", but not in "barfood".

A dollar sign ($) at the end of a regular expression indicates that the string being searched must end with this pattern.

  • The pattern foo$ can be found in "curfoo", but not in "food".
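The anchor examples above can be checked directly with test(). The final line is an added illustration of using both anchors together, so that the whole string has to match.

```javascript
console.log(/^foo/.test("food"));    // true
console.log(/^foo/.test("barfood")); // false
console.log(/foo$/.test("curfoo"));  // true
console.log(/foo$/.test("food"));    // false

// Both anchors together: the entire string must be exactly "foo".
console.log(/^foo$/.test("foo"));    // true
console.log(/^foo$/.test("food"));   // false
```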

 

Number of Occurrences ( ? + * {} )

The following symbols affect the number of occurrences of the preceding character (or characters, if parentheses are used): ?, +, *, and {}.

 

A question mark (?) indicates that the preceding character should appear zero or one times in the pattern.

The pattern foo? can be found in "food" and "fod", but not "faod".

 

A plus sign (+) indicates that the preceding character should appear one or more times in the pattern.

The pattern fo+ can be found in "fod", "food" and "foood", but not "fd".

An asterisk (*) indicates that the preceding character should appear zero or more times in the pattern.

The pattern fo*d can be found in "fd", "fod" and "food".

 

Curly brackets with one parameter ( {n} ) indicate that the preceding character should appear exactly n times in the pattern.

The pattern fo{3}d can be found in "foood", but not "food" or "fooood".

Curly brackets with two parameters ( {n1,n2} ) indicate that the preceding character should appear between n1 and n2 times (inclusive) in the pattern.

The pattern fo{2,4}d can be found in "food", "foood" and "fooood", but not "fod" or "foooood".

 

Curly brackets with one parameter and an empty second parameter ( {n,} ) indicate that the preceding character should appear at least n times in the pattern.

The pattern fo{2,}d can be found in "food" and "foooood", but not "fod".
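The quantifier examples in this section can be verified with a small table of cases. The cases array and its layout are just one way to organize such a check.

```javascript
// Each entry: [pattern, string to search, whether a match is expected].
const cases = [
    [/foo?/,     "fod",     true],
    [/foo?/,     "faod",    false],
    [/fo+/,      "fd",      false],
    [/fo*d/,     "fd",      true],
    [/fo{3}d/,   "foood",   true],
    [/fo{3}d/,   "fooood",  false],
    [/fo{2,4}d/, "fod",     false],
    [/fo{2,4}d/, "fooood",  true],
    [/fo{2,}d/,  "foooood", true],
];

for (const [re, str, expected] of cases) {
    const verdict = re.test(str) === expected ? "ok" : "MISMATCH";
    console.log(re.source, JSON.stringify(str), verdict);
}
```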

 

Common Characters ( . \d \D \w \W \s \S )

A period ( . ) represents any character except a newline.

The pattern fo.d can be found in "food", "foad", "fo9d", and "fo*d".

Backslash-d ( \d ) represents any digit. It is the equivalent of [0-9].

The pattern fo\dd can be found in "fo1d", "fo4d" and "fo0d", but not in "food" or "fodd".

Backslash-D ( \D ) represents any character except a digit. It is the equivalent of [^0-9].

The pattern fo\Dd can be found in "food" and "foad", but not in "fo4d".

Backslash-w ( \w ) represents any word character (letters, digits, and the underscore (_) ).

The pattern fo\wd can be found in "food", "fo_d" and "fo4d", but not in "fo*d".

Backslash-W ( \W ) represents any character except a word character.

The pattern fo\Wd can be found in "fo*d", "fo@d" and "fo.d", but not in "food".

Backslash-s ( \s ) represents any single whitespace character (e.g., space, tab, newline, etc.).

The pattern fo\sd can be found in "fo d", but not in "food".

Backslash-S ( \S ) represents any character except a whitespace character.

The pattern fo\Sd can be found in "fo*d", "food" and "fo4d", but not in "fo d".
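A few of these character shorthands, tried against the strings used above. The tab and newline cases are added here to show that \s covers more than the space character.

```javascript
console.log(/fo.d/.test("fo*d"));    // true:  . matches any character...
console.log(/fo.d/.test("fo\nd"));   // false: ...except a newline

console.log(/fo\dd/.test("fo4d"));   // true
console.log(/fo\dd/.test("food"));   // false
console.log(/fo\Dd/.test("food"));   // true

console.log(/fo\wd/.test("fo_d"));   // true
console.log(/fo\Wd/.test("fo@d"));   // true

console.log(/fo\sd/.test("fo d"));   // true:  space
console.log(/fo\sd/.test("fo\td"));  // true:  tab
console.log(/fo\sd/.test("fo\nd"));  // true:  newline
console.log(/fo\Sd/.test("fo d"));   // false
```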

 

Grouping ( [] )

Square brackets ( [] ) are used to group options: they match any one of the characters listed between them.

  • The pattern f[aeiou]d can be found in "fad" and "fed", but not in "food", "faed" or "fd".
  • The pattern f[aeiou]{2}d can be found in "faed" and "feod", but not in "fod", "fed" or "fd".
  • The pattern [A-Za-z]+ can be found in "Webucator, Inc.", but not in "13078". (The pattern is not anchored, so it only has to match part of the string.)

Note: a group in square brackets always stands for exactly one character. It says nothing about quantity, which is specified separately with the symbols described earlier.
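The bracket examples can be sketched as follows. The last line is an added contrast: once the pattern is anchored at both ends, the comma, space and period in "Webucator, Inc." prevent a match.

```javascript
console.log(/f[aeiou]d/.test("fed"));      // true
console.log(/f[aeiou]d/.test("faed"));     // false: only one vowel is allowed
console.log(/f[aeiou]{2}d/.test("faed"));  // true:  the quantifier allows two

// Unanchored: the pattern is found inside the string, once per run of letters.
console.log("Webucator, Inc.".match(/[A-Za-z]+/g)); // ["Webucator", "Inc"]
console.log(/[A-Za-z]+/.test("13078"));             // false

// Anchored: the whole string must consist of letters.
console.log(/^[A-Za-z]+$/.test("Webucator, Inc.")); // false
```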

 

Negation ( ^ )

When used as the first character within square brackets, the caret ( ^ ) is used for negation. This differs from its meaning outside square brackets, where it marks the start of the string.

  • The pattern f[^aeiou]d can be found in "fqd" and "f4d", but not in "fad" or "fed".
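A brief sketch of the two meanings of the caret. The "fd" and "sky" cases are added for illustration.

```javascript
console.log(/f[^aeiou]d/.test("fqd")); // true
console.log(/f[^aeiou]d/.test("fad")); // false
console.log(/f[^aeiou]d/.test("fd"));  // false: a negated group still needs one character

// Outside the brackets ^ anchors the start; inside it negates the group.
console.log(/^[^aeiou]/.test("sky"));  // true:  starts with a non-vowel
console.log(/^[^aeiou]/.test("air"));  // false: starts with a vowel
```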

 

Subpatterns ( () )

Parentheses ( () ) are used to capture subpatterns. A quantifier placed after the closing parenthesis applies to the whole subpattern.

  • The pattern f(oo)?d can be found in "food" and "fd", but not in "fod".
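A minimal sketch of how the captured subpattern shows up in the result of exec(), using the pattern from the example above:

```javascript
const re = /f(oo)?d/;

console.log(re.test("food")); // true
console.log(re.test("fd"));   // true
console.log(re.test("fod"));  // false

// The text matched by the parentheses is available as element 1 of the result.
console.log(re.exec("food")[1]); // "oo"
// If the optional subpattern did not take part in the match, it is undefined.
console.log(re.exec("fd")[1]);   // undefined
```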

 

Alternatives ( | )

The pipe ( | ) is used to separate alternative patterns; a match of either alternative counts as a match.

  • The pattern foo$|^bar can be found in "foo" and "bar", but not "foobar".
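The alternation example can be checked as below. The "barfoo" case and the parenthesized variant are added illustrations; the latter limits the alternatives to the part inside the parentheses.

```javascript
const either = /foo$|^bar/;

console.log(either.test("foo"));    // true:  ends with "foo"
console.log(either.test("bar"));    // true:  starts with "bar"
console.log(either.test("foobar")); // false: neither alternative applies
console.log(either.test("barfoo")); // true:  both alternatives apply

// Parentheses restrict the alternatives, so the anchors apply to both.
const exact = /^(foo|bar)$/;
console.log(exact.test("bar"));     // true
console.log(exact.test("barfoo"));  // false
```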

 

Escape Character ( \ )

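The backslash ( \ ) is used to escape special characters, so that they are matched literally.

  • The pattern fo\.d can be found in "fo.d", but not in "food" or "fo4d".

Question: which characters need to be escaped? Any character that has a special meaning in a pattern: . ^ $ * + ? ( ) [ ] { } | and the backslash itself, plus the forward slash inside a regular expression literal.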
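Escaping matters most when a pattern is built from arbitrary text with the RegExp() constructor. In the sketch below, escapeRegExp() is a hypothetical helper, not a built-in function; it prefixes every special character with a backslash.

```javascript
console.log(/fo\.d/.test("fo.d")); // true
console.log(/fo\.d/.test("food")); // false
console.log(/fo.d/.test("food"));  // true: an unescaped period matches any character

// Hypothetical helper: escapes every character that is special in a pattern.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const literal = new RegExp(escapeRegExp("1+1=2"));
console.log(literal.test("1+1=2")); // true
console.log(literal.test("11=2"));  // false: + is no longer a quantifier
```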

 

Backreferences

Backreferences are special wildcards that refer back to a subpattern within a pattern. They can be used to make sure that two parts of a string match the same text. The first subpattern in a pattern (the contents of the first pair of parentheses) is referenced as \1, the second is referenced as \2, and so on.

For example, the pattern ([bmpw])o\1 matches "bob", "mom", "pop", and "wow", but not "bop" or "pow". In "bob", \1 stands for "b".

Here is a further explanation of backreferences:
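The ([bmpw])o\1 example can be run against all six words:

```javascript
const re = /([bmpw])o\1/;

for (const word of ["bob", "mom", "pop", "wow", "bop", "pow"]) {
    // \1 must repeat whatever letter the parentheses captured.
    console.log(word, re.test(word));
}
// Prints true for the first four words and false for "bop" and "pow".
```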

Using Backreferences in The Regular Expression

Backreferences can not only be used after a match has been found, but also during the match. Suppose you want to match a pair of opening and closing HTML tags, and the text in between. By putting the opening tag into a backreference, we can reuse the name of the tag for the closing tag. Here's how: <([A-Z][A-Z0-9]*)\b[^>]*>.*?</\1>. This regex contains only one pair of parentheses, which capture the string matched by [A-Z][A-Z0-9]* into the first backreference. This backreference is reused with \1 (backslash one). The / before it is simply the forward slash in the closing HTML tag that we are trying to match.

To figure out the number of a particular backreference, scan the regular expression from left to right and count the opening round brackets. The first bracket starts backreference number one, the second number two, etc. Non-capturing parentheses are not counted. This fact means that non-capturing parentheses have another benefit: you can insert them into a regular expression without changing the numbers assigned to the backreferences. This can be very useful when modifying a complex regular expression.

You can reuse the same backreference more than once. ([a-c])x\1x\1 will match axaxa, bxbxb and cxcxc. If a backreference was not used in a particular match attempt (for example, because a question mark made its subpattern optional), it is simply empty. Using an empty backreference in the regex is perfectly fine. It will simply match nothing.
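A sketch of the HTML tag pattern as a JavaScript literal. Two details are assumptions made for this example: the forward slash of the closing tag is escaped because it sits inside a literal, and the i flag is added so that lowercase tag names match as well. The sample markup is invented.

```javascript
const tagRe = /<([A-Z][A-Z0-9]*)\b[^>]*>.*?<\/\1>/i;

const found = tagRe.exec('<b class="x">bold</b> text');
console.log(found[0]); // '<b class="x">bold</b>'
console.log(found[1]); // "b" (the tag name captured for \1)

// The closing tag must repeat the captured name.
console.log(tagRe.test("<b>bold</i>")); // false

// The same backreference can be used more than once.
console.log(/([a-c])x\1x\1/.test("bxbxb")); // true
console.log(/([a-c])x\1x\1/.test("axbxa")); // false
```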

 

 

A more practical example is matching the delimiter in social security numbers. Examine the following regular expression:

var pattern = /^\d{3}([\- ]?)\d{2}([\- ]?)\d{4}$/;

 

Within the caret (^) and dollar sign ($), which are used to specify the beginning and end of the pattern, there are three sequences of digits, optionally separated by a hyphen or a space. This pattern will be matched in all of the following strings (and more):

  • 123-45-6789
  • 123 45 6789
  • 123456789
  • 123-45 6789
  • 123 45-6789
  • 123-456789

The last three strings are not ideal, because their delimiters are inconsistent, but they do match the pattern. Backreferences can be used to make sure that the second delimiter matches the first delimiter. The regular expression would look like this:

    ^\d{3}([\- ]?)\d{2}\1\d{4}$

    The \1 refers back to the first subpattern, so the second delimiter must be identical to the first. Only the first three strings listed above match this regular expression:

  • 123-45-6789
  • 123 45 6789
  • 123456789
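The difference between the two patterns can be sketched by running both against the six sample strings. The names loose and strict are chosen here for illustration.

```javascript
const loose  = /^\d{3}([\- ]?)\d{2}([\- ]?)\d{4}$/;
const strict = /^\d{3}([\- ]?)\d{2}\1\d{4}$/;

const samples = [
    "123-45-6789", "123 45 6789", "123456789",
    "123-45 6789", "123 45-6789", "123-456789",
];

for (const s of samples) {
    // loose accepts every sample; strict rejects mixed or missing delimiters.
    console.log(s, loose.test(s), strict.test(s));
}
```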
posted @ 2012-11-24 23:56  jiangC