Front-End Learning: Regular Expressions

What is a regular expression

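Key points:

A regular expression is a pattern used to match character combinations in strings.
In JS, regular expressions are also objects.
They are used with the exec and test methods of RegExp, and with the match, matchAll, replace, search, and split methods of String.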
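
As a rough illustration of the points above, the sketch below runs one pattern through each of the listed methods. The pattern and the sample string are made up for this example and do not come from the original notes.

```javascript
// Hypothetical pattern: a four-digit year, a dash, and a two-digit month.
const datePattern = /(\d{4})-(\d{2})/;
const sampleText = "Released 2024-01, patched 2024-03";

// A regex literal is an object (an instance of RegExp).
const isObject = typeof datePattern === "object" && datePattern instanceof RegExp;

// RegExp methods
const hasDate = datePattern.test(sampleText);      // true
const execResult = datePattern.exec(sampleText);   // ["2024-01", "2024", "01"]

// String methods
const matchResult = sampleText.match(datePattern); // same shape as exec() when g is not set
const allDates = [...sampleText.matchAll(/(\d{4})-(\d{2})/g)].map((m) => m[0]);
const firstIndex = sampleText.search(datePattern); // 9
const swapped = sampleText.replace(datePattern, "$2/$1");
const pieces = sampleText.split(/,\s*/);

console.log(isObject, hasDate, execResult[0], matchResult[1], allDates, firstIndex);
console.log(swapped); // Released 01/2024, patched 2024-03
console.log(pieces);  // [ 'Released 2024-01', 'patched 2024-03' ]
```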

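What the g, i, and m flags mean

/i (ignore case)
/g (global: find every match instead of stopping at the first)
/m (multiline: ^ and $ match at the start and end of each line)
/gi (global and ignore case)
/ig (same as /gi; the order of the flags does not matter)
/d (generate start and end indices for the matched substrings)
/s (allow . to match newline characters)
/u ("Unicode": treat the pattern as a sequence of Unicode code points)
/v (an upgrade to the u mode that provides more Unicode features)
/y (perform a "sticky" search that matches starting exactly at the current position, lastIndex, in the target string)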
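
To make the flag list a little more concrete, here is a small sketch that compares the same kind of pattern with and without several flags. The sample string is invented for illustration; the d flag assumes a reasonably recent engine (ES2022).

```javascript
const animals = "Cat\ncat\nCAT";

// g finds every match; i ignores case.
const caseSensitive = animals.match(/cat/g);   // [ 'cat' ]
const ignoreCase = animals.match(/cat/gi);     // [ 'Cat', 'cat', 'CAT' ]

// Without m, ^ matches only at the start of the whole string.
const startOfString = animals.match(/^cat/gi); // [ 'Cat' ]
const startOfLines = animals.match(/^cat/gim); // [ 'Cat', 'cat', 'CAT' ]

// Without s, . does not match "\n".
const dotDefault = /Cat.cat/.test(animals);    // false
const dotAll = /Cat.cat/s.test(animals);       // true

// With u, . consumes a whole code point rather than one UTF-16 code unit.
const emoji = "\u{1F600}";
const oneUnit = /^.$/.test(emoji);             // false
const onePoint = /^.$/u.test(emoji);           // true

// y anchors the match exactly at lastIndex.
const sticky = /cat/y;
const stickyAtZero = sticky.test(animals);     // false: index 0 holds "Cat"
sticky.lastIndex = 4;
const stickyAtFour = sticky.test(animals);     // true: "cat" starts at index 4

// d adds an indices array of [start, end] pairs.
const withIndices = /cat/d.exec(animals).indices[0]; // [ 4, 7 ]

console.log(caseSensitive, ignoreCase, startOfString, startOfLines);
console.log(dotDefault, dotAll, oneUnit, onePoint, stickyAtZero, stickyAtFour, withIndices);
```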

Methods

RegExp.prototype.exec()

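Executes a search for a match in a specified string. Returns a result array or null.
Key point 1:

When the regex has the global flag /g, each call to regex.exec(str) returns one match and sets the regex's lastIndex to the index immediately after the end of that match. The next call to regex.exec(str) starts matching from lastIndex, as in the example below.
If the match fails, exec() returns null and resets the regex's lastIndex to 0.
Note that lastIndex is not reset even when the next string searched is a different string; the search still starts from the recorded lastIndex.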

<script>
    const regex = /foo*/g;
    const str = 'table fooootball, fosball';

    let myArray = [];
    while ((myArray = regex.exec(str)) !== null) {
        let msg = `Found ${myArray[0]}. `;
        msg += `Next match starts at ${regex.lastIndex}`;
        console.log(msg);
    }
    // Found foooo. Next match starts at 11
    // Found fo. Next match starts at 20
</script>
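
The note about lastIndex surviving a change of input string is easy to trip over, so here is a minimal sketch of that pitfall. The regex and strings are invented for illustration.

```javascript
const sharedRegex = /foo/g;

// First call succeeds and leaves lastIndex just past the match.
const firstHit = sharedRegex.test("foo bar");
const indexAfterFirst = sharedRegex.lastIndex;  // 3

// A different string is still searched from index 3, so "foo" is missed.
const secondHit = sharedRegex.test("foo");
const indexAfterSecond = sharedRegex.lastIndex; // 0, reset after the failure

// After the reset (or after setting lastIndex = 0 by hand) the match is found again.
const thirdHit = sharedRegex.test("foo");

console.log(firstHit, indexAfterFirst);   // true 3
console.log(secondHit, indexAfterSecond); // false 0
console.log(thirdHit);                    // true
```

One common way to avoid this is to reset lastIndex to 0 before reusing a global regex on new input, or to create a fresh regex per search.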

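Although exec() is powerful and efficient, it often does not express the purpose of a call most clearly. Depending on the scenario, test(), match(), matchAll(), or search() can be used instead.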
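
As a sketch of those alternatives, the snippet below reuses the string and pattern from the exec() loop and gets the same information without managing lastIndex by hand. The variable names are chosen for this illustration only.

```javascript
const footballText = 'table fooootball, fosball';

// "Is there a match at all?"
const found = /foo*/.test(footballText);          // true

// "Where is the first match?"
const firstPosition = footballText.search(/foo*/); // 6

// "What are all the matched substrings?"
const allTexts = footballText.match(/foo*/g);      // [ 'foooo', 'fo' ]

// "Give me every match with its index."
const detailed = [...footballText.matchAll(/foo*/g)].map((m) => ({
    text: m[0],
    index: m.index,
}));

console.log(found, firstPosition, allTexts);
console.log(detailed); // [ { text: 'foooo', index: 6 }, { text: 'fo', index: 18 } ]
```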

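Key point 2:

If the match succeeds, exec() returns an array and, when the g or y flag is set, updates the lastIndex property of the regex object.
The full matched text is the first item of the returned array; from the second item on, each item corresponds to one capturing group.
In the example below, the first "Brown" is the matched text and the second "Brown" is the text captured by the group.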

const re4 = /(?<color>brown)/gi;
console.log(re4.exec("The Quick Brown Fox Jumps Over The Lazy Dog")); // [ "Brown", "Brown" ]
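
The array returned by exec() also carries a few extra properties. The sketch below repeats the example above and reads them out; only the variable names are new.

```javascript
const colorRegex = /(?<color>brown)/gi;
const colorResult = colorRegex.exec("The Quick Brown Fox Jumps Over The Lazy Dog");

console.log(colorResult[0]);           // Brown (full match)
console.log(colorResult[1]);           // Brown (first capturing group)
console.log(colorResult.index);        // 10 (where the match starts)
console.log(colorResult.groups.color); // Brown (the named group)
console.log(colorRegex.lastIndex);     // 15 (just past the match, because g is set)
```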

Simple patterns

/abc/ matches the first "abc" in a string.
For example:

console.log(('abc ssijah aaaabcc sad').replace(/abc/,'hhh')); // Result: hhh ssijah aaaabcc sad
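
For contrast, a short sketch of the same replacement with and without the g flag, using the same input string:

```javascript
const simpleSource = 'abc ssijah aaaabcc sad';

// Without g, only the first "abc" is replaced.
const firstOnly = simpleSource.replace(/abc/, 'hhh');

// With g, every "abc" is replaced, including the one inside "aaaabcc".
const everyMatch = simpleSource.replace(/abc/g, 'hhh');

console.log(firstOnly);  // hhh ssijah aaaabcc sad
console.log(everyMatch); // hhh ssijah aaahhhc sad
```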

Using special characters

Assertions

Details on assertions
Worked example

<script>
        let orangeNotLemon = "Do you want to have an orange? Yes, I do not want to have a lemon!";
                                                                                                                                    
        let selectNotLemonRegex1 = /[^?!]/gi; // Character class: matches any character except '?' and '!'
        let selectNotLemonRegex2 = /have(?! a lemon)/gi; // Negative lookahead: matches 'have' only when it is not followed by ' a lemon'
        let selectNotLemonRegex3 = /have(?! a lemon)[^?!]/gi; // [^?!] consumes the single character after 'have', here ' ', so the match is 'have '
        let selectNotLemonRegex4 = /[?!]/gi; // Character class: matches '?' or '!'
        let selectNotLemonRegex5 = /[^?!]+have(?! a lemon)[^?!]/gi; // + repeats the preceding item (here [^?!]) one or more times, so [^?!]+ matches everything before 'have ', i.e. 'Do you want to ', giving 'Do you want to have '
        let selectNotLemonRegex6 = /[^?!]+have(?! a lemon)[^?!]+[?!]/gi; // have(?! a lemon)[^?!]+ matches 'have an orange', then [?!] matches the closing '?'

        console.log(orangeNotLemon.match(selectNotLemonRegex1)); // [ 'D', 'o', ' ', 'y', 'o', 'u', ' ', 'w', 'a', 'n', 't', ' ', 't', 'o', ' ', 'h', 'a', 'v', 'e', ' ', 'a', 'n', ' ', 'o', 'r', 'a', 'n', 'g', 'e', ' ', 'Y', 'e', 's', ',', ' ', 'I', ' ', 'd', 'o', ' ', 'n', 'o', 't', ' ', 'w', 'a', 'n', 't', ' ', 't', 'o', ' ', 'h', 'a', 'v', 'e', ' ', 'a', ' ', 'l', 'e', 'm', 'o', 'n']
        console.log(orangeNotLemon.match(selectNotLemonRegex2)); // [ 'have' ]
        console.log(orangeNotLemon.match(selectNotLemonRegex3)); // [ 'have ' ]
        console.log(orangeNotLemon.match(selectNotLemonRegex4)); // [ '?', '!' ]
        console.log(orangeNotLemon.match(selectNotLemonRegex5)); // [ 'Do you want to have ' ]
        console.log(orangeNotLemon.match(selectNotLemonRegex6)); // [ 'Do you want to have an orange?' ]
    </script>
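
The example above only uses negative lookahead. As a complementary sketch, the snippet below shows positive lookahead, positive lookbehind, and negative lookbehind on an invented price string (lookbehind assumes an ES2018-capable engine).

```javascript
const prices = "apple $30, pear 45 USD, plum $7";

// Positive lookbehind: digits that come right after a "$".
const dollarAmounts = prices.match(/(?<=\$)\d+/g);   // [ '30', '7' ]

// Positive lookahead: digits that are followed by " USD".
const usdAmounts = prices.match(/\d+(?= USD)/g);     // [ '45' ]

// Negative lookbehind: whole numbers that are not preceded by "$".
// \b keeps the match from starting in the middle of "30".
const noDollarSign = prices.match(/(?<!\$)\b\d+/g);  // [ '45' ]

console.log(dollarAmounts, usdAmounts, noDollarSign);
```

In all three cases the text inside the assertion is checked but not included in the match.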

Character Classes

Details on character classes

n*

Matches the preceding expression 0 or more times. Equivalent to {0,}.
For example, /bo*/ matches 'booooo' in "A ghost boooooed" and 'b' in "A bird warbled", but matches nothing in "A goat grunted".
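
The three cases just described can be checked directly. This is a small sketch; only the variable names are new.

```javascript
const starPattern = /bo*/;

const ghost = "A ghost boooooed".match(starPattern)[0]; // 'booooo'
const bird = "A bird warbled".match(starPattern)[0];    // 'b' (zero o's still match)
const goat = "A goat grunted".match(starPattern);       // null (no 'b' at all)

// {0,} is the long form of *.
const longForm = "A ghost boooooed".match(/bo{0,}/)[0]; // 'booooo'

console.log(ghost, bird, goat, longForm);
```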

Example

<script>
    const aliceExcerpt = "I'm sure I'm not Ada,' she said, 'for her hair goes in such long ringlets, and mine doesn't go in ringlets at all.";
    const regexpWordStartingWithA = /\b[aA]\w+/g;
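    // \b is a word boundary (i.e. do not start matching in the middle of a word)
    // [aA] is the letter a or A
    // \w+ is one or more word characters (ASCII letters, digits, or underscore)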

    console.table(aliceExcerpt.match(regexpWordStartingWithA));
    // ['Ada', 'and', 'at', 'all']

</script>

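Groups and ranges

Details on groups and ranges

Capturing group (x)

Capturing group: matches x and remembers the match. For example, /(foo)/ matches and remembers "foo" in "foo bar".
Capturing groups carry a performance cost.
If the matched substring does not need to be recalled, prefer non-capturing parentheses (?:x). Note also that String.match() does not return capture groups when the g flag is set, whereas String.matchAll() does.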
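
A brief sketch of those last two points, using an invented sample string: match() with g drops the groups, matchAll() keeps them, and (?:x) groups without capturing.

```javascript
const fooLine = "foo bar foo baz";

// match() with g returns only the full matches, no groups.
const fullMatchesOnly = fooLine.match(/(foo) (\w+)/g); // [ 'foo bar', 'foo baz' ]

// matchAll() yields one result per match, each with its groups.
const secondGroups = [...fooLine.matchAll(/(foo) (\w+)/g)].map((m) => m[2]); // [ 'bar', 'baz' ]

// (?:foo) groups without capturing, so (\w+) becomes group 1.
const nonCapturing = /(?:foo) (\w+)/.exec(fooLine);    // [ 'foo bar', 'bar' ]

console.log(fullMatchesOnly, secondGroups, [...nonCapturing]);
```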

Example (using capturing groups):

<script>
    let personList = `First_Name: John, Last_Name: Doe
    First_Name: Jane, Last_Name: Smith`;

    let regexpNames = /First_Name: (\w+), Last_Name: (\w+)/gm;
    let match = regexpNames.exec(personList);

    // console.log(personList.match(regexpNames));
    // console.log(match); // ['First_Name: John, Last_Name: Doe', 'John', 'Doe']

    do {
        console.log(`Hello ${match[1]} ${match[2]}`);
    } while ((match = regexpNames.exec(personList)) !== null);
</script>

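Named capturing group (?<Name>x)

Matches "x" and stores it on the groups property of the returned match, under the name given by <Name>. Angle brackets (< and >) are required around the group name.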

<script>
    let users = `姓氏：李，名字：雷
    姓氏：韩，名字：梅梅`;

    let regexpNames = /姓氏：(?<first>.+)，名字：(?<last>.+)/gm;
    let match = regexpNames.exec(users);

    do {
        console.log(`Hello ${match.groups.first} ${match.groups.last}`);
    } while ((match = regexpNames.exec(users)) !== null);

    // Hello 李 雷
    // Hello 韩 梅梅

</script>

Quantifiers

Greedy vs. non-greedy

    <script>
        let text = "I must be getting somewhere near the centre of the earth.";
        // [] is a character class; [\w ] matches one word character or a space; [\w ]+ matches a run of one or more of them
        let greedyRegexp = /[\w ]+/;
        // [\w ]      a letter of the latin alphabet or a whitespace
        //      +     one or several times

        console.log(text.match(greedyRegexp));
        // "I must be getting somewhere near the centre of the earth"

        let nonGreedyRegexp = /[\w ]+?/; // Notice the question mark
        console.log(text.match(nonGreedyRegexp));
        // "I"

    </script>
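
Another place where the difference shows up is delimiter matching. The sketch below uses an invented HTML-like string: the greedy form runs to the last '>', while the non-greedy form stops at the first one.

```javascript
const markup = "<b>bold</b> and <i>italic</i>";

// Greedy: .+ grabs as much as possible, then backs off to the last ">".
const greedyTag = markup.match(/<.+>/)[0];  // the whole string

// Non-greedy: .+? grabs as little as possible, stopping at the first ">".
const lazyTag = markup.match(/<.+?>/)[0];   // '<b>'

// With g, the non-greedy form picks out each tag separately.
const everyTag = markup.match(/<.+?>/g);    // [ '<b>', '</b>', '<i>', '</i>' ]

console.log(greedyTag);
console.log(lazyTag, everyTag);
```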

Specifying the number of repetitions

<script>
    var singleLetterWord = /\b\w\b/g;
    var notSoLongWord = /\b\w{1,6}\b/g; // at least 1 and at most 6 characters
    var loooongWord = /\b\w{13,}\b/g; // at least 13 characters

    var sentence = "Why do I have to learn multiplication table?";

    console.table(sentence.match(singleLetterWord)); // ["I"]
    console.table(sentence.match(notSoLongWord)); // [ "Why", "do", "I", "have", "to", "learn", "table" ]
    console.table(sentence.match(loooongWord)); // ["multiplication"]

</script>
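
Two further quantifier forms fit the same pattern: ? for an optional item and {n} for an exact count. This is a small sketch with invented sample strings.

```javascript
// ? makes the preceding item optional (0 or 1 times).
const spellings = "color colour colouur".match(/colou?r/g); // [ 'color', 'colour' ]

// {4} requires exactly four repetitions; \b keeps longer numbers out.
const fourDigits = "12 1234 12345".match(/\b\d{4}\b/g);      // [ '1234' ]

console.log(spellings, fourDigits);
```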

Unicode Property Escapes

General categories

<script>
    // finding all the letters of a text
    let story = "It's good！";

    // Most explicit form
    // story.match(/\p{General_Category=Letter}/gu);
    console.log(story.match(/\p{General_Category=Letter}/gu)); // ['I', 't', 's', 'g', 'o', 'o', 'd']

    // It is not mandatory to use the property name for General categories
    // story.match(/\p{Letter}/gu);
    console.log(story.match(/\p{Letter}/gu)); // ['I', 't', 's', 'g', 'o', 'o', 'd']

    // This is equivalent (short alias):
    // story.match(/\p{L}/gu);
    console.log(story.match(/\p{L}/gu)); // ['I', 't', 's', 'g', 'o', 'o', 'd']

    // This is also equivalent (conjunction of all the subcategories using short aliases)
    // story.match(/\p{Lu}|\p{Ll}|\p{Lt}|\p{Lm}|\p{Lo}/gu);
    console.log(story.match(/\p{Lu}|\p{Ll}|\p{Lt}|\p{Lm}|\p{Lo}/gu)); // ['I', 't', 's', 'g', 'o', 'o', 'd']

</script>

Scripts

···

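Unicode property escapes vs. character classes

Character classes, \w and \d in particular, match letters or digits only from the Latin script (in other words, a to z, A to Z, 0 to 9, and _ for \w, and 0 to 9 for \d).
Unicode property escapes cover far more characters: \p{Letter} and \p{Number} work for any script.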

<script>
    // Trying to use ranges to avoid \w limitations:

    const nonEnglishText = "Приключения Алисы в Стране чудес";
    const regexpBMPWord = /([\u0000-\u0019\u0021-\uFFFF])+/gu;
    // BMP goes through U+0000 to U+FFFF but space is U+0020

    console.table(nonEnglishText.match(regexpBMPWord));

    // Using Unicode property escapes instead
    const regexpUPE = /\p{L}+/gu;
    console.table(nonEnglishText.match(regexpUPE));

</script>
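
To see the difference side by side, the sketch below runs \w and \d against the property escapes on non-Latin input. The Cyrillic sentence is the one from the example above; the digit string is invented, with "\u0663" being the Arabic-Indic digit three.

```javascript
const cyrillicText = "Приключения Алисы в Стране чудес";

// \w only knows ASCII word characters, so nothing matches here.
const asciiWords = cyrillicText.match(/\w+/g);         // null

// \p{L} matches letters from any script (the u flag is required).
const unicodeWords = cyrillicText.match(/\p{L}+/gu);

// The same contrast for digits.
const mixedDigits = "\u0663 apples and 3 pears";
const asciiDigits = mixedDigits.match(/\d/g);          // [ '3' ]
const anyDigits = mixedDigits.match(/\p{Number}/gu);   // [ '\u0663', '3' ]

console.log(asciiWords);
console.log(unicodeWords); // [ 'Приключения', 'Алисы', 'в', 'Стране', 'чудес' ]
console.log(asciiDigits, anyDigits);
```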

posted @ 2024-01-23 22:21 ayubene
