16. Regular Expressions

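Definition

A regular expression is a pattern template that you define; text data is filtered by matching it against this pattern.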

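Types

Regular expressions are implemented by a regular expression engine.

The engine is the underlying software that interprets the regular expression you define and uses it to filter text.

Regular expression engines in Linux

POSIX Basic Regular Expression (BRE) engine
POSIX Extended Regular Expression (ERE) engine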

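Regular expression matching rules

Patterns are case-sensitive
A pattern matches as long as its text appears anywhere in the data stream
Spaces and digits can be used in a pattern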
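The first rule is easy to check for yourself. A minimal sketch, assuming any POSIX sed; the variable names lower and upper are only for illustration:

```shell
# Patterns are case-sensitive: "this" does not match "This".
lower=$(echo "This is a test" | sed -n '/this/p')
upper=$(echo "This is a test" | sed -n '/This/p')

echo "lowercase pattern matched: [$lower]"     # prints an empty pair of brackets
echo "capitalized pattern matched: [$upper]"   # prints the full line in brackets
```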

Example: the pattern only has to appear somewhere in the text; it does not need to be a whole word

[root@tzPC 20Unit]# echo "The books are expensive" | sed -n '/book/p'
The books are expensive

Example: using a space in the pattern

[root@tzPC 20Unit]# echo "This is line number 1" | sed -n '/ber 1/p'
This is line number 1

Example: matching several consecutive spaces

[root@tzPC 20Unit]# cat data1
This is a normal line of text.
This is    a line with too may spaces.
[root@tzPC 20Unit]# sed -n '/   /p' data1
This is    a line with too may spaces.

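Special characters

The special characters are . * [ ] ^ $ { } \ + ? | ( ). The forward slash / is not a regular expression special character, but it delimits the pattern in sed and gawk, so it has to be escaped as well.

To use one of these as a literal character in a pattern, escape it with a backslash \.

Example: $ must be escaped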

[root@tzPC 20Unit]# cat data2
The cost is $4.00
[root@tzPC 20Unit]# sed -n '/\$/p' data2
The cost is $4.00

Example: \ must be escaped

[root@tzPC 20Unit]# echo "\ is a special character" | sed -n '/\\/p'
\ is a special character

Example: / must be escaped (it is the pattern delimiter)

[root@tzPC 20Unit]# echo "3 / 2" | sed -n '/\//p'
3 / 2
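The dot is the character that is easiest to forget to escape, because an unescaped dot still matches a real dot. The sketch below uses made-up sample lines and counts matches with grep -c (grep also uses BRE by default). The last command shows a different delimiter for the sed address, which avoids escaping the slash; the comma delimiter is my choice, not something from the book.

```shell
sample=$(printf '%s\n' 'The cost is $4.00' 'Part number 4x00')

# Unescaped dot matches any character, so both lines match.
loose=$(printf '%s\n' "$sample" | grep -c '4.00')
# Escaped dot matches only a literal dot, so only the first line matches.
strict=$(printf '%s\n' "$sample" | grep -c '4\.00')
echo "unescaped: $loose line(s), escaped: $strict line(s)"

# \,regex, uses a comma as the address delimiter, so / needs no backslash.
slash=$(echo "3 / 2" | sed -n '\,/,p')
echo "$slash"
```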

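Anchor characters

Use the caret ^ to anchor a pattern to the start of a line
Use the dollar sign $ to anchor a pattern to the end of a line

Anchoring to the start of a line with ^

The caret makes the pattern match only at the start of a line in the data stream, where lines are delimited by newline characters.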

[root@tzPC 20Unit]# echo "The book store" | sed -n '/^book/p'
[root@tzPC 20Unit]# echo "Books are great" | sed -n '/^Book/p'
Books are great

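The caret only acts as an anchor at the very start of the pattern; anywhere else, the sed editor treats it as an ordinary character.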

[root@tzPC 20Unit]# echo "This ^ is a test" | sed -n '/s ^/p'
This ^ is a test

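The note in the book (p. 429) says the caret needs to be escaped; for an anchor this is not correct. An escaped caret matches a literal ^ and no longer anchors the pattern: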

[root@tzPC 20Unit]# cat data2 
The cost is $4.00
$4 is my money
[root@tzPC 20Unit]# sed -n '/^\$/p' data2
$4 is my money
[root@tzPC 20Unit]# sed -n '/\^\$/p' data2
[root@tzPC 20Unit]#
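Escaping the caret is only needed when the line really starts with a ^ character. A small sketch with two invented sample lines: the first ^ in the pattern is the anchor, the escaped one is the literal character.

```shell
# Anchor (^) followed by a literal caret (\^): only lines that begin with ^ match.
literal=$(printf '%s\n' '^ marks the start' 'a ^ in the middle' | sed -n '/^\^/p')
echo "$literal"
```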

Anchoring to the end of a line with $

Placing $ after the text pattern means the line must end with that text.

[root@tzPC 20Unit]# echo "This is a good book" | sed -n '/book$/p'
This is a good book

Combining the anchors ^ and $

Finding a line that contains exactly the given text

[root@tzPC 20Unit]# cat data4
this is a test of using both anchors
I said this is a test
this is a test
I'm sure this is a test.
[root@tzPC 20Unit]# sed -n '/^this is a test$/p' data4
this is a test

Filtering out blank lines

[root@tzPC 20Unit]# cat data5
This is one test line.

This is another test line.
[root@tzPC 20Unit]# sed '/^$/d' data5
This is one test line.
This is another test line.
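One thing to watch: /^$/ only matches lines that are completely empty. A line holding just spaces or tabs survives. A sketch that also removes those, using the POSIX class [[:space:]]; the sample input is made up.

```shell
# Four lines: text, empty, spaces plus a tab, text.
input=$(printf 'first line\n\n  \t\nsecond line')

# /^$/d removes only the truly empty line, so three lines remain.
remaining=$(printf '%s\n' "$input" | sed '/^$/d' | grep -c '')
echo "after /^\$/d: $remaining lines"

# Zero or more whitespace characters between the anchors also catches the blank-looking line.
cleaned=$(printf '%s\n' "$input" | sed '/^[[:space:]]*$/d')
echo "$cleaned"
```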

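The dot character

The dot matches any single character except a newline, including a space. It must match some character, so the at that begins the last line below does not match.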

[root@tzPC 20Unit]# cat data6
This is a test of a line.
The cat is sleeping.
That is a very nice hat.
This test is at line four.
at ten o'clock we'll go home.
[root@tzPC 20Unit]# sed -n '/.at/p' data6
The cat is sleeping.
That is a very nice hat.
This test is at line four.

Character classes [ ]

A bracket expression matches any single character listed between the brackets.

[root@tzPC 20Unit]# echo "Yes" | sed -n '/[Yy]es/p'
Yes

Negated character classes [^ ]

Example: find lines where at is preceded by a character other than c or h

[root@tzPC 20Unit]# sed -n '/[^ch]at/p' data6
This test is at line four.

Ranges [0-9]

Example: find valid five-digit ZIP codes

[root@tzPC 20Unit]# sed -n '/^[0-9][0-9][0-9][0-9][0-9]$/p' data8
60633
46201
22203
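The anchors matter in the ZIP code pattern. Without them, any line that contains five digits in a row passes. A sketch with invented sample values (the contents of data8 are not shown here), counting matches with grep -c:

```shell
codes=$(printf '%s\n' 60633 1234567 4620 46201)

# Without anchors, the 7-digit value also matches because it contains 5 digits.
unanchored=$(printf '%s\n' "$codes" | grep -c '[0-9][0-9][0-9][0-9][0-9]')
# With anchors, the line must be exactly 5 digits.
anchored=$(printf '%s\n' "$codes" | grep -c '^[0-9][0-9][0-9][0-9][0-9]$')

echo "unanchored: $unanchored, anchored: $anchored"
```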

Example: find lines where at is preceded by a character in the range c to h

[root@tzPC 20Unit]# sed -n '/[c-h]at/p' data6
The cat is sleeping.
That is a very nice hat.

Example: find lines where at is preceded by a character in the range a to c or h to m

Because f is in neither range, nothing matches

[root@tzPC 20Unit]# echo "I'm getting too fat." | sed -n '/[a-ch-m]at/p'
[root@tzPC 20Unit]#

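BRE special character classes

Class	Description	Class	Description
[:alpha:]	Letters	[:graph:]	Printable characters except space
[:alnum:]	Letters and digits	[:print:]	Any printable character
[:cntrl:]	Control characters	[:space:]	Any whitespace character
[:digit:]	Digits	[:blank:]	Space and tab
[:xdigit:]	Hexadecimal digits	[:lower:]	Lowercase letters
[:punct:]	Punctuation characters	[:upper:]	Uppercase letters

Example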

[root@tzPC 20Unit]# echo "This is, a test" | sed -n '/[[:punct:]]/p'
This is, a test
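The class name goes inside a second pair of brackets, and it combines with anchors like any other bracket expression. A sketch with three made-up lines:

```shell
lines=$(printf '%s\n' 'Hello world' 'hello world' '9 lives')

# Lines that start with an uppercase letter.
upper_start=$(printf '%s\n' "$lines" | sed -n '/^[[:upper:]]/p')
# Lines that contain at least one digit.
has_digit=$(printf '%s\n' "$lines" | sed -n '/[[:digit:]]/p')

echo "$upper_start"
echo "$has_digit"
```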

The asterisk

The character before the asterisk must appear zero or more times in the text

[root@tzPC 20Unit]# echo "ik" |sed -n '/ie*k/p'
ik

Example: the asterisk applied to a character class

[root@tzPC 20Unit]# echo "baaaeeet" | sed -n '/b[ae]*t/p'
baaaeeet
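A common combination is the dot followed by the asterisk, which matches any run of characters (including none) between two pieces of text. A small sketch; the sample sentence is only for illustration:

```shell
text="this is a regular pattern expression"

# "regular", then anything, then "expression".
hit=$(echo "$text" | sed -n '/regular.*expression/p')
# The order matters: "expression" never comes before "regular" here.
miss=$(echo "$text" | sed -n '/expression.*regular/p')

echo "in order: [$hit]"
echo "reversed: [$miss]"
```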

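Extended regular expressions (POSIX ERE)

gawk recognizes ERE patterns. sed uses BRE by default and does not recognize them unless it is run with -E (GNU sed also accepts -r).

The sed editor is generally faster than gawk at processing text.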
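To see the difference in sed itself, the sketch below runs the same pattern with and without -E. It assumes a sed that supports -E, such as GNU sed or BSD sed; in BRE mode the plus sign is just a literal character.

```shell
# BRE (default): + is literal, so "be+t" does not match "beet".
bre=$(echo "beet" | sed -n '/be+t/p')
# ERE (-E): + means one or more of the preceding character.
ere=$(echo "beet" | sed -n -E '/be+t/p')

echo "BRE result: [$bre]"
echo "ERE result: [$ere]"
```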

Question mark ?

The question mark means the preceding character may appear zero times or once

[root@tzPC 20Unit]# echo "bet" | gawk '/be?t/{print $0}'
bet

Plus sign +

The plus sign means the preceding character may appear one or more times, but must appear at least once

[root@tzPC 20Unit]# echo "beet" | gawk '/be+t/{print $0}'
beet

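Braces {}

Braces specify how many times the preceding character may repeat. This is called an interval.

m: the preceding character appears exactly m times

m,n: the preceding character appears at least m times and at most n times

Note: older versions of gawk (before 4.0) do not recognize intervals by default and need the --re-interval option. Newer versions enable intervals by default and still accept the option.

Example: match bet where e appears exactly once; more than one e does not match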

[root@tzPC 20Unit]# echo "bet" | gawk --re-interval '/be{1}t/{print $0}'
bet
[root@tzPC 20Unit]# echo "beeeet" | gawk --re-interval '/be{1}t/{print $0}'
[root@tzPC 20Unit]#

Example: between b and t, the characters a or e may appear once or twice

[root@tzPC 20Unit]# echo "beat" | gawk --re-interval '/b[ae]{1,2}t/{print $0}'
beat
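To check the bounds of an interval, it helps to run one pattern over a handful of inputs. The sketch below uses grep -E instead of gawk so it does not depend on the gawk version; the word list and the anchors are my additions.

```shell
matched=""
for word in bt bet beet beeet; do
    # e must appear at least once and at most twice between b and t.
    if echo "$word" | grep -Eq '^be{1,2}t$'; then
        matched="$matched $word"
    fi
done
echo "matched:$matched"
```

Only bet and beet should be listed: bt has no e and beeet has three.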

Pipe symbol |

Matches any one of two or more alternative patterns (logical OR)

[root@tzPC 20Unit]# echo "He has a hat." | gawk '/[ch]at|dog/{print $0}'
He has a hat.

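Grouping expressions ()

Parentheses group part of a pattern so that the group is treated as a single unit. A special character such as ? placed after the group then applies to the whole group, not just to one character.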

[root@tzPC 20Unit]# echo "Saturday" | gawk '/Sat(urday)?/{print $0}'
Saturday

Example: combining groups with the pipe symbol

[root@tzPC 20Unit]# echo "cab" | gawk '/(c|b)a(b|t)/{print $0}'
cab
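Note that Sat(urday)? without anchors matches any text that contains Sat, so the optional group only becomes visible once the pattern is anchored. A sketch using grep -E with an invented word list:

```shell
matched=""
for word in Sat Saturday Saturn Sunday; do
    # The whole group "urday" is optional; the anchors force a full-line match.
    if echo "$word" | grep -Eq '^Sat(urday)?$'; then
        matched="$matched $word"
    fi
done
echo "matched:$matched"
```

Sat and Saturday pass. Saturn would pass the unanchored pattern but fails here.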

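Now for the final part: regular expressions in practice!

Counting the files in a directory

Using the directories in the PATH environment variable as the example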

[root@tzPC 20Unit]# echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin

Replace the colons with spaces:

[root@tzPC 20Unit]# echo $PATH | sed 's/:/ /g'
/usr/local/sbin /usr/local/bin /usr/sbin /usr/bin /root/bin

Iterate over the directories with a for loop

mypath=$(echo $PATH | sed 's/:/ /g')
for directory in $mypath
do
...
done

The complete script is shown below

The script could go one step further and test whether each item is a directory or a file

[root@tzPC 20Unit]# cat countfiles.sh 
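#!/bin/bash
# Turn the colon-separated PATH into a space-separated list
mypath=$(echo $PATH | sed 's/:/ /g')
echo $mypath
count=0

for directory in $mypath
do
    # check holds the whitespace-separated names of the files and directories
    check=$(ls $directory)
    for item in $check
    do
        count=$(( count + 1 ))
    done
    echo "$directory contains $count files or directories"
    count=0
done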
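The script above splits the output of ls on whitespace, so a file name containing a space is counted more than once. A possible variant that counts with a glob instead is sketched below. It is not the book's version; count_entries is a hypothetical helper name, and tr replaces sed for splitting PATH into lines.

```shell
#!/bin/bash
# Count the non-hidden entries in one directory without parsing ls output.
count_entries() {
    local dir=$1
    local count=0
    local item
    for item in "$dir"/*; do
        # An unmatched glob stays literal, so skip names that do not exist.
        [ -e "$item" ] || [ -L "$item" ] || continue
        count=$((count + 1))
    done
    echo "$count"
}

# One PATH directory per line; skip entries that are not directories.
printf '%s\n' "$PATH" | tr ':' '\n' | while IFS= read -r directory; do
    [ -d "$directory" ] || continue
    echo "$directory contains $(count_entries "$directory") files or directories"
done
```

Like ls without options, the glob ignores hidden files, so the counts should agree with the original script whenever no name contains whitespace.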


Validating US phone numbers

US phone numbers are commonly written in these formats

(223)456-7892
(223) 456-7891
223-456-7891
223.456.7893

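First, allow for an optional left parenthesis at the start

^\(?

Next comes the area code; US area codes start with a digit from 2 to 9

[2-9][0-9]{2}
#{2} means the preceding [0-9] appears exactly twice

Then an optional right parenthesis

\)?

After the area code there may be a space, a dash, a dot, or nothing at all

(| |-|\.)

Then the three-digit exchange number

[0-9]{3}

After the exchange number, exactly one space, dash, or dot must follow (one doubt here: if a space specifically were required, the pattern should not be written this way; see p. 441)

( |-|\.)

Finally, match the four-digit local extension number at the end of the line

[0-9]{4}$

The complete regular expression is

^\(?[2-9][0-9]{2}\)?(| |-|\.)[0-9]{3}( |-|\.)[0-9]{4}$

Put it into a script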

[root@tzPC 20Unit]# cat isphone 
gawk --re-interval '/^\(?[2-9][0-9]{2}\)?(| |-|\.)[0-9]{3}( |-|\.)[0-9]{4}$/{print $0}'

Test the script

[root@tzPC 20Unit]# echo "317-555-1234" | bash isphone
317-555-1234

Or

[root@tzPC 20Unit]# cat phonelist | bash isphone 
(223)456-7892
(223) 456-7891
223-456-7891
223.456.7893
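It is worth feeding the pattern a few numbers that should fail as well. The sketch below uses grep -E rather than gawk. As far as I know POSIX leaves an empty alternative such as (| |-|\.) undefined, so I wrote the first separator as an optional group instead, which should accept the same strings. The names phone_re and is_phone and the test numbers are made up.

```shell
phone_re='^\(?[2-9][0-9]{2}\)?( |-|\.)?[0-9]{3}( |-|\.)[0-9]{4}$'

# Succeeds (exit status 0) when the argument matches the phone pattern.
is_phone() {
    printf '%s\n' "$1" | grep -Eq "$phone_re"
}

for number in '(223)456-7892' '223.456.7893' '123-456-7890' '223-456-789' '(223-456-7891'; do
    if is_phone "$number"; then
        echo "valid:   $number"
    else
        echo "invalid: $number"
    fi
done
```

The last number is reported as valid even though its parenthesis is never closed: each parenthesis is optional on its own, so the pattern does not check that they are balanced.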

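Parsing email addresses

An email address has the format

username@hostname

The username may contain letters, digits, and the characters . - + _

^([a-zA-Z0-9_\-\.\+]+)@

The server name may contain letters, digits, and the characters . _ (the pattern below also accepts -)

([a-zA-Z0-9_\-\.]+)

The top-level domain contains only letters, at least 2 and at most 5 of them

\.([a-zA-Z]{2,5})$

Putting the pieces together

^([a-zA-Z0-9_\-\.\+]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$

Test it yourself; it works fine for me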

[root@tzPC 20Unit]# cat isemail 
gawk --re-interval '/^([a-zA-Z0-9_\-\.\+]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$/{print $0}'
[root@tzPC 20Unit]# echo "tz@163.com" | bash isemail 
tz@163.com
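A few more addresses, checked with grep -E. Inside a POSIX bracket expression a backslash is an ordinary character, so for grep I rewrote the sets without backslashes and put the dash last. The intent is the same as the gawk pattern above. The names email_re and is_email and the sample addresses are only for illustration.

```shell
email_re='^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9_.-]+\.[a-zA-Z]{2,5}$'

# Succeeds (exit status 0) when the argument matches the email pattern.
is_email() {
    printf '%s\n' "$1" | grep -Eq "$email_re"
}

for address in 'tz@163.com' 'first.last+tag@mail.example.org' 'no-at-sign.com' 'user@host' 'user@host.c'; do
    if is_email "$address"; then
        echo "valid:   $address"
    else
        echo "invalid: $address"
    fi
done
```

The first two addresses pass. The others fail because they lack an @, a dot-separated top-level domain, or a top-level domain of at least two letters.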

Sources: 《Linux命令行与Shell脚本大全第3版》 (Linux Command Line and Shell Scripting Bible, 3rd Edition), Chapter 20

《Linux运维之道第2版》, Chapter 3

posted @ 2020-08-30 10:42 努力吧阿团
