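Text string processing with cut and tr
Linux commands: cut and tr
1. Preface
This article gives an overview of the Linux command-line utilities "cut" and "tr".
WeChat official account: 滑翔的纸飞机
2. Linux command: cut
The "cut" command is a command-line tool that extracts selected parts of each line from the given files or from piped data and prints the result to standard output.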
root@dev:~# man cut
-------------------------------------------------------
NAME
cut - remove sections from each line of files
SYNOPSIS
cut OPTION... [FILE]...
... ...
-b, --bytes=LIST
select only these bytes
-c, --characters=LIST
select only these characters
-d, --delimiter=DELIM
use DELIM instead of TAB for field delimiter
... ...
Below is a text file. Let's see how to process it so that only the parts we need are printed.
test.txt:
Nov 15 00:13:08 dev com.apple.xpc.launchd[1] (com.apple.mdworker.shared.10000000-0000-0000-0000-000000000000[1938]): Service exited due to SIGKILL | sent by mds[98]
Nov 15 00:13:15 dev com.apple.xpc.launchd[1] (com.apple.mdworker.shared.01000000-0000-0000-0000-000000000000[1936]): Service exited due to SIGKILL | sent by mds[98]
Nov 15 00:13:15 dev com.apple.xpc.launchd[1] (com.apple.mdworker.shared.06000000-0200-0000-0000-000000000000[1935]): Service exited due to SIGKILL | sent by mds[98]
Nov 15 00:13:15 dev com.apple.xpc.launchd[1] (com.apple.mdworker.shared.04000000-0200-0000-0000-000000000000[1939]): Service exited due to SIGKILL | sent by mds[98]
Nov 15 00:13:41 dev com.apple.xpc.launchd[1] (com.apple.mdworker.shared.05000000-0600-0000-0000-000000000000[1940]): Service exited due to SIGKILL | sent by mds[98]
Nov 15 00:13:41 dev com.apple.xpc.launchd[1] (com.apple.mdworker.shared.08000000-0100-0000-0000-000000000000[1941]): Service exited due to SIGKILL | sent by mds[98]
Nov 15 00:13:41 dev com.apple.xpc.launchd[1] (com.apple.mdworker.shared.0D000000-0200-0000-0000-000000000000[1917]): Service exited due to SIGKILL | sent by mds[98]
Nov 15 00:13:41 dev com.apple.xpc.launchd[1] (com.apple.mdworker.shared.0E000000-0400-0000-0000-000000000000[1937]): Service exited due to SIGKILL | sent by mds[98]
2.1 Printing by character range
Print only the characters that fall within a given range:
Range: 1 - 5
root@dev:~# cut -c 1-5 test.txt
-------------------------------------------------------
Nov 1
Nov 1
Nov 1
Nov 1
Nov 1
Nov 1
Nov 1
Nov 1
Range: 21 - 40
root@dev:~# cut -c 21-40 test.txt
-------------------------------------------------------
com.apple.xpc.launch
com.apple.xpc.launch
com.apple.xpc.launch
com.apple.xpc.launch
com.apple.xpc.launch
com.apple.xpc.launch
com.apple.xpc.launch
com.apple.xpc.launch
Range: 76 - end of line
root@dev:~# cut -c 76- test.txt
-------------------------------------------------------
00000-0000-0000-0000-000000000000[1938]): Service exited due to SIGKILL | sent by mds[98]
00000-0000-0000-0000-000000000000[1936]): Service exited due to SIGKILL | sent by mds[98]
00000-0200-0000-0000-000000000000[1935]): Service exited due to SIGKILL | sent by mds[98]
00000-0200-0000-0000-000000000000[1939]): Service exited due to SIGKILL | sent by mds[98]
00000-0600-0000-0000-000000000000[1940]): Service exited due to SIGKILL | sent by mds[98]
00000-0100-0000-0000-000000000000[1941]): Service exited due to SIGKILL | sent by mds[98]
00000-0200-0000-0000-000000000000[1917]): Service exited due to SIGKILL | sent by mds[98]
00000-0400-0000-0000-000000000000[1937]): Service exited due to SIGKILL | sent by mds[98]
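The LIST given to -c does not have to be a single range. The following is a small sketch on a shortened sample line in the same layout as test.txt (the variable names `line`, `stamp` and `month_host` are only illustrative): `-N` selects from the first character through character N, and several ranges can be combined with commas.

```shell
# A shortened sample line in the same layout as test.txt
line='Nov 15 00:13:08 dev com.apple.xpc.launchd[1]'

# -N selects from the first character through character N
stamp=$(printf '%s\n' "$line" | cut -c -15)
echo "$stamp"        # Nov 15 00:13:08

# Comma-separated ranges are output in input order with nothing between them
month_host=$(printf '%s\n' "$line" | cut -c 1-3,17-19)
echo "$month_host"   # Novdev
```

Note that cut always emits the selected characters in the order they appear in the line, regardless of the order in which the ranges are listed.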
2.2 Printing by field
Suppose we want to extract data from the following file by field.
test.txt:
NAME EMAIL PHONE ADDRESS
devid devid@text.com 0897663232 beijin,china
harry harry@text.com 0232323232 hangzhou,china
jane jane@text.com 0323213122 zhejiang,china
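We use the "-d" (delimiter) option to tell cut how the fields are separated. The delimiter must be a single character and defaults to TAB. We then use the "-f" option to give the numbers of the fields to print.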
-d, --delimiter=DELIM
-f, --fields=LIST
>> cut -d ' ' -f1
In the demonstration below, we use a space (' ') as the delimiter.
# Print column 1, splitting on spaces
root@dev:~# cut -d ' ' -f1 test.txt
-------------------------------------------------------
NAME
devid
harry
jane
# Print column 2, splitting on spaces
root@dev:~# cut -d ' ' -f2 test.txt
-------------------------------------------------------
EMAIL
devid@text.com
harry@text.com
jane@text.com
Printing multiple fields: print columns 1 and 3
root@dev:~# cut -d ' ' -f1,3 test.txt
-------------------------------------------------------
NAME PHONE
devid 0897663232
harry 0232323232
jane 0323213122
Using a comma (',') as the delimiter:
root@dev:~# echo "jane,jane@dev,12345678,china" | cut -d ',' -f1
--------------------------------------------------------------------
jane
root@dev:~# echo "jane,jane@dev,12345678,china" | cut -d ',' -f2
--------------------------------------------------------------------
jane@dev
root@dev:~# echo "jane,jane@dev,12345678,china" | cut -d ',' -f3
--------------------------------------------------------------------
12345678
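Two more details of field mode are worth a quick sketch. The sample strings and the variable names `rest`, `kept` and `only` are illustrative assumptions, not part of the original demo. An open-ended list such as `2-` selects everything from field 2 onward, and `-s` controls what happens to lines that contain no delimiter at all.

```shell
record='jane,jane@dev,12345678,china'

# N- selects field N through the last field; the delimiter is kept between them
rest=$(printf '%s\n' "$record" | cut -d ',' -f 2-)
echo "$rest"         # jane@dev,12345678,china

# By default a line without the delimiter is printed unchanged
kept=$(printf 'no delimiter here\na,b\n' | cut -d ',' -f 1)
echo "$kept"

# -s (--only-delimited) drops lines that do not contain the delimiter
only=$(printf 'no delimiter here\na,b\n' | cut -s -d ',' -f 1)
echo "$only"         # a
```

Using `-s` is a simple way to keep header or comment lines that lack the delimiter out of the extracted column.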
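3. Linux command: tr
The Linux tr command translates, squeezes, or deletes characters.
tr reads data from standard input, transforms the characters, and writes the result to standard output. It does not accept file names as arguments, so file contents have to be piped or redirected into it.
Syntax
tr [-cdst] [--help] [--version] [SET1] [SET2]
tr [OPTION]... SET1 [SET2]
Options in detail: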
>> man tr
--------------------------------------------------------------------
tr [OPTION]... SET1 [SET2]
# Options
-c, -C, --complement
use the complement of SET1
-d, --delete
delete characters in SET1, do not translate
-s, --squeeze-repeats
replace each sequence of a repeated character that is listed in the last specified SET, with a single occurrence of that character
-t, --truncate-set1
first truncate SET1 to length of SET2
...
...
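Option descriptions:
⁃ -c, --complement: use the complement of SET1. Characters that are in SET1 are left untouched, and only the characters that are not in SET1 are processed
⁃ -d, --delete: delete the characters in SET1 instead of translating them
⁃ -s, --squeeze-repeats: squeeze each run of a repeated character into a single occurrence of that character
⁃ -t, --truncate-set1: truncate SET1 so that its length matches the length of SET2
⁃ --help: display usage information
⁃ --version: display the version of the program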
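The effect of -t is easiest to see when SET2 is shorter than SET1. The sketch below assumes GNU coreutils tr, which is what the man page excerpt above describes; other tr implementations may handle a short SET2 differently. The variable names `padded` and `truncated` are illustrative.

```shell
# Without -t, GNU tr pads SET2 by repeating its last character,
# so c, d and e are all mapped to y
padded=$(echo "abcde" | tr 'abcde' 'xy')
echo "$padded"       # xyyyy

# With -t, SET1 is first truncated to 'ab', so c, d and e pass through unchanged
truncated=$(echo "abcde" | tr -t 'abcde' 'xy')
echo "$truncated"    # xycde
```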
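Character set notation:
⁃ \NNN : the character with octal value NNN (1 to 3 octal digits)
⁃ \\ : backslash
⁃ \a : Ctrl-G, audible bell
⁃ \b : Ctrl-H, backspace
⁃ \f : Ctrl-L, form feed
⁃ \n : Ctrl-J, new line
⁃ \r : Ctrl-M, carriage return
⁃ \t : Ctrl-I, horizontal tab
⁃ \v : Ctrl-K, vertical tab
⁃ CHAR1-CHAR2 : the range of characters from CHAR1 to CHAR2. Ranges follow ASCII order and must be ascending; a descending range is not allowed.
⁃ [CHAR*] : only valid in SET2; repeats CHAR until SET2 is as long as SET1
⁃ [CHAR*REPEAT] : also only valid in SET2; repeats CHAR REPEAT times (REPEAT is read as octal if it starts with 0, otherwise as decimal)
⁃ [:alnum:] : all letters and digits
⁃ [:alpha:] : all letters
⁃ [:blank:] : all horizontal whitespace
⁃ [:cntrl:] : all control characters
⁃ [:digit:] : all digits
⁃ [:graph:] : all printable characters, not including space
⁃ [:lower:] : all lowercase letters
⁃ [:print:] : all printable characters, including space
⁃ [:punct:] : all punctuation characters
⁃ [:space:] : all horizontal and vertical whitespace
⁃ [:upper:] : all uppercase letters
⁃ [:xdigit:] : all hexadecimal digits
⁃ [=CHAR=] : all characters equivalent to CHAR (the CHAR between the equals signs is a character of your choice)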
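A few of the notations above in action, as a minimal sketch. The input strings and the variable names `parts`, `upper` and `masked` are illustrative, and GNU tr is assumed for the `[CHAR*]` form.

```shell
# Escape sequence in SET2: turn every colon into a newline
parts=$(echo "/usr/bin:/bin:/usr/local/bin" | tr ':' '\n')
echo "$parts"

# Character classes: map lowercase letters to uppercase
upper=$(echo "Hello World" | tr '[:lower:]' '[:upper:]')
echo "$upper"        # HELLO WORLD

# [CHAR*] in SET2: repeat '#' as often as needed, so every digit becomes '#'
masked=$(echo "tel 0897663232" | tr '[:digit:]' '[#*]')
echo "$masked"       # tel ##########
```

The sets are quoted so that the shell passes the brackets and backslashes to tr unchanged instead of trying to expand them itself.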
3.1 Replacing characters
Replace a character: 'H' > 'h'
root@dev:~# echo "Hello World" | tr 'H' 'h'
--------------------------------------------------------------------
hello World
Replace characters: 'Ho' > 'xx', that is, each 'H' or 'o' is replaced with 'x'
root@dev:~# echo "Hello World" | tr 'Ho' 'xx'
--------------------------------------------------------------------
xellx Wxrld
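The examples above work because tr maps the Nth character of SET1 to the Nth character of SET2. The same positional mapping applies to ranges, which a ROT13 sketch illustrates (ROT13 is not part of the original examples, and the variable names `encoded` and `decoded` are illustrative):

```shell
# Each letter in A-Z / a-z is mapped to the letter 13 places further on,
# wrapping around at the end of the alphabet
encoded=$(echo "Hello World" | tr 'A-Za-z' 'N-ZA-Mn-za-m')
echo "$encoded"      # Uryyb Jbeyq

# Applying the same mapping a second time restores the original text
decoded=$(echo "$encoded" | tr 'A-Za-z' 'N-ZA-Mn-za-m')
echo "$decoded"      # Hello World
```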
3.2 Deleting characters
# Delete every 'H' and 'o'
root@dev:~# echo "Hello World" | tr -d 'Ho'
--------------------------------------------------------------------
ell Wrld
# Complement: delete everything except 'H', 'd' and the newline
root@dev:~# echo "Hello World" | tr -cd 'Hd\n'
--------------------------------------------------------------------
Hd
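# Complement: delete everything except digits (the class is quoted so the shell does not treat it as a glob pattern)
root@dev:~# echo "Hello World 12345 " | tr -cd '[:digit:]'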
--------------------------------------------------------------------
12345
# Complement: delete everything except letters
root@dev:~# echo "Hello World 12345 " | tr -cd '[:alpha:]'
--------------------------------------------------------------------
HelloWorld
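Two common variations on deletion, shown as a hedged sketch with made-up input (the variable names `unix_text` and `digits` are illustrative): stripping carriage returns from Windows-style line endings, and adding `\n` to the kept set so the output still ends with a line break.

```shell
# Remove carriage returns, turning CRLF line endings into LF
unix_text=$(printf 'one\r\ntwo\r\n' | tr -d '\r')
echo "$unix_text"

# Keep digits and the newline; everything else is deleted
digits=$(echo "Hello World 12345 " | tr -cd '[:digit:]\n')
echo "$digits"       # 12345
```

In the two `-cd` examples above that keep only a character class, the newline is deleted as well, which is why the shell prompt would follow the output on the same line.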
3.3 Squeezing characters
# Squeeze runs of the listed characters
root@dev:~# echo "HHHHHHHHellooooo Woooorrrrrrrrrldddddddddddddddddd" | tr -s 'Hord'
------------------------------------------------------------------------------------
Hello World
# Convert lowercase to uppercase, then squeeze repeated uppercase characters
root@dev:~# echo "Hello World" | tr -s '[:lower:]' '[:upper:]'
------------------------------------------------------------------------------------
HELO WORLD
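Squeezing is also what makes tr a useful companion to cut. cut treats every single delimiter character as a field boundary, so columns padded with several spaces produce empty fields. The sketch below uses an illustrative row with extra padding (the variable names `row`, `raw` and `email` are assumptions for the example):

```shell
# A row whose columns are padded with a varying number of spaces
row='devid    devid@text.com   0897663232'

# Every single space counts as a delimiter, so field 2 is empty here
raw=$(printf '%s\n' "$row" | cut -d ' ' -f 2)
echo "[$raw]"        # []

# Squeezing runs of spaces first leaves exactly one delimiter per column
email=$(printf '%s\n' "$row" | tr -s ' ' | cut -d ' ' -f 2)
echo "$email"        # devid@text.com
```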