Deduplicating Text in Linux
1. Syntax
uniq [OPTION]... [INPUT [OUTPUT]]
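Both operands are optional. The sketch below assumes GNU coreutils and uses the hypothetical files in.txt and out.txt. With no INPUT, uniq reads standard input. When OUTPUT is also given, the result is written to that file instead of standard output.

```shell
#!/bin/sh
# Work in a scratch directory; in.txt and out.txt are hypothetical names.
cd "$(mktemp -d)" || exit 1
printf 'a\na\nb\n' > in.txt

uniq in.txt            # INPUT only: result goes to standard output
uniq < in.txt          # no INPUT: uniq reads standard input
uniq in.txt out.txt    # INPUT and OUTPUT: result is written to out.txt
cat out.txt            # prints a, then b
```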
2. Options
-c, --count            prefix lines by the number of occurrences
-d, --repeated         only print duplicate lines
-D, --all-repeated[=delimit-method]
                       print all duplicate lines
                       delimit-method={none(default),prepend,separate}
                       Delimiting is done with blank lines
-f, --skip-fields=N    avoid comparing the first N fields
-i, --ignore-case      ignore differences in case when comparing
-s, --skip-chars=N     avoid comparing the first N characters
-u, --unique           only print unique lines
-z, --zero-terminated  end lines with 0 byte, not newline
-w, --check-chars=N    compare no more than N characters in lines
--help                 display this help and exit
--version              output version information and exit
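The sketch below shows what the main options do. It runs against a small hypothetical sample whose duplicates are already adjacent. It assumes GNU coreutils uniq, because -D and -w are GNU extensions and may be missing from other implementations.

```shell
#!/bin/sh
# Assumes GNU coreutils uniq.
# sample: hypothetical input, already grouped so duplicates are adjacent.
sample() { printf '%s\n' apple apple Apple banana cherry cherry; }

sample | uniq -c   # count each run: 2 apple, 1 Apple, 1 banana, 2 cherry
sample | uniq -d   # one copy of each repeated run: apple, cherry
sample | uniq -D   # every line of each repeated run: apple, apple, cherry, cherry
sample | uniq -u   # only lines that are never repeated: Apple, banana
sample | uniq -i   # case-insensitive: Apple joins the apple run -> apple, banana, cherry

# -f N skips N blank-separated fields, -s N skips N characters,
# -w N compares at most N characters.
printf '1 x\n2 x\n3 y\n' | uniq -f 1      # -> 1 x, 3 y
printf 'ax\nbx\ncy\n' | uniq -s 1         # -> ax, cy
printf 'abc1\nabc2\nabd\n' | uniq -w 3    # -> abc1, abd
```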
3. Examples
Contents of unique.txt:
hellopython
hellopython
python
bbs.pythontab.com
python
pythontab.com
python
hello.pythontab.com
hellopythontab
hellopythontab
(1) Run uniq unique.txt:
hellopython
python
bbs.pythontab.com
python
pythontab.com
python
hello.pythontab.com
hellopythontab
(2) The output above looks wrong, because python still appears three times. Run uniq -c unique.txt to see the counts:
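      2 hellopython
      1 python
      1 bbs.pythontab.com
      1 python
      1 pythontab.com
      1 python
      1 hello.pythontab.com
      2 hellopythontab
      1
This is still not right. When uniq checks for duplicate lines, it only compares each line with the one directly before it. It merges identical lines only when they are adjacent, so the three python lines, separated by other lines, are counted separately.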
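The adjacency rule can be reproduced with a three-line input (a hypothetical example, not the article's file). The two python lines are separated by another line, so uniq -c reports each of them with a count of 1. Sorting first is what brings them together.

```shell
#!/bin/sh
# input: hypothetical three-line sample; the duplicate "python" lines are NOT adjacent.
input() { printf '%s\n' python bbs python; }

input | uniq -c          # python is listed twice, each time with count 1
input | sort | uniq -c   # after sorting: 1 bbs, 2 python
```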
(3) Sort first so that identical lines become adjacent, then count them. Run sort unique.txt | uniq -c:
      1
      1 bbs.pythontab.com
      2 hellopython
      2 hellopythontab
      3 python
      1 pythontab.com
      1 hello.pythontab.com
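The sketch below shows two common follow-ups, using hypothetical sample input and GNU coreutils. Piping the counts through sort -rn lists the most frequent lines first. For whole-line comparisons like these, sort -u produces the same distinct lines as sort | uniq in a single command.

```shell
#!/bin/sh
# words: hypothetical sample input for this sketch.
words() { printf '%s\n' python bbs python pythontab python bbs; }

# Most frequent first: sort so duplicates are adjacent, count each run,
# then sort the counts numerically in reverse order.
words | sort | uniq -c | sort -rn

# Distinct lines only; here sort -u gives the same result as sort | uniq.
words | sort -u
```

On this sample, the first pipeline lists python with a count of 3, then bbs with 2, then pythontab with 1.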