Deduplicating text in Linux

1. Syntax

 uniq [OPTION]... [INPUT [OUTPUT]]
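
The synopsis allows three calling forms, which the following minimal sketch walks through. The file names demo_in.txt and demo_out.txt are hypothetical and only exist in a scratch directory created for the example.

```shell
# Work in a scratch directory so no real files are touched (paths are hypothetical).
workdir=$(mktemp -d)
printf 'a\na\nb\n' > "$workdir/demo_in.txt"

# Form 1: no INPUT, so uniq reads standard input and writes standard output.
from_stdin=$(printf 'a\na\nb\n' | uniq)

# Form 2: INPUT only; the result still goes to standard output.
from_file=$(uniq "$workdir/demo_in.txt")

# Form 3: INPUT and OUTPUT; the result is written to the OUTPUT file.
uniq "$workdir/demo_in.txt" "$workdir/demo_out.txt"
from_output_file=$(cat "$workdir/demo_out.txt")

printf '%s\n' "$from_stdin"
```

All three forms should produce the same two lines here, since the input is identical in each case.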

2. Options

       -c, --count
              prefix lines by the number of occurrences

       -d, --repeated
              only print duplicate lines

       -D, --all-repeated[=delimit-method]
              print all duplicate lines
              delimit-method={none(default),prepend,separate}
              Delimiting is done with blank lines

       -f, --skip-fields=N
              avoid comparing the first N fields

       -i, --ignore-case
              ignore differences in case when comparing

       -s, --skip-chars=N
              avoid comparing the first N characters

       -u, --unique
              only print unique lines

       -z, --zero-terminated
              end lines with 0 byte, not newline

       -w, --check-chars=N
              compare no more than N characters in lines

       --help display this help and exit

       --version
              output version information and exit
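
A few of the options listed above can be tried on small, made-up inputs. This is a sketch that assumes GNU uniq (the -w option in particular is a GNU extension); the sample lines are invented for illustration.

```shell
# Sample input; the duplicates are already adjacent, so no sort is needed here.
sample=$(printf 'apple\napple\nApple\nbanana\ncherry\ncherry\n')

# -d: print one copy of each line that is repeated.
repeated=$(printf '%s\n' "$sample" | uniq -d)          # apple, cherry

# -u: print only the lines that are not repeated.
singles=$(printf '%s\n' "$sample" | uniq -u)           # Apple, banana

# -i with -c: ignore case, so "Apple" is counted together with "apple".
nocase=$(printf '%s\n' "$sample" | uniq -i -c)

# -f 1: skip the first field (here an ID column) before comparing.
by_field=$(printf '1 x\n2 x\n3 y\n' | uniq -f 1)       # 1 x, 3 y

# -w 3: compare only the first 3 characters of each line.
by_prefix=$(printf 'abc1\nabc2\nabd3\n' | uniq -w 3)   # abc1, abd3

# -s 2: skip the first 2 characters before comparing.
by_suffix=$(printf '1-x\n2-x\n3-y\n' | uniq -s 2)      # 1-x, 3-y

printf '%s\n' "$nocase"
```

In each case uniq keeps the first line of a group of lines that compare equal and drops the rest.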
     

3. Examples

Contents of unique.txt:

hellopython
hellopython
python
bbs.pythontab.com
python
pythontab.com
python
hello.pythontab.com
hellopythontab
hellopythontab

(1) Run uniq unique.txt

hellopython
python
bbs.pythontab.com
python
pythontab.com
python
hello.pythontab.com
hellopythontab

(2) Does the output above look wrong? python still appears three times. Now run uniq -c unique.txt

2 hellopython
1 python
1 bbs.pythontab.com
1 python
1 pythontab.com
1 python
1 hello.pythontab.com
2 hellopythontab
1
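# It still looks wrong: uniq only compares adjacent lines when checking for duplicates, so duplicates that are not next to each other are counted separately. #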
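
The adjacency rule can be seen on a three-line input. This is a minimal sketch with made-up lines; awk is only used to strip the padding that uniq -c puts in front of the counts.

```shell
# "python" appears twice, but the two copies are separated by another line.
not_adjacent=$(printf 'python\nbbs\npython\n' | uniq -c | awk '{print $1, $2}')

# The same lines with the duplicates next to each other.
adjacent=$(printf 'python\npython\nbbs\n' | uniq -c | awk '{print $1, $2}')

printf 'not adjacent:\n%s\n' "$not_adjacent"
printf 'adjacent:\n%s\n' "$adjacent"
```

In the first case each python line forms its own group with a count of 1; in the second case the two copies are merged into one group with a count of 2.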

(3) Now run sort unique.txt | uniq -c

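1
1 bbs.pythontab.com
2 hellopython
2 hellopythontab
1 hello.pythontab.com
3 python
1 pythontab.com
# Sorting first makes identical lines adjacent, so python is now counted 3 times. Where hello.pythontab.com lands relative to the hellopython lines depends on the locale's collation rules. #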
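
Building on the sort | uniq -c pipeline, the sketch below recreates the sample file in a scratch directory and ranks the lines by how often they occur. LC_ALL=C is an assumption added here to make the sort order reproducible across machines; it is not part of the original commands. The trailing empty line in the file is what produces the lone count of 1 in the listings.

```shell
# Recreate the sample file from this post in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/unique.txt" <<'DATA'
hellopython
hellopython
python
bbs.pythontab.com
python
pythontab.com
python
hello.pythontab.com
hellopythontab
hellopythontab

DATA

# Sort so duplicates become adjacent, count them, then sort by count (highest first).
# awk drops the entry for the empty line and strips the padding from uniq -c.
ranked=$(LC_ALL=C sort "$workdir/unique.txt" | uniq -c | LC_ALL=C sort -k1,1nr |
         awk 'NF == 2 {print $1, $2}')

# When the counts are not needed, sort -u returns the distinct lines directly.
distinct=$(LC_ALL=C sort -u "$workdir/unique.txt")

printf '%s\n' "$ranked"
```

The first line of the ranked output is the python entry with a count of 3, which matches the count in step (3).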

---------------------

EOF

posted @ 2014-10-21 16:34  天天AC