Processing large files with shell scripts
String processing
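s='{"_id":{"$oid":"59b73d80930c17474f9f050d"},"address":"01ny.cn/xinxi/17571162.html"}'
s_sub='{"_id":{"$oid":"59b73d80930c17474f9f050d"},"address":"'
s_sub_sub='"}'
# Method 1: delete the known prefix and suffix by substitution.
r=${s//"$s_sub"/}
r=${r//"$s_sub_sub"/}
echo "$r"
# Method 2: take a substring by offset. The prefix is 54 characters long
# and bash offsets are zero-based, so the address starts at offset 54
# (offset 55 would drop the leading "0" of the address).
expr length "$s_sub"    # prints 54
r=${s:54:100}
r=${r//"$s_sub_sub"/}
echo "$r"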
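The same extraction can also be sketched with prefix and suffix removal, which avoids counting characters altogether. This is only a minimal sketch and not the method used above: the function name extract_address is hypothetical, and it assumes every record ends with the "address" field, as the sample record does.

```shell
# Sample record copied from the notes above.
s='{"_id":{"$oid":"59b73d80930c17474f9f050d"},"address":"01ny.cn/xinxi/17571162.html"}'

# extract_address is a hypothetical helper name, not part of the original script.
extract_address() {
    local rec=$1
    local marker='"address":"'
    local suffix='"}'
    rec=${rec#*"$marker"}   # remove everything up to and including "address":"
    rec=${rec%"$suffix"}    # remove the trailing "}
    printf '%s\n' "$rec"
}

extract_address "$s"
```

Because the marker is matched by content rather than by position, this keeps working if the length of the $oid value ever changes.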
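Character replacement in shell variables - CSDN Blog http://blog.csdn.net/augusdi/article/details/41010041
Linux shell string operations in detail (length, read, replace, substring, concatenate, compare, delete, position) - gaomatlab - cnblogs https://www.cnblogs.com/gaochsh/p/6901809.html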
Large file processing
Splitting strings with awk
awk '{split(substr($1,55,100),arr,"\"") ;print arr[1]}' kwaddress_address_20180227.json
awk '{split(substr($1,55,100),arr,"\""); split(arr[1],arr_b,"/");print arr_b[1]}' kwaddress_address_20180227.json
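The commands above rely on the address starting at character 55 (awk's substr is one-based). As an alternative sketch, the double quote can serve as the awk field separator so that no fixed offset is needed. The helper name extract_host is hypothetical, and the sketch assumes one JSON object per line with "address" as the last key and no escaped quotes in the value.

```shell
# extract_host is a hypothetical helper: print the host part of the
# "address" value for each input line (files as arguments, or stdin).
extract_host() {
    # With '"' as separator, the address value is the next-to-last field.
    awk -F'"' '{ split($(NF-1), parts, "/"); print parts[1] }' "$@"
}

# Usage with the sample record from the string-processing notes:
printf '%s\n' '{"_id":{"$oid":"59b73d80930c17474f9f050d"},"address":"01ny.cn/xinxi/17571162.html"}' | extract_host
```

For the sample record this prints 01ny.cn, the same result as the offset-based command.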
[root@hadoop3 kwaddress]# cat extract.2g.sh
s_sub_sub='"}'
r=''
s=`awk '{split(substr($1,55,100),arr,"\""); split(arr[1],arr_b,"/");print arr_b[1]}' kwaddress_address_20180227.json`
#for LINE in `cat /home/data/kwaddress/kwaddress_address_20180227.json`
#for LINE in `awk '{print substr($1,55,100)}' kwaddress_address_20180227.json`
for LINE in $s
do
    echo $LINE
    #r=${LINE:55:100}
    # r=${r//$s_sub_sub/}
    # echo $r
done
echo
exit 0
[root@hadoop3 kwaddress]#
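The script above stores the entire awk output in the variable s before looping, so memory use grows with the input. For a file of this size, one possible variant pipes awk into a while read loop and handles one line at a time. This is a sketch rather than the original script: process_hosts is a hypothetical name, and the loop body is a placeholder.

```shell
# process_hosts is a hypothetical helper: stream the host of every record
# without holding the whole result in a shell variable.
process_hosts() {
    awk '{split(substr($1,55,100),arr,"\""); split(arr[1],arr_b,"/"); print arr_b[1]}' "$@" |
    while IFS= read -r host; do
        # Placeholder for per-line work; here each host is simply printed.
        printf '%s\n' "$host"
    done
}

# Usage, with the file name from the notes:
# process_hosts kwaddress_address_20180227.json
```

The loop runs in a pipeline, so in most shells variables set inside it are not visible after the loop ends; write results to a file or to stdout instead.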
awk '{split(substr($1,55,100),arr,"\""); split(arr[1],arr_b,"/");print arr_b[1]}' kwaddress_address_20180227.json >> url.pool.txt