Processing large files with shell scripts

 

 


 

String processing

 

 

s='{"_id":{"$oid":"59b73d80930c17474f9f050d"},"address":"01ny.cn/xinxi/17571162.html"}'
s_sub='{"_id":{"$oid":"59b73d80930c17474f9f050d"},"address":"'
s_sub_sub='"}'
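# Strip the known prefix, then the trailing '"}', using bash pattern substitution.
# Quoting the variables makes the patterns match literally instead of as globs.
r=${s//"$s_sub"/}
r=${r//"$s_sub_sub"/}
echo "$r"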
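
The `//` form above removes the pattern wherever it occurs in the string. Since the unwanted text is always a prefix and a suffix here, a minimal sketch with the anchored `#` and `%` expansions (which also work in plain POSIX sh) could look like this; the sample values are the ones used above:

```shell
s='{"_id":{"$oid":"59b73d80930c17474f9f050d"},"address":"01ny.cn/xinxi/17571162.html"}'
s_sub='{"_id":{"$oid":"59b73d80930c17474f9f050d"},"address":"'
s_sub_sub='"}'

# '#' removes the pattern only when it matches at the start of the string.
r=${s#"$s_sub"}
# '%' removes the pattern only when it matches at the end of the string.
r=${r%"$s_sub_sub"}

echo "$r"
```

Anchoring the match means an address that happens to contain `"}` somewhere in the middle is left intact.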

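# Length of the fixed prefix: prints 54.
expr length '{"_id":{"$oid":"59b73d80930c17474f9f050d"},"address":"'

# bash substring offsets are 0-based, so the address starts at offset 54
# (offset 55 would drop the first character of the address).
r=${s:54:100}
r=${r//"$s_sub_sub"/}
echo "$r"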

  

 

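Character replacement in shell variables - CSDN blog http://blog.csdn.net/augusdi/article/details/41010041

Linux shell string operations in detail (length, reading, replacement, slicing, concatenation, comparison, deletion, position) - gaomatlab - cnblogs https://www.cnblogs.com/gaochsh/p/6901809.html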

 

Large file processing

Splitting strings with awk

 awk '{split(substr($1,55,100),arr,"\"") ;print arr[1]}' kwaddress_address_20180227.json

 

 

awk '{split(substr($1,55,100),arr,"\""); split(arr[1],arr_b,"/");print arr_b[1]}' kwaddress_address_20180227.json
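
The `substr($1,55,100)` calls above rely on the prefix always being 54 characters long (awk's `substr` is 1-based, so the address starts at position 55) and on the address fitting in 100 characters. As a sketch of an alternative that does not depend on those widths, awk can split each line on the double quote instead. With that separator the address is the 10th field for lines shaped like the sample; the `line` variable below is a stand-in for one line of the dump, not part of the original script:

```shell
# One sample line in the same shape as the dump (stand-in for the real file).
line='{"_id":{"$oid":"59b73d80930c17474f9f050d"},"address":"01ny.cn/xinxi/17571162.html"}'

# With '"' as the field separator, field 10 holds the address value.
address=$(printf '%s\n' "$line" | awk -F'"' '{print $10}')

# Split the address on "/" and keep the first piece (the host).
host=$(printf '%s\n' "$line" | awk -F'"' '{split($10, parts, "/"); print parts[1]}')

echo "$address"
echo "$host"
```

This assumes every line has the same key order and no extra quoted fields before `address`; if the layout varies, the field number would need to change.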

 

[root@hadoop3 kwaddress]# cat  extract.2g.sh 
# Print the host of every address, one per line.
# Piping awk into the loop streams the data instead of first storing
# the whole result in a shell variable, which matters for a large file.
# Earlier attempts, kept for reference:
#for LINE in `cat /home/data/kwaddress/kwaddress_address_20180227.json`
#for LINE in `awk '{print substr($1,55,100)}' kwaddress_address_20180227.json`
awk '{split(substr($1,55,100),arr,"\""); split(arr[1],arr_b,"/"); print arr_b[1]}' kwaddress_address_20180227.json |
while IFS= read -r LINE
do
    echo "$LINE"
done

echo
exit 0

[root@hadoop3 kwaddress]# 

  

 awk '{split(substr($1,55,100),arr,"\""); split(arr[1],arr_b,"/");print arr_b[1]}' kwaddress_address_20180227.json >> url.pool.txt 

 

posted @ 2018-03-16 09:47  papering