Hive脚本笔记

统计一天微博内“taobao.com”出现次数,每小时记录一次:

for i in {00..23}; do hive -e "select count(*) from mds_anti_sass_log where source=3 and type=1 and args['content'] like '%taobao.com%' and dt=20131028 and hour=$i;" >> log.txt; done 

统计某一小时微博内“taobao.com”出现次数:

hive -e "select args['uid'] from mds_anti_sass_log where source=3 and type=1 and args['content'] like '%taobao.com%' and dt=20131028 and hour=10;" 

寻找某一天微博内出现“taobao.com”的用户TOP20:

hive -e "select args['uid'] from mds_anti_sass_log where source=3 and type=1 and args['content'] like '%taobao.com%' and dt=20131028;" | sort -rn | uniq -c | sort -rnk1 | head -n20

抽取某一uid的日志记录:

hive -e "select * from mds_anti_sass_log where source=3 and type=1 and args['uid']=2106332597 and dt=20131028;"

抽取某一uid微博发送成功的日志记录:

hive -e "select * from mds_anti_sass_log where source=3 and type=1 and args['uid']=2106332597 and rcode like '%:1}%'  and dt=20131028;"

来源为“淘宝网”的微博数量:

hive -e "select count(*) from mds_anti_sass_log where source=3 and type=1 and args['appid']=804659 and dt=20131028;"

10月1日到10月9日taobao.com的走势:

for i in {1..9}; do echo 2013100$i;hive -e "select count(*) from mds_anti_sass_log where source=3 and type=1 and args['content'] like '%taobao.com%' and dt=2013100$i;" >> log_1_9.txt; done

10月10日到10月30日taobao.com的走势:

for i in {10..30}; do echo 201310$i;hive -e "select count(*) from mds_anti_sass_log where source=3 and type=1 and args['content'] like '%taobao.com%' and dt=201310$i;" >> log.txt; done

10月10日到10月30日发微博包含“taobao.com”的独立uid数:

for i in {10..30}; do echo 201310$i;hive -e "select args['uid'] from mds_anti_sass_log where source=3 and type=1 and args['content'] like '%taobao.com%' and dt=201310$i;" | sort -rn | uniq -c | wc -l >> uid_num.txt ; done

posted @ 2013-10-30 15:47  _Eric.Liu  阅读(844)  评论(0编辑  收藏  举报