Nginx access log analysis
Default nginx log format (main, as defined in the stock nginx.conf)
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for"';
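The awk commands in these notes all rely on default whitespace splitting, so it helps to see which field number each part of a log line ends up in. A minimal sketch, assuming the main format above; show_fields is a hypothetical helper name used only for illustration:

```shell
# show_fields FILE: print the first line of FILE one whitespace-separated
# field per line, each prefixed with its awk field number.
show_fields() {
    awk 'NR == 1 { for (i = 1; i <= NF; i++) printf "$%d = %s\n", i, $i; exit }' "$1"
}

# Example (path is an assumption): show_fields /var/log/nginx/access.log
```

With this layout, $1 is the client address, $4 the timestamp (with its leading bracket), $7 the request path, $9 the status code, $10 the body size in bytes and $11 the quoted referrer. Fields from $12 onward are not stable, because the user agent string usually contains spaces.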
Top 10 client IPs by request count
$ awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -nr | head -10
6958 123.174.51.164
2307 111.85.34.165
1617 118.112.143.148
1489 117.63.146.40
1404 118.182.116.39
1352 1.48.219.30
1132 60.222.231.46
1129 10.35.1.82
943 27.227.163.200
880 58.253.6.133
Top client IPs for a specific day
$ grep "17/May/2017" /var/log/nginx/access.log | awk '{print $1}' | sort | uniq -c | sort -nr | head -10
$ awk '/17\/May\/2017/ {print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -nr | head -10
6958 123.174.51.164
2307 111.85.34.165
1617 118.112.143.148
1489 117.63.146.40
1404 118.182.116.39
1352 1.48.219.30
1132 60.222.231.46
1129 10.35.1.82
943 27.227.163.200
880 58.253.6.133
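Both commands produce the same result. In testing on large files, running grep first and then awk was much faster than filtering in awk alone.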
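The grep-first pipeline can be wrapped in a small function. This is a sketch rather than a benchmarked recipe: top_ips_on_day is a hypothetical name, and the LC_ALL=C setting is an assumption that often speeds up grep and sort on ASCII logs but is not something the measurements above covered.

```shell
# top_ips_on_day FILE DAY [N]
#   DAY is written as in $time_local, e.g. 17/May/2017. N defaults to 10.
top_ips_on_day() {
    # grep -F treats the pattern as a fixed string, so the slashes need no
    # escaping. Matching "[DAY:" ties the date to the timestamp field, so a
    # date that merely appears inside a URL or referrer is not counted.
    LC_ALL=C grep -F "[$2:" "$1" \
        | awk '{print $1}' \
        | LC_ALL=C sort | uniq -c | sort -nr | head -n "${3:-10}"
}

# Example (path is an assumption):
#   top_ips_on_day /var/log/nginx/access.log 17/May/2017
```

To compare the two approaches on a real log, prefix each pipeline with the shell's time keyword and run them a few times so the file cache is warm for both.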
Top referrer URLs (field 11 is the quoted $http_referer)
$ awk '{print $11}' /var/log/nginx/access.log | sort | uniq -c | sort -nr | head -10
20737 "http://www.adreambox.net/index.php?app=home&mod=User&act=index"
3981 "http://www.adreambox.net/"
1921 "http://www.adreambox.net/index.php?app=adreambox&mod=Class&act=prensent&id=5&type=2"
1299 "http://www.adreambox.net/index.php?app=home&mod=Public&act=doLogin"
1191 "http://www.adreambox.net/index.php?app=group&mod=Group&act=index&gid=1413"
718 "http://www.adreambox.net/index.php?app=group&mod=Group&act=index&gid=1403"
657 "http://www.adreambox.net/index.php?app=wap&mod=Index&act=index"
657 "http://www.adreambox.net/index.php?act=index&app=home&mod=User"
639 "http://www.adreambox.net/index.php?app=group&mod=Manage&act=index&gid=1413"
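Field 11 is the referrer, so the counts above describe where visitors came from. To rank the paths that were actually requested, the same pipeline can be pointed at field 7. A minimal sketch, assuming the main format; top_paths is a hypothetical name, and dropping the query string is a design choice made here so that variants of one page are counted together:

```shell
# top_paths FILE [N]: count requests per path ($7), ignoring the query string.
# N defaults to 10.
top_paths() {
    awk '{
        path = $7
        sub(/\?.*$/, "", path)   # strip "?..." so /index.php?a=1 counts as /index.php
        print path
    }' "$1" | sort | uniq -c | sort -nr | head -n "${2:-10}"
}

# Example (path is an assumption): top_paths /var/log/nginx/access.log 20
```

For a site that routes everything through index.php with query parameters, as the referrers above suggest, it may be more useful to remove the sub() line and keep the full request URI.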
Requests for a specific resource type
$ gawk '$7 ~ /\.html$/ {print $1,$7,$9}' /var/log/nginx/access.log # match lines whose 7th field ends in '.html'
11.0.8.5 //ckeditor/notexist_path.html 404
11.0.8.5 //ckeditor/CHANGES.html 404
11.0.8.18 //docs/CHANGELOG.html 404
11.0.8.5 //themes/mall/default/seller_order.confirm.html 404
11.0.8.18 //themes/mall/default/header.html 404
11.0.8.5 //themes/store/default/footer.html 404
11.0.8.5 //templates/admin/index.html 404
11.0.8.5 //system/templates/admin/login.html 404
11.0.8.18 //templates/404.html 404
11.0.8.18 //admin/editor/editor/dialog/fck_about.html 404
11.0.8.5 //fckeditor/_whatsnew.html 404
11.0.8.5 //FCKeditor/_docs/whatsnew.html 404
11.0.8.5 //style/gb/help/index.html 404
10.10.1.11 /Login/login.html 404
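A listing like the one above is easier to read once it is aggregated per client. The sketch below counts, per IP, the requests whose path ends in a given extension and that returned a given status. hits_by_ip is a hypothetical name, and the field positions assume the main format.

```shell
# hits_by_ip FILE EXT STATUS: count requests per client IP whose path ($7)
# ends in ".EXT" and whose status code ($9) equals STATUS.
hits_by_ip() {
    awk -v ext="$2" -v status="$3" '
        BEGIN { suffix = "." ext; n = length(suffix) }
        # Compare the tail of the path as a plain string rather than a regex,
        # so characters in EXT are never treated as regex metacharacters.
        length($7) >= n && substr($7, length($7) - n + 1) == suffix && $9 == status {
            print $1
        }
    ' "$1" | sort | uniq -c | sort -nr
}

# Example (path is an assumption):
#   hits_by_ip /var/log/nginx/access.log html 404
```

A handful of addresses producing most of the 404s for paths such as fckeditor or admin templates usually points to a scanner rather than real visitors.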
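Filter entries after a given time and count the IPs
$ awk '$4 > "[15/May/2017:21:16:38" {print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -nr # string comparison on $4; only reliable while the log stays within one month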
291031 11.0.8.5
274174 11.0.8.18
2764 10.10.1.11
1193 11.0.8.6
1 127.0.0.1
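The comparison on $4 is a plain string comparison, and the timestamp starts with the day followed by the month name, so it orders entries correctly only while the log stays within a single month. One way around that is to rebuild each timestamp as a sortable YYYYMMDDHHMMSS key before comparing. A minimal sketch in portable awk; ips_since and the cutoff format are assumptions made for this example, and the time zone offset in $5 is ignored, so the cutoff must be given in the log's own time zone.

```shell
# ips_since FILE CUTOFF: count requests per IP logged strictly after CUTOFF.
#   CUTOFF is YYYYMMDDHHMMSS, e.g. 20170515211638.
ips_since() {
    awk -v cutoff="$2" '
        BEGIN {
            n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
            for (i = 1; i <= n; i++) month[names[i]] = sprintf("%02d", i)
        }
        {
            # $4 looks like "[15/May/2017:21:16:38": drop the "[" and split
            # on "/" and ":" into day, month name, year, hour, minute, second.
            split(substr($4, 2), t, "[/:]")
            key = t[3] month[t[2]] t[1] t[4] t[5] t[6]
            # Appending "" forces a string comparison of two 14-character keys.
            if ((key "") > (cutoff "")) print $1
        }
    ' "$1" | sort | uniq -c | sort -nr
}

# Example (path is an assumption):
#   ips_since /var/log/nginx/access.log 20170515211638
```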
Total traffic (bytes sent)
$ grep "17/May/2017" /var/log/nginx/access.log | awk '{sum+=$10} END{print sum}' # awk variables are automatically initialized to zero
95210093059
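Some awk implementations print large sums in exponent notation (for example 9.52101e+10) when print is used, so an explicit printf format is a safer choice for byte totals. The sketch below also converts the result to GiB. total_bytes and to_gib are hypothetical helper names, and $10 is assumed to be $body_bytes_sent as in the main format.

```shell
# total_bytes FILE DAY: sum $body_bytes_sent ($10) over requests logged on DAY
# (DAY as in $time_local, e.g. 17/May/2017). Prints a plain integer.
total_bytes() {
    grep -F "[$2:" "$1" | awk '{ sum += $10 } END { printf "%.0f\n", sum }'
}

# to_gib BYTES: convert a byte count to GiB with two decimals.
to_gib() {
    awk -v b="$1" 'BEGIN { printf "%.2f GiB\n", b / (1024 * 1024 * 1024) }'
}

# Example (path is an assumption):
#   to_gib "$(total_bytes /var/log/nginx/access.log 17/May/2017)"
```

Applied to the 95210093059 bytes shown above, to_gib reports 88.67 GiB. Note that $body_bytes_sent excludes response headers, so the real traffic on the wire is somewhat higher.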
Count responses by status code
$ awk '{print $9}' /var/log/nginx/access.log | sort | uniq -c | sort -nr | head -10
1271257 200
957444 503
61875 502
32852 404
19121 302
13356 304
2819 500
2789 400
271 499
203 401
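Raw counts are easier to judge as a share of all requests. The sketch below does the counting in a single awk pass and prints a percentage next to each status code. status_share is a hypothetical name, and $9 is assumed to be the status field as in the main format.

```shell
# status_share FILE: print "STATUS COUNT PERCENT" for every status code,
# most frequent first (ties ordered by status code).
status_share() {
    awk '
        { count[$9]++; total++ }
        END {
            for (code in count)
                printf "%s %d %.1f%%\n", code, count[code], 100 * count[code] / total
        }
    ' "$1" | sort -k2,2nr -k1,1
}

# Example (path is an assumption): status_share /var/log/nginx/access.log
```

In the sample output above, 503 and 502 together are close to the number of 200 responses, which is the kind of imbalance a percentage column makes obvious at a glance.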
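Extract entries within a time range
$ sed -n '/18\/May\/2017:09:51:13/,/18\/May\/2017:09:55:13/p' /var/log/nginx/access.log # timestamps must use the $time_local form, and both must occur in the log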
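A sed range only behaves as intended when both boundary timestamps literally occur in the log: if the start never appears nothing is printed, and if the end never appears printing continues to the end of the file. Comparing the timestamp field avoids that. A minimal sketch for a range within one day, assuming the main format; lines_between is a hypothetical name.

```shell
# lines_between FILE DAY FROM TO: print entries logged on DAY between FROM
# and TO, inclusive. DAY is like 18/May/2017; FROM and TO are HH:MM:SS.
lines_between() {
    # Both bounds share the prefix "[DAY:", so a string comparison on $4 can
    # only succeed for entries from that day, and within one day HH:MM:SS
    # sorts chronologically.
    awk -v from="[$2:$3" -v to="[$2:$4" '$4 >= from && $4 <= to' "$1"
}

# Example (path is an assumption):
#   lines_between /var/log/nginx/access.log 18/May/2017 09:51:13 09:55:13
```

The times must be zero-padded (09:51:13, not 9:51:13) to match what nginx writes.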