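Web Access Log Analysis
Log Records
In a web log, each entry usually corresponds to a single request from a client. For example, here are a few nginx log entries: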
14.23.95.98 - - [17/Mar/2015:22:26:54 -0400] "GET /pmd/phpmyadmin.css.php?token=1013c8e1ea31d0f0340af8de3cf4a0cb&js_frame=left&nocache=2705868602 HTTP/1.1" 200 3970 "http://104.131.67.100/pmd/navigation.php?token=1013c8e1ea31d0f0340af8de3cf4a0cb&db=bl" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.104 Safari/537.36"
14.23.95.98 - - [17/Mar/2015:22:26:55 -0400] "GET /pmd/js/mootools.js HTTP/1.1" 304 0 "http://104.131.67.100/pmd/db_structure.php?token=1013c8e1ea31d0f0340af8de3cf4a0cb&db=bl" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.104 Safari/537.36"
14.23.95.98 - - [17/Mar/2015:22:26:55 -0400] "GET /pmd/phpmyadmin.css.php?token=1013c8e1ea31d0f0340af8de3cf4a0cb&js_frame=right&nocache=2705868602 HTTP/1.1" 200 21799 "http://104.131.67.100/pmd/db_structure.php?token=1013c8e1ea31d0f0340af8de3cf4a0cb&db=bl" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.104 Safari/537.36"
14.23.95.98 - - [17/Mar/2015:22:26:55 -0400] "GET /pmd/js/tooltip.js HTTP/1.1" 304 0 "http://104.131.67.100/pmd/db_structure.php?token=1013c8e1ea31d0f0340af8de3cf4a0cb&db=bl" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.104 Safari/537.36"
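Each entry can be broken down into the following 8 variables:
- remote_addr
  The client's IP address, e.g. 14.23.95.98
- remote_user
  The client's user name (shown as - in the sample entries, where no user name was supplied)
- time_local
  The access time and time zone, e.g. [17/Mar/2015:22:26:55 -0400]
- request
  The request method, URL, and HTTP protocol version, e.g. "GET /pmd/js/tooltip.js HTTP/1.1"
- status
  The response status code; 200 indicates success
- body_bytes_sent
  The size in bytes of the response body sent to the client, e.g. 21799
- http_referer
  The page from which the request was linked, e.g. "http://104.131.67.100/pmd/db_structure.php?token=1013c8e1ea31d0f0340af8de3cf4a0cb&db=bl"
- http_user_agent
  Information about the client's browser, e.g. "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.104 Safari/537.36"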
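The breakdown above can be sketched with awk. This is only a minimal illustration, not the way nginx itself parses anything: `parse_line` is a hypothetical helper name, and it assumes the entry has the layout of the samples above and that no field contains a literal double quote.

```shell
#!/bin/sh
# parse_line (hypothetical helper): read log entries on stdin and print
# the 8 variables of each entry as name=value, one per line.
# Assumption: fields never contain a literal double quote.
parse_line() {
    awk -F'"' '{
        # $1 holds: remote_addr - remote_user [time_local]
        split($1, head, " ")
        # $3 holds: status body_bytes_sent
        split($3, tail, " ")
        # Rejoin the timestamp and time zone, then drop the brackets
        time_local = head[4] " " head[5]
        gsub(/^\[|\]$/, "", time_local)
        print "remote_addr=" head[1]
        print "remote_user=" head[3]
        print "time_local=" time_local
        print "request=" $2
        print "status=" tail[1]
        print "body_bytes_sent=" tail[2]
        print "http_referer=" $4
        print "http_user_agent=" $6
    }'
}

# Usage: feed one of the sample entries through the helper
line='14.23.95.98 - - [17/Mar/2015:22:26:55 -0400] "GET /pmd/js/tooltip.js HTTP/1.1" 304 0 "http://104.131.67.100/pmd/db_structure.php?token=1013c8e1ea31d0f0340af8de3cf4a0cb&db=bl" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.104 Safari/537.36"'
printf '%s\n' "$line" | parse_line
```

Splitting on the double quote first keeps the request, referer, and user agent intact even though they contain spaces; the unquoted parts are then split on whitespace.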
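Log Analysis
With this information recorded in the logs, we can start doing some analysis.
For example, to get the 10 IP addresses with the most requests from the nginx log: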
[root@biby nginx]# awk '{a[$1]++} END {for (b in a) print b"\t"a[b]}' access.log | sort -k2,2nr | head -n 10
14.23.95.98 121
211.97.10.56 102
14.157.210.181 56
112.64.235.245 3
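The same counting pattern can be applied to other columns, for example the status code. The following is a rough sketch under stated assumptions: `count_by_field` is a hypothetical helper, fields are split on whitespace, and the status code is taken to be field 9, which only holds when the request line has exactly three space-separated parts as in the samples above.

```shell
#!/bin/sh
# count_by_field (hypothetical helper): count occurrences of one
# whitespace-separated field and list the most frequent values first.
#   $1 = log file
#   $2 = field number (1 = remote_addr, 9 = status under the assumption above)
#   $3 = number of rows to show (default 10)
count_by_field() {
    awk -v f="$2" '{ count[$f]++ } END { for (k in count) print k "\t" count[k] }' "$1" |
        # Sort by count, numerically and descending; break ties by value
        sort -t "$(printf '\t')" -k2,2nr -k1,1 |
        head -n "${3:-10}"
}

# Usage: request counts per status code, if access.log is present
if [ -f access.log ]; then
    count_by_field access.log 9
fi
```

The `n` flag on the sort key matters here: without it the counts are compared as text, so a count of 56 would be ranked above 121.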