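Python in Practice: Analyzing Text Files
Contents:
- Finding files that end with a given keyword
- Calculating file size
- Analyzing Apache access logs with Python
Finding files ending in .py in a directory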
[smcuser@smc-postman-script test]$ ll
total 4
-rw-rw-r--. 1 smcuser smcuser 0 Jan 12 23:13 1.txt
-rw-rw-r--. 1 smcuser smcuser 0 Jan 12 23:13 2.txt
-rw-rw-r--. 1 smcuser smcuser 0 Jan 12 23:13 3.txt
-rw-rw-r--. 1 smcuser smcuser 0 Jan 12 23:13 4.txt
-rw-rw-r--. 1 smcuser smcuser 0 Jan 12 23:13 5.txt
-rw-rw-r--. 1 smcuser smcuser 0 Jan 12 23:14 a.py
-rw-rw-r--. 1 smcuser smcuser 0 Jan 12 23:14 b.py
-rw-rw-r--. 1 smcuser smcuser 0 Jan 12 23:14 c.py
-rw-rw-r--. 1 smcuser smcuser 0 Jan 12 23:14 d.py
-rw-rw-r--. 1 smcuser smcuser 0 Jan 12 23:14 e.py
-rw-rw-r--. 1 smcuser smcuser 116 Jan 12 23:35 test.py
#!/usr/bin/env python
#
#
import os
test = [item for item in os.listdir('.') if item.endswith('.py')]
print(test)
Output
[smcuser@smc-postman-script test]$ python test.py
['a.py', 'b.py', 'c.py', 'd.py', 'e.py', 'test.py']
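The script above only looks at the current directory. As a minimal sketch of a recursive variant, the function below uses os.walk to descend into subdirectories; the name find_by_suffix and its parameters are hypothetical and not part of the original script.

```python
import os


def find_by_suffix(root, suffix):
    """Return sorted paths under root (searched recursively) whose names end with suffix."""
    matches = []
    # os.walk yields (directory path, subdirectory names, file names) for each directory
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

Calling find_by_suffix('.', '.py') should return every .py file below the current directory, with paths relative to it.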
Calculating file size
#!/usr/bin/env python
#
#
import os
# Sum the sizes (in bytes) of all .txt files in /tmp/test
target = '/tmp/test'
txt = [item for item in os.listdir(target) if item.endswith('.txt')]
total_size = sum(os.path.getsize(os.path.join(target, item)) for item in txt)
print(total_size)
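A raw byte count is hard to read for large directories. The sketch below shows one possible way to sum the sizes with os.scandir and format the result in binary units; total_size and human_readable are hypothetical helper names, and the choice of KiB/MiB/GiB units is an assumption rather than something the original script does.

```python
import os


def total_size(directory, suffix):
    """Sum the sizes in bytes of regular files in directory whose names end with suffix."""
    total = 0
    with os.scandir(directory) as entries:
        for entry in entries:
            # Count only regular files; subdirectories are skipped
            if entry.is_file() and entry.name.endswith(suffix):
                total += entry.stat().st_size
    return total


def human_readable(num_bytes):
    """Format a byte count using binary units (1 KiB = 1024 bytes)."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        # Stop at the first unit that fits, or at the largest unit available
        if size < 1024 or unit == "GiB":
            return "{0:.1f} {1}".format(size, unit)
        size /= 1024
```

For example, human_readable(total_size('/tmp/test', '.txt')) would report the combined size of the .txt files as a short string such as "1.5 KiB".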
Analyzing Apache access logs with Python
Sample Apache log:
193.252.243.232 - - [29/Mar/2009:06:05:34 +0200] "GET /index.php HTTP/1.1" 200 8714 "-" "Mozilla/5.0 (compatible; PJBot/3.0; +http://crawl.pagesjaunes.fr/robot)"
193.252.243.232 - - [29/Mar/2009:06:05:34 +0200] "GET /index.html HTTP/1.1" 200 8714 "-" "Mozilla/5.0 (compatible; PJBot/3.0; +http://crawl.pagesjaunes.fr/robot)"
193.252.243.231 - - [29/Mar/2009:06:05:34 +0200] "GET /index.php HTTP/1.1" 200 8714 "-" "Mozilla/5.0 (compatible; PJBot/3.0; +http://crawl.pagesjaunes.fr/robot)"
193.252.243.230 - - [29/Mar/2009:06:05:34 +0200] "GET /index.php HTTP/1.1" 200 8714 "-" "Mozilla/5.0 (compatible; PJBot/3.0; +http://crawl.pagesjaunes.fr/robot)"
193.252.243.237 - - [29/Mar/2009:06:05:34 +0200] "GET /index.php HTTP/1.1" 200 8714 "-" "Mozilla/5.0 (compatible; PJBot/3.0; +http://crawl.pagesjaunes.fr/robot)"
193.252.243.237 - - [29/Mar/2009:06:05:34 +0200] "GET /index.php HTTP/1.1" 200 8714 "-" "Mozilla/5.0 (compatible; PJBot/3.0; +http://crawl.pagesjaunes.fr/robot)"
193.252.243.230 - - [29/Mar/2009:06:05:34 +0200] "GET /index.html HTTP/1.1" 400 8714 "-" "Mozilla/5.0 (compatible; PJBot/3.0; +http://crawl.pagesjaunes.fr/robot)"
193.252.243.232 - - [29/Mar/2009:06:05:34 +0200] "GET /index.php HTTP/1.1" 503 8714 "-" "Mozilla/5.0 (compatible; PJBot/3.0; +http://crawl.pagesjaunes.fr/robot)"
Computing the site's PV and UV from client IPs (PV is the number of requests the site receives; UV is the number of unique visitors)
#!/usr/bin/env python
# The client IP is the first whitespace-separated field of each line
ips = []
with open('access.log') as f:
    for line in f:
        ips.append(line.split()[0])
print("pv is {0}".format(len(ips)))
print("uv is {0}".format(len(set(ips))))
Finding the most requested resources. Counter is a subclass of dict, and for plain counting it is more convenient than an ordinary dictionary.
#!/usr/bin/env python
from collections import Counter
c = Counter()
with open('access.log') as f:
    for line in f:
        # After whitespace splitting, field 6 is the requested path
        c[line.split()[6]] += 1
print(c.most_common(10))
Measuring user experience: a request whose HTTP status code is 4xx or 5xx counts as a failed request, and the script reports the share of failed requests.
#!/usr/bin/env python
#
d = {}
with open('access.log') as f:
    for line in f:
        # After whitespace splitting, field 8 is the HTTP status code
        key = line.split()[8]
        d.setdefault(key, 0)
        d[key] += 1
print(d)
sum_requests = 0
error_requests = 0
for key, val in d.items():
    if int(key) >= 400:
        error_requests += val
    sum_requests += val
print('error rate: {0:.2f}%'.format(error_requests * 100.0 / sum_requests))
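The three scripts above pick fields by position after whitespace splitting, which breaks as soon as a line deviates from the expected layout. As a hedged alternative, the sketch below parses each line with a regular expression and computes PV, UV, the most requested paths, and the error rate in a single pass. It assumes the standard Apache common/combined log layout; LOG_PATTERN and summarize are hypothetical names introduced for illustration only.

```python
import re
from collections import Counter

# Assumed layout: host ident user [time] "method path protocol" status size ...
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)


def summarize(lines):
    """Return PV, UV, top paths and error rate (percent) for an iterable of log lines."""
    ips = Counter()
    paths = Counter()
    errors = 0
    total = 0
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed layout
        total += 1
        ips[match.group("ip")] += 1
        paths[match.group("path")] += 1
        # 4xx and 5xx status codes count as failed requests
        if int(match.group("status")) >= 400:
            errors += 1
    error_rate = errors * 100.0 / total if total else 0.0
    return {
        "pv": total,
        "uv": len(ips),
        "top": paths.most_common(10),
        "error_rate": error_rate,
    }
```

A file object can be passed directly, for example `with open('access.log') as f: print(summarize(f))`. Lines that do not match are ignored rather than raising an IndexError, which is a design choice of this sketch and not behavior of the original scripts.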
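Without accumulating small steps, one cannot travel a thousand li; without gathering small streams, one cannot form rivers and seas.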