Analyzing the query conditions in a query log with Python

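Task: for the queries in the CRM log, find which fields each query filters on and how many distinct field combinations there are, then count how many queries use each combination.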

Sample log lines:

132.xxx.xx.x -  -  [2017-09-28 16:01:45] "GET /REST/HTableService?appId=crmyun&partition=2017&query=QUERY_TYPE1%3D%E6%8E%A5%E5%85%A5%E5%8F%B7%7Cand%7CQUERY_VALUE1%3D17727955834%7Cand%7CDATETIME%3E20170925000000000%7Cand%7CDATETIME%3C20170928000000000&version=1.0&tablename=TB_CRM_xxxx_xxxxx&method=getData&latnId=755&staffNo=GZTEST200&timestamp=1506585708188&signature=D73E9B59E08EA7B1C2D0DDA72AC957E4 HTTP/1.1" 200 93  100 100
132.xxx.xx.x -  -  [2017-09-20 09:35:59] "GET /REST/HTableService?staffNo=xxTEST200&appId=crmyun&version=1.0&tablename=TB_CRM_xxxx_xxxxx&method=getData&timestamp=1505871359000&signature=6743AE272C10BCC2261E11AF4CA5EA19&charset=UTF-8&partition=2017&query=STAFF_ID=1212100141|and|DATETIME>20170917000000000|and|DATETIME<20170919000000000 HTTP/1.1" 200 92  5 5

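Query conditions: the query parameter can hold several conditions, separated by |and|. In the first sample the value is percent-encoded, so the separator appears as %7Cand%7C and the operators =, > and < appear as %3D, %3E and %3C.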
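Because only the first sample is percent-encoded, one alternative to matching both spellings (not the approach the reference code below takes) is to decode the value first with the standard library's urllib.parse.unquote, after which a single literal |and| separator covers both samples. A minimal sketch, assuming the query value has already been cut out of the log line; split_conditions is a hypothetical name:

```python
from urllib.parse import unquote


def split_conditions(raw_query):
    """Percent-decode a query value, then split it on the literal |and| separator."""
    # unquote maps %7C -> |, %3D -> =, %3E -> >, %3C -> < and decodes UTF-8 byte sequences
    decoded = unquote(raw_query)
    return decoded.split('|and|')


encoded = ('QUERY_TYPE1%3D%E6%8E%A5%E5%85%A5%E5%8F%B7%7Cand%7CQUERY_VALUE1%3D17727955834'
           '%7Cand%7CDATETIME%3E20170925000000000%7Cand%7CDATETIME%3C20170928000000000')
plain = 'STAFF_ID=1212100141|and|DATETIME>20170917000000000|and|DATETIME<20170919000000000'

print(split_conditions(encoded))
# ['QUERY_TYPE1=接入号', 'QUERY_VALUE1=17727955834', 'DATETIME>20170925000000000', 'DATETIME<20170928000000000']
print(split_conditions(plain))
# ['STAFF_ID=1212100141', 'DATETIME>20170917000000000', 'DATETIME<20170919000000000']
```

Decoding also turns %E6%8E%A5%E5%85%A5%E5%8F%B7 into the readable value 接入号, which helps if the condition values themselves are ever of interest.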

Steps:

1. Use a regular expression to extract the query part of each line:

query=QUERY_TYPE1%3D%E6%8E%A5%E5%85%A5%E5%8F%B7%7Cand%7CQUERY_VALUE1%3D17727955834%7Cand%7CDATETIME%3E20170925000000000%7Cand%7CDATETIME%3C20170928000000000
query=STAFF_ID=1212100141|and|DATETIME>20170917000000000|and|DATETIME<20170919000000000
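The reference code matches query=.*?\s and then splits the match on &. A capturing group can return just the value in one step. A minimal sketch, assuming a query value never contains a literal & or whitespace (true for both samples); extract_query, SAMPLE_1 and SAMPLE_2 are hypothetical names, the samples being the two log lines above:

```python
import re

# Capture everything after "query=" up to the next "&" or whitespace.
# The [?&] prefix keeps the pattern from matching inside another parameter name.
QUERY_VALUE = re.compile(r'[?&]query=([^&\s]*)')


def extract_query(line):
    """Return the raw (possibly percent-encoded) query value, or None if absent."""
    match = QUERY_VALUE.search(line)
    return match.group(1) if match else None


SAMPLE_1 = ('132.xxx.xx.x -  -  [2017-09-28 16:01:45] "GET /REST/HTableService?appId=crmyun'
            '&partition=2017&query=QUERY_TYPE1%3D%E6%8E%A5%E5%85%A5%E5%8F%B7%7Cand%7C'
            'QUERY_VALUE1%3D17727955834%7Cand%7CDATETIME%3E20170925000000000%7Cand%7C'
            'DATETIME%3C20170928000000000&version=1.0&tablename=TB_CRM_xxxx_xxxxx'
            '&method=getData&latnId=755&staffNo=GZTEST200&timestamp=1506585708188'
            '&signature=D73E9B59E08EA7B1C2D0DDA72AC957E4 HTTP/1.1" 200 93  100 100')
SAMPLE_2 = ('132.xxx.xx.x -  -  [2017-09-20 09:35:59] "GET /REST/HTableService?staffNo=xxTEST200'
            '&appId=crmyun&version=1.0&tablename=TB_CRM_xxxx_xxxxx&method=getData'
            '&timestamp=1505871359000&signature=6743AE272C10BCC2261E11AF4CA5EA19&charset=UTF-8'
            '&partition=2017&query=STAFF_ID=1212100141|and|DATETIME>20170917000000000'
            '|and|DATETIME<20170919000000000 HTTP/1.1" 200 92  5 5')

print(extract_query(SAMPLE_1))
# QUERY_TYPE1%3D%E6%8E%A5%E5%85%A5%E5%8F%B7%7Cand%7CQUERY_VALUE1%3D17727955834%7Cand%7CDATETIME%3E20170925000000000%7Cand%7CDATETIME%3C20170928000000000
print(extract_query(SAMPLE_2))
# STAFF_ID=1212100141|and|DATETIME>20170917000000000|and|DATETIME<20170919000000000
```

Unlike the split-on-& approach, this also returns None cleanly for lines without a query parameter instead of raising on a failed match.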

2. Strip the leading query= to get the condition string, then split it on the regex "%7C|\|" (an encoded or literal pipe) to get a list:

['QUERY_TYPE1%3D%E6%8E%A5%E5%85%A5%E5%8F%B7', 'and', 'QUERY_VALUE1%3D17727955834', 'and', 'DATETIME%3E20170925000000000', 'and', 'DATETIME%3C20170928000000000']
['STAFF_ID=1212100141', 'and', 'DATETIME>20170917000000000', 'and', 'DATETIME<20170919000000000']
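In the pattern "%7C|\|" the literal pipe must be escaped, because a bare | means alternation in a regex; writing it as a raw string also avoids Python's invalid-escape warning for \|. Percent-encoding hex digits are case-insensitive (RFC 3986), so if some client sends %7c instead of %7C (an assumption; both samples use uppercase), re.IGNORECASE covers it. A minimal sketch; split_on_pipe is a hypothetical name, and the lowercase input is constructed from the first sample for illustration:

```python
import re

# "%7C" is the percent-encoded "|"; the literal pipe is escaped as "\|".
# re.IGNORECASE also accepts a lowercase "%7c" (assumed, not seen in the samples).
PIPE = re.compile(r'%7C|\|', re.IGNORECASE)


def split_on_pipe(raw_query):
    """Split a query value on encoded or literal pipes, keeping the 'and' tokens."""
    return PIPE.split(raw_query)


print(split_on_pipe('STAFF_ID=1212100141|and|DATETIME>20170917000000000'))
# ['STAFF_ID=1212100141', 'and', 'DATETIME>20170917000000000']
print(split_on_pipe('QUERY_VALUE1%3D17727955834%7cand%7cDATETIME%3E20170925000000000'))
# ['QUERY_VALUE1%3D17727955834', 'and', 'DATETIME%3E20170925000000000']
```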

3. Drop the 'and' items by taking every other element ([::2]), which gives a new list:

['QUERY_TYPE1%3D%E6%8E%A5%E5%85%A5%E5%8F%B7', 'QUERY_VALUE1%3D17727955834', 'DATETIME%3E20170925000000000', 'DATETIME%3C20170928000000000']
['STAFF_ID=1212100141', 'DATETIME>20170917000000000', 'DATETIME<20170919000000000']
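Taking [::2] assumes that conditions and 'and' tokens strictly alternate. Filtering out the 'and' tokens by value gives the same result on well-formed input but does not shift when a value is malformed. A minimal sketch; drop_and_tokens is a hypothetical name, and the leading-pipe value is an invented malformed case, not something seen in the logs:

```python
def drop_and_tokens(tokens):
    """Keep the conditions, skipping the 'and' connectors and empty strings."""
    return [t for t in tokens if t and t != 'and']


well_formed = ['STAFF_ID=1212100141', 'and', 'DATETIME>20170917000000000',
               'and', 'DATETIME<20170919000000000']
print(well_formed[::2] == drop_and_tokens(well_formed))  # True

# Invented malformed value "|STAFF_ID=1|and|DATETIME>2" splits with a leading ''
shifted = ['', 'STAFF_ID=1', 'and', 'DATETIME>2']
print(shifted[::2])              # ['', 'and']
print(drop_and_tokens(shifted))  # ['STAFF_ID=1', 'DATETIME>2']
```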

4. Split each condition on '%3D|%3E|%3C|>|<|=' (the encoded or literal =, > and <); the first element of each result is the field name, the key:

['QUERY_TYPE1', '%E6%8E%A5%E5%85%A5%E5%8F%B7']
['QUERY_VALUE1', '17727955834']
['DATETIME', '20170925000000000']
['DATETIME', '20170928000000000']

['STAFF_ID', '1212100141']
['DATETIME', '20170917000000000']
['DATETIME', '20170919000000000']
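Python does not guarantee a set's iteration order, and string hashes are randomized between runs, so the same set of keys can come out in a different order. Sorting the keys gives every combination one canonical form, which matters once combinations are counted. A minimal sketch of the key extraction with the de-duplication folded in; field_names is a hypothetical name, and maxsplit=1 is used because only the text before the first operator is needed:

```python
import re

# Encoded or literal comparison operators: %3D is "=", %3E is ">", %3C is "<"
OPERATOR = re.compile(r'%3D|%3E|%3C|>|<|=')


def field_names(conditions):
    """Return the sorted, de-duplicated field names used by a list of conditions."""
    # maxsplit=1: only the text before the first operator is kept as the key
    return sorted({OPERATOR.split(cond, maxsplit=1)[0] for cond in conditions})


print(field_names(['QUERY_TYPE1%3D%E6%8E%A5%E5%85%A5%E5%8F%B7', 'QUERY_VALUE1%3D17727955834',
                   'DATETIME%3E20170925000000000', 'DATETIME%3C20170928000000000']))
# ['DATETIME', 'QUERY_TYPE1', 'QUERY_VALUE1']
print(field_names(['STAFF_ID=1212100141', 'DATETIME>20170917000000000',
                   'DATETIME<20170919000000000']))
# ['DATETIME', 'STAFF_ID']
```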

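5. Put the keys into a set() to remove duplicates (DATETIME appears twice in both samples), which leaves each query's field combination. Reference code: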

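import re
import time

# "query=" followed by everything up to the first whitespace (the space before HTTP/1.1)
QUERY_PATTERN = re.compile(r'query=.*?\s')


def read_write(log_path=r"C:\Users\admin\Desktop\c5.log",
               out_path=r"C:\Users\admin\Desktop\4.txt"):
    # Open the output file once instead of once per log line
    with open(log_path, 'r', encoding='utf-8') as f_in, \
            open(out_path, 'a', encoding='utf-8') as f_out:
        for line in f_in:  # iterate lazily instead of calling readlines()
            match = QUERY_PATTERN.search(line)
            if match is None:  # skip lines without a query parameter
                continue
            for param in match.group().split('&'):
                if not param.startswith("query="):
                    continue
                temp = param[len("query="):].strip()
                # Split on an encoded (%7C) or literal pipe; [::2] drops the 'and' items
                conditions = re.split(r"%7C|\|", temp)[::2]
                fields = set()
                for cond in conditions:
                    # The key is the text before the first encoded or literal =, > or <
                    fields.add(re.split(r'%3D|%3E|%3C|>|<|=', cond)[0])
                print(fields)
                # Sort the keys so the same combination is always written the same way
                f_out.write(repr(sorted(fields)) + '\n')


if __name__ == '__main__':
    start = time.time()
    read_write()
    stop = time.time()
    print("running time is " + str(stop - start))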
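The code above writes one field list per query to 4.txt but stops short of the task's last step, counting the queries for each combination. A minimal sketch of that step with collections.Counter, reusing the same separator and operator patterns; count_combinations, SAMPLE_1 and SAMPLE_2 are hypothetical names, and the second sample line is repeated only so that one combination has a count above 1:

```python
import re
from collections import Counter

QUERY_VALUE = re.compile(r'[?&]query=([^&\s]*)')  # raw query value, up to "&" or whitespace
PIPE = re.compile(r'%7C|\|')                       # encoded or literal "|"
OPERATOR = re.compile(r'%3D|%3E|%3C|>|<|=')        # encoded or literal =, >, <


def count_combinations(lines):
    """Count how many log lines use each combination of query fields."""
    counts = Counter()
    for line in lines:
        match = QUERY_VALUE.search(line)
        if match is None:
            continue  # this line has no query parameter
        conditions = [c for c in PIPE.split(match.group(1)) if c and c != 'and']
        # A sorted tuple is hashable and gives each combination a single canonical key
        combo = tuple(sorted({OPERATOR.split(c, maxsplit=1)[0] for c in conditions}))
        counts[combo] += 1
    return counts


SAMPLE_1 = ('132.xxx.xx.x -  -  [2017-09-28 16:01:45] "GET /REST/HTableService?appId=crmyun'
            '&partition=2017&query=QUERY_TYPE1%3D%E6%8E%A5%E5%85%A5%E5%8F%B7%7Cand%7C'
            'QUERY_VALUE1%3D17727955834%7Cand%7CDATETIME%3E20170925000000000%7Cand%7C'
            'DATETIME%3C20170928000000000&version=1.0&tablename=TB_CRM_xxxx_xxxxx'
            '&method=getData&latnId=755&staffNo=GZTEST200&timestamp=1506585708188'
            '&signature=D73E9B59E08EA7B1C2D0DDA72AC957E4 HTTP/1.1" 200 93  100 100')
SAMPLE_2 = ('132.xxx.xx.x -  -  [2017-09-20 09:35:59] "GET /REST/HTableService?staffNo=xxTEST200'
            '&appId=crmyun&version=1.0&tablename=TB_CRM_xxxx_xxxxx&method=getData'
            '&timestamp=1505871359000&signature=6743AE272C10BCC2261E11AF4CA5EA19&charset=UTF-8'
            '&partition=2017&query=STAFF_ID=1212100141|and|DATETIME>20170917000000000'
            '|and|DATETIME<20170919000000000 HTTP/1.1" 200 92  5 5')

# The second sample is repeated only so that one combination has a count above 1
counts = count_combinations([SAMPLE_1, SAMPLE_2, SAMPLE_2])
for combo, n in counts.most_common():
    print(n, ', '.join(combo))
# 2 DATETIME, STAFF_ID
# 1 DATETIME, QUERY_TYPE1, QUERY_VALUE1
```

To run it on the actual log, pass an open file object, for example one opened with open(r"C:\Users\admin\Desktop\c5.log", encoding='utf-8') inside a with block. Iterating a file yields one line at a time, so the log never has to be loaded into memory whole.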

 

posted @ 2017-10-19 10:32  那年花开月正圆