Using Python regular expressions to extract the SQL itself from MySQL slow queries: de-parameterize by replacing parameter values with ?

I asked this question on StackOverflow, but nobody answered, so I wrote it myself.

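My requirement is to display MySQL slow queries on a web page. Shown raw, the same query appears with different parameter values, which makes it awkward to group. What we really care about is the SQL itself, for example: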

-- these two are really the same slow query
select * from a where a>1 and b='r' and c=3;
select * from a where a>2 and b='x' and c=5;
-- what I want to reduce them to
select * from a where a>? and b='?' and c=?

Since there is no really suitable module, I have to use regex substitution. Numbers are easy; strings need some thought.

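  1. At the most basic level, numbers can be replaced with r"\b\d+\b", which matches a standalone run of one or more digits, so digits inside identifiers such as col1 are left alone.
  2. As a simple approach, strings can be matched with r"'[^']*'", meaning all consecutive non-' characters between two '. This covers most cases, including strings with embedded spaces such as 'a   bdf c', since [^'] also matches a space; it goes wrong once a string contains an escaped quote.
  3. The approach I settled on is to split the SQL on ' into a list. Every string that was inside '' then sits at an even position counting from 1 (Python indexes 1, 3, 5, ..., i.e. sarr[1::2]), and everything else sits at an odd position. Replacing the even-position members with ? does the job, under the same assumption that strings contain no escaped quotes.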
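The first two options can be tried out on their own. The following is only a small sketch: the table, column names and values are made up for illustration, and the constant names NUMBER and STRING are not from the original code.

```python
import re

NUMBER = re.compile(r"\b\d+\b")   # a standalone run of digits
STRING = re.compile(r"'[^']*'")   # a quote, any non-quote characters, a quote

# Made-up query for illustration only.
sql = "select * from t where col1=12 and name='room-42' and n=7"

# Option 1 alone: col1 is untouched, but digits inside the string literal
# are replaced too, because 42 also stands alone between '-' and the quote.
numbers_only = NUMBER.sub("?", sql)

# Option 2 alone: each quoted literal becomes '?', the quotes are kept.
strings_only = STRING.sub("'?'", sql)

# Strings first, then numbers, gives the fully de-parameterized form.
both = NUMBER.sub("?", STRING.sub("'?'", sql))

print(numbers_only)
print(strings_only)
print(both)
```

Running the string replacement before the number replacement means the number pattern never sees the contents of a literal.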

A test session:

import re
sql = r"select * from a where id='aaaaa haha wocao' and id1= 'fff' and xx=1 and a3= 4 and a4=3434343 and a5a>99"
sql.split("'")
#['select * from a where id=', 'aaaaa haha wocao', ' and id1= ', 'fff', ' and xx=1 and a3= 4 and a4=3434343 and a5a>99']
sarr = sql.split("'")
sarr
#['select * from a where id=', 'aaaaa haha wocao', ' and id1= ', 'fff', ' and xx=1 and a3= 4 and a4=3434343 and a5a>99']
sarr[::2]
#['select * from a where id=', ' and id1= ', ' and xx=1 and a3= 4 and a4=3434343 and a5a>99']
sarr[1::2]
#['aaaaa haha wocao', 'fff']
sarr[1::2]= ['?' for x in sarr[1::2]]
sarr
#['select * from a where id=', '?', ' and id1= ', '?', ' and xx=1 and a3= 4 and a4=3434343 and a5a>99']
"'".join(sarr)
#"select * from a where id='?' and id1= '?' and xx=1 and a3= 4 and a4=3434343 and a5a>99"
sql
#"select * from a where id='aaaaa haha wocao' and id1= 'fff' and xx=1 and a3= 4 and a4=3434343 and a5a>99"

aa = "'".join(sarr)
re.sub(r"\b\d+\b","?",aa) 
#"select * from a where id='?' and id1= '?' and xx=? and a3= ? and a4=? and a5a>?"

The final function:

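import re

# De-parameterize SQL so that queries differing only in values look identical:
# replace every literal value with ?
def _formatSQL(sql):
    sarr = sql.split("'")                    # odd indexes hold the quoted strings
    sarr[1::2] = ['?' for x in sarr[1::2]]   # replace each string body with ?
    aa = "'".join(sarr)
    return re.sub(r"\b\d+\b", "?", aa)       # then replace standalone numbers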
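
To tie this back to the grouping requirement, here is a rough sketch of how the normalized form could be used as a grouping key. It assumes the slow-query statements are already available as a list of strings; the sample statements are made up, and format_sql is simply a copy of the logic above under a different name so the snippet runs on its own.

```python
import re
from collections import Counter

def format_sql(sql):
    # Same steps as the function above, repeated so this sketch is self-contained.
    parts = sql.split("'")
    parts[1::2] = ["?"] * len(parts[1::2])   # blank out every quoted string
    return re.sub(r"\b\d+\b", "?", "'".join(parts))

# Hypothetical statements; real ones would come from the MySQL slow log.
slow_queries = [
    "select * from a where a>1 and b='r' and c=3",
    "select * from a where a>2 and b='x' and c=5",
    "select * from t2 where id=10",
]

# Count how many raw statements collapse into each normalized pattern.
counts = Counter(format_sql(q) for q in slow_queries)
for pattern, n in counts.most_common():
    print(n, pattern)
```

The first two statements collapse into one pattern, so the page can show one row per pattern with a count instead of one row per raw statement.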

 

posted @ 2017-07-12 11:17  Els0n