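Python regex to extract the SQL itself from MySQL slow queries: de-parameterize by replacing parameter values with ?
I posted this question on StackOverflow but got no answers, so I wrote it myself.
My requirement is to display MySQL slow queries on a web page. Shown raw, the same statement appears many times with different parameter values, which makes it hard to group. What we actually care about is the SQL itself, for example: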
-- these two are really the same slow query
select * from a where a>1 and b='r' and c=3;
select * from a where a>2 and b='x' and c=5;
-- the desired result
select * from a where a>? and b='?' and c=?
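I could not find a suitable module, so regex replacement it is. Numbers are easy; strings need more thought.
- At the most basic level, numbers can be replaced with r"\b\d+\b", which matches a standalone run of one or more digits. The word boundaries mean digits inside identifiers such as col1 are left alone.
- For strings, the simple option is r"'[^']*'", which matches everything between two ' characters that is not itself a '. This covers most cases, including strings with spaces such as 'a bdf c'. It only breaks when the string itself contains a quote (written as \' or '').
- The more explicit way is to split the SQL on ' into a list. Everything inside quotes then lands on every second element (indices 1, 3, 5, ... counting from 0), and the rest of the statement lands on the even indices. Replace the quoted elements with ? and join the list back together. Strings containing escaped quotes are still a limitation here.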
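The two patterns from the list can be tried out directly. This is only a minimal sketch: the helper name `mask_literals` and the sample statement are made up for illustration, and it assumes string literals contain no embedded quotes.

```python
import re

NUMBER = re.compile(r"\b\d+\b")  # standalone runs of digits
STRING = re.compile(r"'[^']*'")  # single-quoted string without embedded quotes


def mask_literals(sql):
    """Hypothetical helper: mask string literals, then standalone numbers."""
    masked = STRING.sub("'?'", sql)
    return NUMBER.sub("?", masked)


print(mask_literals("select * from a where col1=12 and b='a bdf c'"))
# select * from a where col1=? and b='?'
```

The digit in `col1` survives because there is no word boundary between `l` and `1`.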
A test run:
import re
sql = r"select * from a where id='aaaaa haha wocao' and id1= 'fff' and xx=1 and a3= 4 and a4=3434343 and a5a>99"
sql.split("'")
#['select * from a where id=', 'aaaaa haha wocao', ' and id1= ', 'fff', ' and xx=1 and a3= 4 and a4=3434343 and a5a>99']
sarr = sql.split("'")
sarr
#['select * from a where id=', 'aaaaa haha wocao', ' and id1= ', 'fff', ' and xx=1 and a3= 4 and a4=3434343 and a5a>99']
sarr[::2]
#['select * from a where id=', ' and id1= ', ' and xx=1 and a3= 4 and a4=3434343 and a5a>99']
sarr[1::2]
#['aaaaa haha wocao', 'fff']
sarr[1::2] = ['?' for x in sarr[1::2]]
sarr
#['select * from a where id=', '?', ' and id1= ', '?', ' and xx=1 and a3= 4 and a4=3434343 and a5a>99']
"'".join(sarr)
#"select * from a where id='?' and id1= '?' and xx=1 and a3= 4 and a4=3434343 and a5a>99"
sql
#"select * from a where id='aaaaa haha wocao' and id1= 'fff' and xx=1 and a3= 4 and a4=3434343 and a5a>99"
aa = "'".join(sarr)
re.sub(r"\b\d+\b", "?", aa)
#"select * from a where id='?' and id1= '?' and xx=? and a3= ? and a4=? and a5a>?"
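The final function:
# De-parameterize a SQL statement so that queries differing only in their values become identical.
# Every literal value is replaced with ?
def _formatSQL(sql):
    # After splitting on ', the quoted strings sit at indices 1, 3, 5, ...
    sarr = sql.split("'")
    sarr[1::2] = ['?' for x in sarr[1::2]]
    aa = "'".join(sarr)
    # Replace standalone numbers; digits inside identifiers are untouched.
    return re.sub(r"\b\d+\b", "?", aa)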
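Since the goal is to group slow queries for display, the normalized text can serve as the grouping key. A small sketch of that idea, assuming the statements have already been pulled out of the slow log into a list of strings. The list `slow_queries` is made-up sample data, and `_formatSQL` is repeated only so the block runs on its own.

```python
import re
from collections import Counter


def _formatSQL(sql):
    # Same logic as the function above, repeated to keep this sketch self-contained.
    sarr = sql.split("'")
    sarr[1::2] = ['?' for x in sarr[1::2]]
    return re.sub(r"\b\d+\b", "?", "'".join(sarr))


# Made-up sample data standing in for statements read from the slow log.
slow_queries = [
    "select * from a where a>1 and b='r' and c=3",
    "select * from a where a>2 and b='x' and c=5",
    "select * from t where id=7",
]

# Count how often each normalized statement occurs.
counts = Counter(_formatSQL(q) for q in slow_queries)
for pattern, n in counts.most_common():
    print(n, pattern)
# 2 select * from a where a>? and b='?' and c=?
# 1 select * from t where id=?
```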