Python-MySQL抽取表中每个分组的前N个数据
在超大规模数据处理时,有时候需要按分组取出一部分数据,作为整体数据的参考。
例如,有表table_1,按column_1可以划分为200组,现在我们希望从这200组中每组各取出10条记录作为代表。
我们可以通过以下Python语句操作MySQL,来实现此功能:
1 2 3 4 5 6 7 8 | import mysql.connector conn = mysql.connector.connect(host = '127.0.0.1' , port = 3306 , user = 'root' , password = 'xxx' , db = 'xxx' ) cursor = conn.cursor() for i in range ( 0 , 200 ): #注意limit i,j 时 i是从0开始索引的 sql = "INSERT INTO Sample_table_1 SELECT t.column_1, t.column_2 FROM table_1 t, (SELECT * FROM table_1 group by column1 limit %s,1) p where t.column1=p.column1 limit 10;" % i cursor.execute(sql) conn.commit() conn.close() |
当然,由于这是个插入操作,我们也可以使用excutemany来实现:
1 2 3 4 5 6 | import mysql.connector conn = mysql.connector.connect(host = '127.0.0.1' , port = 3306 , user = 'root' , password = 'xxx' , db = 'xxx' ) cursor = conn.cursor() cursor.executemany( "INSERT INTO Sample_table_1 SELECT t.column_1, t.column_2 FROM table_1 t, (SELECT * FROM table_1 group by column1 limit %s,1) p where t.column1=p.column1 limit 10;" , [(i,) for i in range ( 0 , 200 )] ) #注意必须保留(i,)中的",",否则会报错 conn.commit() conn.close() |
再或者,也可以使用MySQL存储过程。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | import mysql.connector conn = mysql.connector.connect(host = '127.0.0.1' , port = 3306 , user = 'root' , password = 'xxx' , db = 'xxx' ) cursor = conn.cursor() sql1 = "drop procedure if exists test;" sql2 = "create procedure test() \ begin\ declare i bigint default 0 ; \ DECLARE var BIGINT DEFAULT 0 ; \ while i < 200 do \ INSERT INTO Sample_table_1 SELECT t.column_1, t.column_2 FROM table_1 t, (SELECT * FROM table_1 group by column1 limit i, 1 ) p where t.column1 = p.column1 limit 10 ;\ set i = i + 1 ;\ end while ;\ end;" sql3 = "call test;" cursor.execute(sql1) cursor.execute(sql2) cursor.execute(sql3) conn.commit() conn.close() |
就执行效率来说,excutemany较excute略高一些,存储过程的效率最低。
· Obsidian + DeepSeek:免费 AI 助力你的知识管理,让你的笔记飞起来!
· 分享4款.NET开源、免费、实用的商城系统
· 解决跨域问题的这6种方案,真香!
· 5. Nginx 负载均衡配置案例(附有详细截图说明++)
· Windows 提权-UAC 绕过