MySQL Series, Part 2: Comparing Data-Insert Performance
This article benchmarks MySQL insert performance. Test machine: Intel i3 (2.13 GHz, 4 cores), 4 GB RAM (2.99 GB usable), 32-bit operating system.
1. Table structure
```sql
CREATE TABLE `record` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `openid` varchar(63) NOT NULL,
  `tag_id` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
```
2. Single thread, one row at a time
```java
@RestController
public class TestOneByOneController {
    private static Logger logger = LoggerFactory.getLogger(TestOneByOneController.class);

    @Autowired
    private RecordRepository recordRepository;

    private static final int total = 10000;

    @RequestMapping(value = "/testOneByOne")
    @ResponseBody
    public String testOneByOne() {
        long startTime = System.currentTimeMillis();
        try {
            for (int i = 0; i < total; i++) {
                String openid = UUID.randomUUID().toString();
                RecordEntity record = new RecordEntity();
                record.setOpenid(openid);
                record.setTagId(i);
                // Each save() is its own transaction: one round trip per row.
                recordRepository.save(record);
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
        long endTime = System.currentTimeMillis();
        String result = "testOneByOne spend time is " + (endTime - startTime) + "ms";
        logger.info(result);
        return result;
    }
}
```
Result for 10,000 rows: roughly 407 s.
3. Single thread, batched inserts
(1) Test code
```java
package com.ws.learn.controller;

@RestController
public class TestController {
    private static Logger logger = LoggerFactory.getLogger(TestController.class);

    private EntityManagerFactory emf;

    @PersistenceUnit // inject the EntityManagerFactory via this annotation
    public void setEntityManagerFactory(EntityManagerFactory emf) {
        this.emf = emf;
    }

    private static final int total = 1000000;
    private static final int perThreadDealNum = 10000; // rows per batch

    @RequestMapping(value = "/test")
    @ResponseBody
    public String test() {
        long startTime = System.currentTimeMillis();
        try {
            StringBuilder sb = new StringBuilder(204800);
            sb.append("insert into record(openid,tag_id) values");
            int num;
            for (num = 0; num < total; num++) {
                String openid = UUID.randomUUID().toString();
                sb.append("('").append(openid).append("',").append(num).append("),");
                if ((num + 1) % perThreadDealNum == 0) {
                    sb.deleteCharAt(sb.length() - 1); // drop the trailing comma
                    myBatchInsert(sb);
                    sb = new StringBuilder(204800);
                    sb.append("insert into record(openid,tag_id) values");
                }
            }
            if (num % perThreadDealNum != 0) { // flush the final partial batch
                sb.deleteCharAt(sb.length() - 1);
                myBatchInsert(sb);
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
        long endTime = System.currentTimeMillis();
        String result = "test spend time is " + (endTime - startTime) + "ms";
        logger.info(result);
        return result;
    }

    public void myBatchInsert(StringBuilder sb) {
        EntityManager em = emf.createEntityManager();
        em.getTransaction().begin();
        Query query = em.createNativeQuery(sb.toString());
        query.executeUpdate();
        em.getTransaction().commit();
        em.close();
    }
}
```
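The SQL assembly above can be pulled out into a pure helper so the batching logic is testable without a database. A minimal sketch (the class and method names are mine, not from the article; the openid values here are UUIDs, so plain concatenation happens to be safe, but arbitrary input would need escaping or a parameterized statement):

```java
import java.util.List;

public class BatchSqlBuilder {
    // Builds one multi-row INSERT for the `record` table, mirroring the
    // StringBuilder logic in the controller above, but without the
    // trailing-comma bookkeeping: the comma is emitted *between* rows.
    public static String buildInsert(List<String> openids, int firstTagId) {
        StringBuilder sb = new StringBuilder(openids.size() * 48 + 64);
        sb.append("insert into record(openid,tag_id) values");
        for (int i = 0; i < openids.size(); i++) {
            if (i > 0) sb.append(','); // comma between rows, none at the end
            sb.append("('").append(openids.get(i)).append("',")
              .append(firstTagId + i).append(')');
        }
        return sb.toString();
    }
}
```

Emitting the separator before each row except the first avoids the `deleteCharAt` fix-up used in the controller, which is easy to forget on the final partial batch.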
(2) Test results
(2.1) 1,000,000 rows, 10,000 per batch: about 44 s.
2018-04-05 14:21:40.917 INFO 3012 --- [nio-8080-exec-1] com.ws.learn.controller.TestController : test spend time is 44025ms
(2.2) 1,000,000 rows, 20,000 per batch: about 42 s, barely any improvement.
2018-04-05 14:25:30.911 INFO 3012 --- [nio-8080-exec-1] com.ws.learn.controller.TestController : test spend time is 41980ms
(2.3) With 50,000 rows per batch, the statement failed with an error:
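The error text is not reproduced above, but a 50,000-row batch of this table is roughly 2-3 MB of SQL text, so the most likely cause is exceeding MySQL's `max_allowed_packet` limit (the default was only a few MB on MySQL versions of that era). A hedged check and fix:

```sql
-- Check the current limit
SHOW VARIABLES LIKE 'max_allowed_packet';

-- Raise it for the running server (applies to new connections only);
-- to persist it, set  max_allowed_packet = 64M  under [mysqld] in my.cnf
SET GLOBAL max_allowed_packet = 64 * 1024 * 1024;
```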
4. Multi-threaded batched inserts
(1) Test code
```java
@RestController
public class TestThreadController {
    private static Logger logger = LoggerFactory.getLogger(TestThreadController.class);

    private EntityManagerFactory emf;

    @PersistenceUnit // inject the EntityManagerFactory via this annotation
    public void setEntityManagerFactory(EntityManagerFactory emf) {
        this.emf = emf;
    }

    private static final int total = 1000000;
    private static final int perThreadDealNum = 10000; // rows per batch/task

    @RequestMapping(value = "/testWithThread")
    @ResponseBody
    public String testWithThread() {
        long startTime = System.currentTimeMillis();
        try {
            logger.info("" + Runtime.getRuntime().availableProcessors());
            final ExecutorService fixExecutorPool = Executors.newFixedThreadPool(10);
            StringBuilder sb = new StringBuilder(102400);
            sb.append("insert into record(openid,tag_id) values");
            int i;
            for (i = 0; i < total; i++) {
                String openid = UUID.randomUUID().toString();
                sb.append("('").append(openid).append("',").append(i).append("),");
                if ((i + 1) % perThreadDealNum == 0) {
                    sb.deleteCharAt(sb.length() - 1); // drop the trailing comma
                    fixExecutorPool.execute(new RecordThread(sb, emf));
                    sb = new StringBuilder(204800);
                    sb.append("insert into record(openid,tag_id) values");
                }
            }
            if (i % perThreadDealNum != 0) { // flush the final partial batch
                sb.deleteCharAt(sb.length() - 1);
                fixExecutorPool.execute(new RecordThread(sb, emf));
            }
            fixExecutorPool.shutdown();
            // Poll until every queued batch has been inserted.
            while (!fixExecutorPool.awaitTermination(500, TimeUnit.MILLISECONDS)) {
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
        long endTime = System.currentTimeMillis();
        String result = "testWithThread spend time is " + (endTime - startTime) + "ms";
        logger.info(result);
        return result;
    }
}
```
```java
public class RecordThread implements Runnable {
    private StringBuilder sb;
    private EntityManagerFactory emf;

    public RecordThread(StringBuilder sb, EntityManagerFactory emf) {
        this.sb = sb;
        this.emf = emf;
    }

    @Override
    public void run() {
        // Each task opens its own EntityManager and commits one batch.
        EntityManager em = emf.createEntityManager();
        em.getTransaction().begin();
        Query query = em.createNativeQuery(sb.toString());
        query.executeUpdate();
        em.getTransaction().commit();
        em.close();
    }
}
```
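One caveat with this setup: `Executors.newFixedThreadPool` backs its workers with an unbounded queue, so if SQL strings are produced faster than they are inserted, every pending batch sits on the heap, which can contribute to the `OutOfMemoryError` discussed in the results below. A sketch of a bounded alternative (the class name is mine, not from the article):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedPoolSketch {
    // Fixed-size pool with a bounded queue. When the queue is full,
    // CallerRunsPolicy makes the submitting thread execute the batch itself,
    // which naturally throttles SQL-string production instead of letting
    // megabyte-sized StringBuilders pile up on an unbounded queue.
    public static ExecutorService newBoundedPool(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
                threads, threads,
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = newBoundedPool(4, 8);
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < 100; i++) {
            pool.execute(done::incrementAndGet); // stand-in for a batch-insert task
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("tasks run: " + done.get()); // prints "tasks run: 100"
    }
}
```

Swapping this in for `Executors.newFixedThreadPool(10)` would cap peak memory at roughly `queueCapacity` batches regardless of total data volume.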
(2) Test results
(2.1) 1,000,000 rows, 10 fixed threads, 10,000 per batch: about 30,339 ms, a clear improvement over the 40+ s single-threaded runs.
A second run took 25,866 ms, a third 27,187 ms.
(2.2) 1,000,000 rows, 10 fixed threads, 5,000 per batch: 29,013 ms; little improvement.
(2.3) 1,000,000 rows, 10 fixed threads, 20,000 per batch: about 32,511 ms; the time actually increased.
Note: if the JVM heap is too small, you may see an exception such as Exception in thread "pool-13-thread-1" java.lang.OutOfMemoryError: Java heap space, because the unbounded queue behind Executors.newFixedThreadPool keeps every pending SQL string in memory.
My VM settings:
Run output:
(2.4) 2,000,000 rows, 10 fixed threads, 10,000 per batch: 63,084 ms.
Note: the JVM heap parameters need to be tuned properly, or the run will fail with a heap out-of-memory error.
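The article's actual VM options were shown as a screenshot. As an illustrative example only (the flag values and jar name below are my guesses, not the article's settings, sized for the 32-bit, ~3 GB machine above):

```
# Illustrative only — tune to your machine.
# A heap around 1 GB leaves room for the pending SQL strings
# queued by the thread pool during the 2-million-row run.
java -Xms512m -Xmx1g -jar learn-demo.jar
```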
(2.5) 2,000,000 rows, 15 fixed threads, 10,000 per batch: 62,322 ms; at this point adding threads made essentially no difference.
(3) Additional results (measurement noise and garbage collection add variance, which grows with the data volume; the figures below are approximate).
| Rows | Threads | Rows per batch | Time |
| --- | --- | --- | --- |
| 100,000 | 1 | 2,000 | 5,145 ms |
| 100,000 | 1 | 5,000 | 4,112 ms |
| 100,000 | 1 | 10,000 | 4,746 ms |
| 100,000 | 5 | 2,000 | 2,371 ms |
| 100,000 | 5 | 5,000 | 2,074 ms |
| 100,000 | 5 | 10,000 | 2,006 ms |
| 1,000,000 | 10 | 10,000 | 25,866 ms |
| 1,000,000 | 5 | 10,000 | 25,003 ms |
| 1,000,000 | 5 | 5,000 | 29,883 ms |
| 1,000,000 | 5 | 2,000 | 35,976 ms |
| 1,000,000 | 1 | 10,000 | 40,690 ms |
| 1,000,000 | 1 | 5,000 | 45,985 ms |
| 1,000,000 | 1 | 2,000 | 57,116 ms |
In practice, choose the thread count and per-batch row count according to the data volume and the rough time each task needs. For the 100,000-row runs above, around 5,000 rows per batch is a reasonable choice. Beyond a certain point, adding threads has little effect on total time, while the per-batch size still has some influence.
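For this table, each row contributes roughly 45-50 bytes of SQL text (`('<36-char uuid>',<tag_id>),`), so the per-batch row count can also be bounded by the server's `max_allowed_packet`. A rough helper (my sketch, not from the article; the 90% headroom factor is an assumption, not a MySQL rule):

```java
public class BatchSizeSketch {
    // Rough upper bound on rows per multi-row INSERT, derived from
    // max_allowed_packet and the SQL text size of one row.
    public static long maxRowsPerBatch(long maxAllowedPacketBytes, int sqlBytesPerRow) {
        long usable = maxAllowedPacketBytes * 9 / 10; // headroom for prefix/protocol overhead
        return usable / sqlBytesPerRow;
    }
}
```

With a 4 MB packet limit and ~48 bytes per row this allows roughly 78,000 rows, while the old 1 MB default caps batches near 19,000 rows; the latter would explain the failure of the 50,000-row batch in section 3.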
(4) Below is the heap-memory curve recorded with jconsole during the 2,000,000-row test; the sharp spikes occur while the test is running.