Hibernate Batch Processing with MySQL
Batch processing is a common way to improve database performance when inserting or updating large amounts of data in bulk. However, there are some hiccups you might run into. Let's walk through an example to discover the issues. Suppose we have a simple table named Account with columns name and age. Below are the SQL script (MySQL), the Hibernate entity bean, and the batch-insert code.
Account.sql
CREATE TABLE `Account` (
  `id` int(20) NOT NULL auto_increment,
  `name` varchar(20) NOT NULL,
  `age` int(11) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Account.java
import java.io.Serializable;

import javax.persistence.Basic;
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.Table;

@Entity
@Table(name = "Account")
public class Account implements Serializable {

    private static final long serialVersionUID = 6954737637154168665L;

    private long id;
    private String name;
    private int age;

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    public long getId() {
        return id;
    }

    public void setId(long id) {
        this.id = id;
    }

    @Basic
    @Column(name = "name", nullable = false, length = 20)
    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    @Basic
    @Column(name = "age", nullable = false)
    public int getAge() {
        return age;
    }

    public void setAge(int age) {
        this.age = age;
    }
}
Test.java
Transaction tx = session.beginTransaction();
for (int i = 0; i < accounts.size(); i++) {
    Account account = accounts.get(i);
    session.save(account);
    if (i % 50 == 0) {
        session.flush();
        session.clear();
    }
}
tx.commit();
The first thing to notice is that you need to flush and clear the session every N (here, 50) records, because Hibernate caches newly inserted entities in its session-level cache. Otherwise, you will eventually hit an OutOfMemoryError when inserting a large number of records.
Secondly, it is essential to configure hibernate.jdbc.batch_size to achieve good performance. Its value should match N, and you can experiment with different values to find the best one for your application. In addition, the second-level cache needs to be disabled, either in the configuration or programmatically via CacheMode.
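For reference, here is a minimal sketch of how these settings can be applied programmatically. The property names are standard Hibernate settings; the bootstrap code around them is illustrative and assumes a hibernate.cfg.xml on the classpath.

import org.hibernate.CacheMode;
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

// Bootstrap sketch: assumes a hibernate.cfg.xml on the classpath.
Configuration cfg = new Configuration().configure();
// Keep the batch size in sync with N from the flush/clear loop above.
cfg.setProperty("hibernate.jdbc.batch_size", "50");
// Disable the second-level cache globally...
cfg.setProperty("hibernate.cache.use_second_level_cache", "false");

SessionFactory sessionFactory = cfg.buildSessionFactory();
Session session = sessionFactory.openSession();
// ...or bypass caching just for this batch session:
session.setCacheMode(CacheMode.IGNORE);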
But wait... Why didn't I see performance improvement??
The problem is that the Hibernate documentation explicitly states:
Hibernate disables insert batching at the JDBC level transparently if you use an identity identifier generator.
As a result, we have to use an alternative identifier generator. I am going to show you how to use MultipleHiLoPerTableGenerator, since the hi/lo algorithm is efficient for identifier generation. Here is the updated annotation in Account.java, along with the new table that stores the sequence's next hi value.
@GenericGenerator(
    name = "AccountIdGenerator",
    strategy = "org.hibernate.id.MultipleHiLoPerTableGenerator",
    parameters = {
        @Parameter(name = "table", value = "IdentityGenerator"),
        @Parameter(name = "primary_key_column", value = "sequence_name"),
        @Parameter(name = "primary_key_value", value = "Account"),
        @Parameter(name = "value_column", value = "next_hi_value"),
        @Parameter(name = "primary_key_length", value = "100"),
        @Parameter(name = "max_lo", value = "1000")
    }
)
@Id
@GeneratedValue(strategy = GenerationType.TABLE, generator = "AccountIdGenerator")
CREATE TABLE `IdentityGenerator` (
  `id` int(11) NOT NULL auto_increment,
  `sequence_name` varchar(100) NOT NULL,
  `next_hi_value` int(11) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Yes.. I know.. The performance still sucks.
The reason is that MySQL does not support the batch form of prepared statement parameter binding. Fortunately, we can turn on rewriteBatchedStatements in the JDBC URL so that the driver rewrites batched inserts into the form "INSERT INTO Account VALUES (...), (...), (...)".
<property name="hibernate.connection.url">jdbc:mysql://127.0.0.1/sheng?rewriteBatchedStatements=true</property>
Congratulations, you are all set! You will see a great performance gain with the same test set you ran before.
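If you want to put a number on the improvement, a simple wall-clock measurement around the same batch loop is enough. This is only a rough sketch; the timing code is illustrative and not part of the original test.

// Rough wall-clock measurement around the same batch loop;
// the timing statements are illustrative additions.
long start = System.nanoTime();

Transaction tx = session.beginTransaction();
for (int i = 0; i < accounts.size(); i++) {
    session.save(accounts.get(i));
    if (i % 50 == 0) {
        session.flush();
        session.clear();
    }
}
tx.commit();

long elapsedMs = (System.nanoTime() - start) / 1000000L;
System.out.println("Inserted " + accounts.size() + " accounts in " + elapsedMs + " ms");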
Author: 陈锋 (Chen Feng)
The copyright of this article is shared by the author and 博客园 (cnblogs). Reposting is welcome, but unless the author agrees otherwise you must keep this statement and provide a clearly visible link to the original article on the page; otherwise the author reserves the right to pursue legal liability.
Excellent summary of the problems involved with batch inserts, Hibernate, and MySQL.
With JPA 2.0/Hibernate 2.1.7, the above suggestions make things worse rather than better. One insert with GenerationType.IDENTITY takes 41 ms on my MySQL 5.4. Switching on logging, I only see one batch query.
I tried to improve on this with the identity generator suggestions above, but the time doubled instead, to 83 ms for one insert.
When I switched on SQL statement logging, I noticed that instead of one query, Hibernate was executing three, with two extra queries before my actual insert:
Hibernate: select tbl.next_val from hibernate_sequences tbl where tbl.sequence_name=? for update
Hibernate: update hibernate_sequences set next_val=? where next_val=? and sequence_name=?
Correction, Hibernate 4.1.7
This is an optimization for batch inserting. For example, when you insert 1000 records you still get the same 2 extra queries, rather than 1000 extra selects of the inserted ids, if I'm not mistaken.
I am using Hibernate 4.1.9.Final and it throws a BatchedTooManyRowsAffectedException.
The org.hibernate.jdbc.Expectations.BasicExpectation.verifyOutcome(int rowCount, PreparedStatement statement, int batchPosition) method throws this exception.
I saw this error when using mysql-connector-java 5.1.29. When I downgraded to version 5.1.26, the method above worked.
I think it has to do with the bugs fixed in https://dev.mysql.com/doc/relnotes/connector-j/en/news-5-1-27.html.
Specifically: 'When rewriteBatchedStatements=true and useAffectedRows=true were set, executeBatch() did not return the number of affected rows as expected. (Bug #16442951, Bug #68562)'