Hibernate Batch Processing with MySQL
Batch processing is a common way to improve database performance when inserting or updating large amounts of data in bulk. However, there are some hiccups you might run into. Let's walk through an example to discover the issues. Suppose we have a simple table named Account with columns name and age. Below are the SQL script (MySQL), the Hibernate entity bean, and the batch-insert code.
Account.sql
CREATE TABLE `Account` (
  `id` int(20) NOT NULL auto_increment,
  `name` varchar(20) NOT NULL,
  `age` int(11) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Account.java
import java.io.Serializable;

import javax.persistence.Basic;
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.Table;

@Entity
@Table(name = "Account")
public class Account implements Serializable {

    private static final long serialVersionUID = 6954737637154168665L;

    private long id;
    private String name;
    private int age;

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    public long getId() {
        return id;
    }

    public void setId(long id) {
        this.id = id;
    }

    @Basic
    @Column(name = "name", nullable = false, length = 20)
    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    @Basic
    @Column(name = "age", nullable = false)
    public int getAge() {
        return age;
    }

    public void setAge(int age) {
        this.age = age;
    }
}
Test.java
Transaction tx = session.beginTransaction();
for (int i = 0; i < accounts.size(); i++) {
    Account account = accounts.get(i);
    session.save(account);
    if (i % 50 == 0) {
        session.flush();
        session.clear();
    }
}
tx.commit();
The first thing to notice is that you need to flush and clear the session every N (here, 50) records, because Hibernate caches newly inserted entities in its session-level cache. Otherwise, you will eventually hit an OutOfMemoryError when inserting a large number of records.
Secondly, it is essential to configure hibernate.jdbc.batch_size to achieve good performance. Its value should match N, and you can experiment with different values to find the best one for your application. In addition, the second-level cache needs to be disabled, either in the configuration or programmatically via CacheMode.
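For reference, here is a minimal sketch of how these settings can be applied programmatically. The property names are standard Hibernate settings; the bootstrap code around them is illustrative and assumes a hibernate.cfg.xml on the classpath.

import org.hibernate.CacheMode;
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

// Bootstrap sketch: assumes a hibernate.cfg.xml on the classpath.
Configuration cfg = new Configuration().configure();
// Keep the batch size in sync with N from the flush/clear loop above.
cfg.setProperty("hibernate.jdbc.batch_size", "50");
// Disable the second-level cache globally...
cfg.setProperty("hibernate.cache.use_second_level_cache", "false");

SessionFactory sessionFactory = cfg.buildSessionFactory();
Session session = sessionFactory.openSession();
// ...or bypass caching just for this batch session:
session.setCacheMode(CacheMode.IGNORE);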
But wait... Why didn't I see performance improvement??
The problem is that the Hibernate documentation explicitly states:
Hibernate disables insert batching at the JDBC level transparently if you use an identity identifier generator.
As a result, we have to use an alternative identifier generator. I am going to show you how to use MultipleHiLoPerTableGenerator, since the hi/lo algorithm is efficient for identifier generation. Here is the updated annotation in Account.java, along with the new table that stores the sequence's next hi value.
@GenericGenerator(
    name = "AccountIdGenerator",
    strategy = "org.hibernate.id.MultipleHiLoPerTableGenerator",
    parameters = {
        @Parameter(name = "table", value = "IdentityGenerator"),
        @Parameter(name = "primary_key_column", value = "sequence_name"),
        @Parameter(name = "primary_key_value", value = "Account"),
        @Parameter(name = "value_column", value = "next_hi_value"),
        @Parameter(name = "primary_key_length", value = "100"),
        @Parameter(name = "max_lo", value = "1000")
    }
)
@Id
@GeneratedValue(strategy = GenerationType.TABLE, generator = "AccountIdGenerator")
CREATE TABLE `IdentityGenerator` (
  `id` int(11) NOT NULL auto_increment,
  `sequence_name` varchar(100) NOT NULL,
  `next_hi_value` int(11) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Yes.. I know.. The performance still sucks.
The reason is that MySQL does not support the batch form of prepared statement parameter binding. Fortunately, we can turn on rewriteBatchedStatements in the JDBC URL so that the driver rewrites batched inserts into the form "INSERT INTO Account VALUES (...), (...), (...)".
<property name="hibernate.connection.url">jdbc:mysql://127.0.0.1/sheng?rewriteBatchedStatements=true</property>
Congratulations, you are all set! You will see a great performance gain with the same test set you ran before.
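If you want to put a number on the improvement, a simple wall-clock measurement around the same batch loop is enough. This is only a rough sketch; the timing code is illustrative and not part of the original test.

// Rough wall-clock measurement around the same batch loop;
// the timing statements are illustrative additions.
long start = System.nanoTime();

Transaction tx = session.beginTransaction();
for (int i = 0; i < accounts.size(); i++) {
    session.save(accounts.get(i));
    if (i % 50 == 0) {
        session.flush();
        session.clear();
    }
}
tx.commit();

long elapsedMs = (System.nanoTime() - start) / 1000000L;
System.out.println("Inserted " + accounts.size() + " accounts in " + elapsedMs + " ms");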
Author: 陈锋 (Chen Feng)
The copyright of this article is shared by the author and 博客园 (cnblogs). Reposting is welcome, but unless the author agrees otherwise you must keep this statement and provide a clearly visible link to the original article on the page; otherwise the author reserves the right to pursue legal liability.
Excellent summary of the problems involved with batch inserts, Hibernate, and MySQL.
With JPA 2.0/Hibernate 2.1.7, the above suggestions make things worse rather than better. One insert with GenerationType.IDENTITY takes 41 ms on my MySQL 5.4. Switching on logging, I only see one batch query.
I tried to improve on this with the identity generator suggestions above, but the time doubled instead, to 83 ms for one insert.
When I switched on SQL statement logging, I noticed that instead of one query, Hibernate was executing three, with two extra queries before my actual insert:
Hibernate: select tbl.next_val from hibernate_sequences tbl where tbl.sequence_name=? for update
Hibernate: update hibernate_sequences set next_val=? where next_val=? and sequence_name=?
Correction, Hibernate 4.1.7
This is an optimization for batch inserting. For example, when you insert 1000 records you still get the same 2 extra queries, rather than 1000 extra selects of the inserted ids, if I'm not mistaken.
I am using Hibernate 4.1.9.Final and it throws a BatchedTooManyRowsAffectedException.
The org.hibernate.jdbc.Expectations.BasicExpectation.verifyOutcome(int rowCount, PreparedStatement statement, int batchPosition) method throws this exception.
I saw this error when using mysql-connector-java 5.1.29. When I downgraded to version 5.1.26, the method above worked.
I think it has to do with the bugs fixed in https://dev.mysql.com/doc/relnotes/connector-j/en/news-5-1-27.html.
Specifically: 'When rewriteBatchedStatements=true and useAffectedRows=true were set, executeBatch() did not return the number of affected rows as expected. (Bug #16442951, Bug #68562)'