Coder Social home page Coder Social logo

spring-boot-jpa-bulk-insert-performance's Introduction

Spring Boot: JPA Bulk Database Insert

In this project, I achieved reducing 10k records insertion time from 183 seconds to just 5 secs.

For this I did teh following changes :-

1) Change the number of records while inserting.

  1. Set hibernate batchin insert size with the folowing properties.

    spring.jpa.properties.hibernate.jdbc.batch_size=30

ii. Add connection string properties.

cachePrepStmts=true
&useServerPrepStmts=true
&rewriteBatchedStatements=true
e.g
jdbc:mysql://localhost:3306/BOOKS_DB?serverTimezone=UTC&cachePrepStmts=true&useServerPrepStmts=true&rewriteBatchedStatements=true

iii. Changed the code for inserting, so that saveAll methods get batch sizes of 30 to insert as per what we also set in the properties file.

A very crude implementation of something like this.

for (int i = 0; i < totalObjects; i = i + batchSize) {
if( i+ batchSize > totalObjects){
List<Book> books1 = books.subList(i, totalObjects - 1);
repository.saveAll(books1);
break;
}
List<Book> books1 = books.subList(i, i + batchSize);
repository.saveAll(books1);
}

This reduced the time by not that much, but dropped from 185 secs to 153 Secs. That’s approximately 18% improvement.

2) Change the ID generation strategy.

This made a major impact.

I stopped usign the @GeneratedValue annotation with strategy i.e GenerationType.IDENTITY on my entity class. Hibernate has disabled batch update with this strategy, Because it has to make a select call to get the id from the database to insert each row.

I changed the strategy to SEQUENCE and provided a sequence generator.

public class Book {
@Id
@GeneratedValue(strategy = SEQUENCE, generator = "seqGen")
@SequenceGenerator(name = "seqGen", sequenceName = "seq", initialValue = 1)
private Long id;
}

This change drastically changed the insert performance as Hibernate was able to leverage bulk insert. From the previous performance improvement of 153 secs, the time to insert 10k records reduced to only 9 secs. Thats an increase in performance by nearly 95%.

Next, I pushed it further to use higher batch sizes and I noticed that doubling the batch size does not double down on time. The time to insert only gradually reduces.

Batch Size Time to insert (Secs)

30

9.5

60

6.48

200

5.04

500

4.46

1000

4.39

2000

4.5

5000

5.09

The most optimal I found for my case was a batch size of 1000 which took around 4.39 secs for 10K records. After that, I saw the performance degrading as you can see in the graph.

spring-boot-jpa-bulk-insert-performance's People

Contributors

amrutprabhu avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

spring-boot-jpa-bulk-insert-performance's Issues

How to run this project

Hi, this is a great tutorial. Could you provide some steps on how to run this project, like

  1. I see you have added the docker-compose.yaml file, so what commands to run it and from which path would this install MySQL database?
  2. If docker doesn't install a MySQL instance, and if I don't have one, should I install one and create a database and table manually?
  3. Finally, how to run this

Thanks

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.