Usually, developers work with high abstractions that exposes Spring for us. Developers forget about underlying
technologies and could have problems in unusual cases. One of the cases - reading/processing large data sets. Here I
demonstrate something for you. The solution is based on the ResultSet
abstraction.
- use pagination with
LIMIT
andOFFSET
due to performance issue
This is elementary:
First you need to set up local postgres instance. To do that:
cd docker
docker compose up
Then we are going to run the app:
./mvnw clean install
./mvwn spring-boot:run
The app available now: localhost:8082.
- Send
GET
request by/data/all
path - it will trigger full read (select *
). This will cause OutOfMemory exception. - Then send
GET
request by/data/all/batch
path - it will trigger same query, but process the data with 100_000 chunk size. No OutOfMemory expected here.
To make ResultSet
fetch data set in portions, we need:
- create statement with
ResultSet.TYPE_FORWARD_ONLY
type (by default) - set
automommit
forResultSet
tofalse
value
Otherwise, postgre driver fetches all data at once.
To calculate how many times jdbc driver trip to the database you need to divide total rows on your fetch size
and round up. In my example calculation will be: 1_000_000 / 100_000 = 10
trips.
Try to play with different settings of Connection
and Statement
abstractions, you will see difference in
data set processing in logs.
Postgre sql driver, issuing a query - link