Tutorials/Upserts

Upserts without the workarounds

Insert if missing, update if present: one of the most common write patterns, and one JPA never got a first-class answer for. Storm delegates it to the database, where it is atomic.

Series · JPA to Storm5 min readKotlin

01The task

A webhook delivers user records, possibly more than once. An import job synchronizes rows from an external system. In both cases the same logical record may arrive repeatedly, and the write must be idempotent: create the row the first time, update it every time after, and survive two deliveries arriving at once.

02The JPA way

JPA has no upsert operation. merge() sounds like one but is not: it copies detached state into the session and still decides insert-or-update by looking first. So most codebases end up with this:

ImportService.kt Kotlin · JPA Show SQL
1
2
3
4
5
6
7
8
9
10
11
// The common JPA workaround: check, then write. Two statements, one race.
@Transactional
fun importUser(email: String, name: String, city: City) {
    val existing = userRepository.findByEmail(email)
    if (existing == null) {
        userRepository.save(User(email = email, name = name, city = city))
    } else {
        existing.name = name
        existing.city = city
    }
}
generated sql
-- two statements with a gap between them
SELECT u.* FROM user u WHERE u.email = ?
INSERT INTO user (email, name, city_id) VALUES (?, ?, ?)

-- two concurrent imports both see "no row" in the gap:
-- one insert succeeds, the other dies on the unique constraint

The check and the write are separate statements, so two concurrent requests can both find nothing and both insert. One wins, the other throws DataIntegrityViolationException, and now you are writing retry logic or taking pessimistic locks for a write that the database could have resolved on its own. The alternative is a hand-written native query per database dialect.

03The Storm way

Storm treats upsert as a first-class repository operation and generates the native conflict syntax of whatever database you are on:

ImportService.kt Kotlin · Storm Show SQL
1
2
3
4
5
6
7
// One statement, resolved by the database, safe under concurrency
val user = orm upsert User(
    email = "alice@example.com",
    name = "Alice",
    city = city,
)
// user.id is populated whether the row was inserted or updated
generated sql
-- PostgreSQL dialect
INSERT INTO "user" (email, name, city_id) VALUES (?, ?, ?)
ON CONFLICT (email) DO UPDATE
SET name = excluded.name, city_id = excluded.city_id

-- MySQL/MariaDB: INSERT ... ON DUPLICATE KEY UPDATE
-- Oracle, SQL Server, H2: MERGE INTO ...

The database resolves the conflict atomically in one statement: no race window, no retry logic, no application-level locking. Conflict detection rides on the table's unique constraints, in this case the unique index on email. The returned entity carries the database-generated id either way.

Upsert needs two things: the dialect module for your database on the classpath (for example storm-postgresql), and a primary key or unique constraint for the database to detect the conflict. On MySQL and MariaDB, any unique constraint can trigger the update branch, so be deliberate when a table has several. See Upserts for the failure modes.

04Batches

Synchronization jobs rarely process one record. Passing a list combines JDBC batching with the native upsert syntax, which is dramatically faster than a loop:

SyncJob.kt Kotlin · Storm
1
2
3
4
5
6
// Synchronize an external source: insert the new, update the known
val users = imported.map { row ->
    User(email = row.email, name = row.name, city = resolveCity(row.city))
}

orm upsert users   // one batched statement, not a loop

05Inside a transaction

Upserts participate in transactions like every other write, so multi-step imports stay atomic:

ImportService.kt Kotlin · Storm
1
2
3
4
5
// Upserts compose with transactions like any other write
transaction {
    val city = orm insert City(name = "Sunnyvale", population = 161_884)
    val user = orm upsert User(email = "alice@example.com", name = "Alice", city = city)
}

06Side by side

JPA with HibernateStorm
Upsert operationNone; find-then-save or a native query per dialectorm upsert entity, one statement
ConcurrencyRace window between check and writeAtomic; the database resolves the conflict
PortabilityNative SQL differs per databaseDialects generate ON CONFLICT, ON DUPLICATE KEY, or MERGE
BatchesLoop over single writesOne batched upsert statement

07Keep going

The reference documentation covers the mechanics in depth: