MySQL Deadlocks with INSERT

Support Channel. “Hi, I am getting deadlocks in the database, and they occur when I have to roll back the transactions but if we don’t have to roll back all transactions get executed.” Wait, what? After some back and forth it becomes clear that the Dev experiences deadlocks and has data:
mysql> pager less
mysql> show engine innodb status\G
...
MySQL thread id 142531, OS thread handle 139990258222848, query id 4799571
somehost.somedomain someuser update
INSERT into sometable (identifier_id, currency, balance ) VALUES ('d4e84cb1-4d56-4d67-9d16-1d548fd26b55', 'EUR', '0')
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 3523 page no 1106463 n bits 224 index PRIMARY of table `somedb`.`sometable` trx id 9843342279 lock mode S locks gap before rec
and that is weird because of the lock mode S locks gap
in the last line. We get the exact same statement with the exact same value on the second thread, but with lock mode X locks gap
.
Both transactions have an undo-log entry of the length 1 - one row, single insert and the insert has an S-lock.
A mystery INSERT and opaque code
Many questions arise:
- how can an
INSERT
have an S-lock? - how can a single
INSERT
transaction deadlock? - what does the originating code look like?
The last question can be actually answered by the developer, but because they are using Java, in true Java fashion it is almost - but not quite - useless to a database person.
@Transactional(propagation = Propagation.REQUIRES_NEW,
timeout = MYSQL_TRANSACTION_TIMEOUT,
rollbackFor = {
BucketNotFoundException.class,
DuplicateTransactionException.class,
BucketBalanceUpdateException.class
},
isolation = Isolation.SERIALIZABLE
)
public void initiateBucketBalanceUpdate(Transaction transaction)
throws BucketBalanceUpdateException,
DuplicateTransactionException {
this.validateAndInsertIdempotencyKey(transaction);
this.executeBucketBalanceUpdateFlow(transaction);
this.saveTransactionEntries(transaction);
}
So, where is the SQL?
This is often a problem - Object Relational Mappers encapsulate the things that go on in the database so much that it is really hard for anybody - Developers, DBAs and everybody else - to understand what actually happens and make debugging quite painful.
Or, if they understand what goes on with the database, to map this to the code.
TRANSACTION ISOLATION LEVEL SERIALIZABLE
In this case it is solvable, though. The isolation = Isolation.SERIALIZABLE
is the culprit here.
So when we spoke about transactions and isolation levels
previously, I made the decision to leave the fourth and most useless isolation level out of the picture: SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
. The manual says
:
SERIALIZABLE
This level is like
REPEATABLE READ
, but InnoDB implicitly converts all plainSELECT
statements toSELECT ... FOR SHARE
if autocommit is disabled.
It then goes on to explain how SERIALIZABLE
does nothing when there is no explicit transaction going on. It does not explain what it is good for (mostly: shooting yourself into the foot) and when you should use it (mostly: don’t).
It does answer the question of “Where to the S-Locks come from?”, though.
The SERIALIZABLE
isolation mode turns a normal SELECT
statement into a Medusa’s freeze ray that shoots S-Locks all over the tables onto everything it looks at, preventing other threads from changing these things until we end our transaction and drop our locks (And that is why you should not use it, and why I personally believe that your code is broken if it needs it).
A broken RMW and lock escalation
So instead of a regular Read-Modify-Write
Session1> START TRANSACTION READ WRITE;
Session1> SELECT * FROM sometable WHERE id=10 FOR UPDATE; -- X-lock granted on rec or gap
-- ... Application decides INSERT or UPDATE
Session1> INSERT INTO sometable (id, ...) VALUES ( 10, ... );
Session1> COMMIT;
we get the following broken Read-Modify-Write, minimum:
Session1> START TRANSACTION READ WRITE;
Session1> SELECT * FROM sometable WHERE id=10 FOR SHARE; -- S-lock granted on rec or gap
-- ... Application decides INSERT or UPDATE
Session1> INSERT INTO sometable (id, ...) VALUES ( 10, ... ); -- lock escalation to X
Session1> COMMIT;
The LOCK IN SHARE MODE
or equivalent FOR SHARE
is not in the code, it is added implicitly by the isolation level SERIALIZABLE
.
We get an S-Lock, which is not good for writing.
Our transaction now did not get the required locks necessary for reading at the start of the transaction, because the later INSERT
requires an X-lock, like any write statement would.
The database needs to acquire the X-lock, that is, it needs to upgrade the S-lock to an X-lock.
If at that point in time another threads tries to run the exact same statement, which is what happens here, they already hold a second S-lock, preventing the first thread from completing their transaction (it is waiting until the second threads drops the S-lock, or it times out).
And then that second thread also tries to upgrade their S-lock into an X-lock, which it can’t do, because that first thread is trying to do the same thing, and we have the deadlock and a rollback.
Reproduction of the problem
We can easily reproduce this.
Session1> set transaction isolation level serializable;
Session1> start transaction read write;
Query OK, 0 rows affected (0.00 sec)
Session1> select * from kris where id = 10;
+----+-------+
| id | value |
+----+-------+
| 10 | 10 |
+----+-------+
Session1> select * from performance_schema.data_locks\G
...
LOCK_TYPE: TABLE
LOCK_MODE: IS
LOCK_STATUS: GRANTED
LOCK_DATA: NULL
...
LOCK_TYPE: RECORD
LOCK_MODE: S,REC_NOT_GAP
LOCK_STATUS: GRANTED
LOCK_DATA: 10
...
Session1> update kris set value=11 where id =10;
We change the isolation level to SERIALIZABLE
and start a transaction (because, as stated in the manual, autocommit does nothing). We then simply look at a single row, and check PERFORMANCE_SCHEMA.DATA_LOCKS
afterwards. Lo and behold, S-Locks as promised by the manual.
Now, the setup for the deadlock with a second session, by doing the same thing:
Session2> set transaction isolation level serializable;
Session2> start transaction read write;
Query OK, 0 rows affected (0.00 sec)
Session2> select * from kris where id = 10;
+----+-------+
| id | value |
+----+-------+
| 10 | 10 |
+----+-------+
Checking the data_locks table we now see two sets of IS- and S-Locks belonging to two different threads. We go for an UPDATE
here, because we chose existing rows and row locks, instead of non-existing rows and gap locks:
Session1> update kris set value=11 where id =10;
... hangs ...
and in the other connection:
Session2> update kris set value=13 where id =10;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
Coming back to the first session, this now reads
Session1> update kris set value=11 where id =10;
... hangs ...
Query OK, 1 row affected (2.43 sec)
Rows matched: 1 Changed: 1 Warnings: 0
The timing given is the time I took to switch between terminals and to type the commands.
Resolution
Coming back to the support case, the Dev analyzed their code and found out that what they are emitting is actually the sequence
Session1> SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
Session1> START TRANSACTION READ WRITE;
Session1> SELECT * FROM sometable WHERE id=10; -- implied S-lock granted on rec or gap
-- ... Application decides INSERT or UPDATE
Session1> SELECT * FROM sometable WHERE id=10 FOR UPDATEl -- lock escalation to X
Session1> INSERT INTO sometable (id, ...) VALUES ( 10, ... );
Session1> COMMIT;
so their code is already almost correct.
They do not need the double read and also not the isolation level SERIALIZABLE
. This is an easy fix for them and the deadlocks are gone, the world is safe again.
So many things to learn from this:
- You won’t need
SERIALIZABLE
unless your code is broken. Trying to use it is a warning sign. - A deadlock with an S-lock escalation means you need to check the isolation level.
- In
SERIALIZABLE
it is totally possible to deadlock yourself with a simple invisibleSELECT
and a loneINSERT
orUPDATE
.
- In
- The ORM will remove you quite a lot from the emitted SQL. Do you know how to trace your ORM and to get the actual SQL generated? If not, go and find out.
- A server side trace will not save you - the server is a busy beast.
- It also cannot see your stackframes, so it can’t link your SQL to the line in your code that called the ORM. Yes, in the client side SQL trace, ideally you also want the tracer to bubble up the stack and give you the first line outside the ORM to identify what is causing the SQL to be emitted and where in the code that happens.
- The deadlock information in
SHOW ENGINE INNODB STATUS
is painfully opaque, but learning to read it is worthwhile.- In reproduction, using performance schema is much easier and makes the sequence of events much easier to understand.
- The server is not very good at explaining the root cause of deadlocks to a developer in the error messages and warnings generated.