Support Channel. “Hi, I am getting deadlocks in the database and they occur when I have to rollback the transactions but if we don’t have to roll back all transactions get executed.” Wait, what? After some back and forth it becomes clear that the Dev experiences deadlocks and has data:
and that is weird because of the
lock mode S locks gap in the last line. We get the exact same statement with the exact same value on the second thread, but with
lock mode X locks gap.
Both transactions have an undo log entry of the length 1 - one row, single insert and the insert has an S-lock.
A mystery INSERT and opaque code
Many questions arise:
- how can an
INSERThave an S-lock?
- how can a single
- what does the originating code look like?
The last question can be actually answered by the developer, but because they are using Java, in true Java fashion it is almost - but not quite - useless to a database person.
So, where is the SQL?
This is often a problem - Object Relational Mappers encapsulate the things that go on in the database so much that it is really hard for anybody - Developers, DBAs and everybody else - to understand what actually happens and make debugging quite painful.
Or, if they understand what goes on with the database, to map this to the code.
TRANSACTION ISOLATION LEVEL SERIALIZABLE
In this case it is solvable, though. The
isolation = Isolation.SERIALIZABLE is the culprit here.
So when we spoke about transactions and isolation levels previously, I made the decision to leave the fourth and most useless isolation level out of the picture:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE. The manual says:
This level is like
REPEATABLE READ, but InnoDB implicitly converts all plain
SELECT ... FOR SHAREif autocommit is disabled.
It then goes on to explain how
SERIALIZABLE does nothing when there is no explicit transaction going on. It does not explain what it is good for (mostly: shooting yourself into the foot) and when you should use it (mostly: don’t).
It does answer the question of “Where to the S-Locks come from?”, though.
SERIALIZABLE isolation mode turns a normal
SELECT statement into a Medusa’s freeze ray that shoots S-Locks all over the tables onto everything it looks at, preventing other threads from changing these things until we end our transaction and drop our locks (And that is why you should not use it, and why I personally believe that your code is broken if it needs it).
A broken RMW and lock escalation
So instead of a regular Read-Modify-Write
we get the following broken Read-Modify-Write, minimum:
LOCK IN SHARE MODE or equivalent
FOR SHARE is not in the code, it is added implicitly by the isolation level
We get an S-Lock, which is not good for writing. Our transaction now did not get the required locks necessary for reading at the start of the transaction, because the later
INSERT requires an X-lock, like any write statement would. The database needs to aquire the X-lock, that is, it needs to upgrade the S-lock to an X-lock.
If at that point in time another threads tries to run the exact same statement, which is what happens here, they already hold a second S-lock, preventing the first thread from completing their transaction (it is waiting until the second threads drops the S-lock or it times out).
And then that second thread also tries to upgrade their S-lock into an X-lock, which it can’t do, because that first thread is trying to do the same thing, and we have the deadlock and a rollback.
Reproduction of the problem
We can easily reproduce this.
We change the isolation level to
SERIALIZABLE and start a transaction (because, as stated in the manual, autocommit does nothing). We then simply look at a single row, and check
PERFORMANCE_SCHEMA.DATA_LOCKS afterwards. Lo and behold, S-Locks as promised by the manual.
Now, the setup for the deadlock with a second session, by doing the same thing:
Checking the data_locks table we now see two sets of IS- and S-Locks belonging to two different threads. We go for an
UPDATE here, because we chose existing rows and row locks, instead of non-existing rows and gap locks:
and in the other connection:
Coming back to the first session, this now reads
The timing given is the time I took to switch between terminals and to type the commands.
Coming back to the support case, the Dev analyzed their code and found out that what they are emitting is actually the sequence
so their code is already almost correct.
They do not need the double read and also not the isolation level
SERIALIZABLE. This is an easy fix for them and the deadlocks are gone, the world is safe again.
So many things to learn from this:
- You won’t need
SERIALIZABLEunless your code is broken. Trying to use it is a warning sign.
- A deadlock with an S-lock escalation means you need to check the isolation level.
SERIALIZABLEit is totally possible to deadlock yourself with a simple invisible
SELECTand a lone
- The ORM will remove you quite a lot from the emitted SQL. Do you know how to trace your ORM and to get the actual SQL generated? If not, go and find out.
- A server side trace will not save you - the server is a busy beast.
- It also cannot see your stackframes, so it can’t link your SQL to the line in your code that called the ORM. Yes, in the client side SQL trace, ideally you also want the tracer to bubble up the stack and give you the first line outside of the ORM to identify what is causing the SQL to be emitted and where in the code that happens.
- The deadlock information in
SHOW ENGINE INNODB STATUSis painfully opaque, but learning to read it is worthwhile.
- In reproduction, using performance schema is much easier and makes the sequence of events much easier to understand.
- The server is not very good at explaining the root cause of deadlocks to a developer in the error messages and warnings generated.