Fixing Slow MySQL #1: Foreign Key & Row Lock Contention
28/May 2021Background
There is service downtime due to “Slow MySQL”. But why is it slow? And how to fix it? Let’s start digging.
This post is the first part of the series.
Database Schema
Here’s a minimal database schema to illustrate the issue.
CREATE TABLE parent (
id INT NOT NULL AUTO_INCREMENT,
child_count INT NOT NULL,
PRIMARY KEY (id)
);
CREATE TABLE child (
id INT NOT NULL AUTO_INCREMENT,
parent_id INT,
FOREIGN KEY (parent_id)
REFERENCES parent(id)
ON DELETE CASCADE,
PRIMARY KEY (id)
);
Add example data to database.
INSERT INTO parent (child_count) VALUES (0);
Symptoms
- CPU usage is low as usual
- Disk IO is low as usual
- Memory usage is stable
- New Relic shows that
INSERT
s on child andUPDATE
s on parent are 1 order of magnitude slower than other queries (1s vs 100ms). The queries aresql INSERT INTO child (parent_id) VALUES (1); UPDATE parent SET child_count = child_count + 1 WHERE id = 1;
Thought process
Usually when MySQL is slow, it is because of inadequate computational power (CPU), insufficient memory, or most likely, slow disks that cannot handle the large number of IOs. In this case, it is none of these obvious reasons. Something inside MySQL is slow. Since there are no monitoring tools (e.g. Percona Monitoring and Management) installed for the MySQL instance, we can only rely on queries like SHOW GLOBAL STATUS
and SHOW ENGINE INNODB STATUS
.
Observation
SHOW ENGINE INNODB STATUS
shows the latest deadlocks around the parent and child tables. There are also reports in Sentry about the deadlocks. Apparently the INSERT
and UPDATE
queries above are related in some mysterious ways.
Hypothesis
According to MySQL documentation on InnoDB locks set,
UPDATE … WHERE … sets an exclusive next-key lock on every record the search encounters. However, only an index record lock is required for statements that lock rows using a unique index to search for a unique row.
If a FOREIGN KEY constraint is defined on a table, any insert, update, or delete that requires the constraint condition to be checked sets shared record-level locks on the records that it looks at to check the constraint. InnoDB also sets these locks in the case where the constraint fails.
and MySQL documentation on InnoDB locking,
InnoDB implements standard row-level locking where there are two types of locks, shared (S) locks and exclusive (X) locks. A shared (S) lock permits the transaction that holds the lock to read a row. An exclusive (X) lock permits the transaction that holds the lock to update or delete a row.
Therefore,
UPDATE parent SET child_count = child_count + 1 WHERE id = 1;
acquires exclusive (X) lock onparent
clustered index record id = 1.INSERT INTO child (parent_id) VALUES (1);
acquires shared (S) lock onparent
clustered index record id = 1.
They are competing locks on the same record.
Validation
In connection A:
BEGIN
UPDATE parent SET child_count = child_count + 1 WHERE id = 1;
Returns:
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
In connection B:
BEGIN
INSERT INTO child (parent_id) VALUES (1);
Connection B does not return and is waiting for the transaction of connection A to release the X lock.
In connection C:
SELECT * FROM information_schema.innodb_locks;
Returns:
+-------------+-------------+-----------+-----------+---------------+------------+------------+-----------+----------+-----------+
| lock_id | lock_trx_id | lock_mode | lock_type | lock_table | lock_index | lock_space | lock_page | lock_rec | lock_data |
+-------------+-------------+-----------+-----------+---------------+------------+------------+-----------+----------+-----------+
| 1812:24:3:2 | 1812 | S | RECORD | `db`.`parent` | PRIMARY | 24 | 3 | 2 | 1 |
| 1814:24:3:2 | 1814 | X | RECORD | `db`.`parent` | PRIMARY | 24 | 3 | 2 | 1 |
+-------------+-------------+-----------+-----------+---------------+------------+------------+-----------+----------+-----------+
This experiment shows that the UPDATE
and INSERT
queries are related because they compete for the same parent
table row lock.
Full hypothesis about incident
When there are a lot of concurrent UPDATE
s on parent and INSERT
s on child for the same parent id, the parent row is hot and there will be a lock contention problem for that record. This will slow down the UPDATE
s and INSERT
s and block the backend application.
Assuming that the backend application uses a database connection pool and also a thread pool for the applcation itself, the slow queries will slowly exhaust the pools and bring down the whole backend application at last.
Solution
We can move parent.child_count
to another table, or batch the UPDATE
s, or remove the UPDATE
s altogether. These will remove or reduce the need to acquire the X lock on parent record, and will therefore fix the lock contention issue.
On a side note, updating a child table’s counter on a parent table doesn’t sound right anyway. There are better ways to get statistics on child counter depending on business requirements.
Outcome
This is in fact the correct diagnosis and an effective fix for the MySQL slowness. The UPDATE
s on parent and INSERT
s on child are no longer an order of magnitude slower than other queries. Deadlocks are gone.
However, even with the fix, the backend application still goes down but this time with different symptoms. See Fixing Slow MySQL #2.