Fixing Slow MySQL #1: Foreign Key & Row Lock Contention

Background

There is service downtime due to “Slow MySQL”. But why is it slow? And how to fix it? Let’s start digging.

This post is the first part of the series.

Database Schema

Here’s a minimal database schema to illustrate the issue.

CREATE TABLE parent (
    id INT NOT NULL AUTO_INCREMENT,
    child_count INT NOT NULL,
    PRIMARY KEY (id)
);

CREATE TABLE child (
    id INT NOT NULL AUTO_INCREMENT,
    parent_id INT,
    FOREIGN KEY (parent_id)
        REFERENCES parent(id)
        ON DELETE CASCADE,
    PRIMARY KEY (id)
);

Add example data to database.

INSERT INTO parent (child_count) VALUES (0);

Symptoms

  1. CPU usage is low as usual
  2. Disk IO is low as usual
  3. Memory usage is stable
  4. New Relic shows that INSERTs on child and UPDATEs on parent are 1 order of magnitude slower than other queries (1s vs 100ms). The queries are sql INSERT INTO child (parent_id) VALUES (1); UPDATE parent SET child_count = child_count + 1 WHERE id = 1;

Thought process

Usually when MySQL is slow, it is because of inadequate computational power (CPU), insufficient memory, or most likely, slow disks that cannot handle the large number of IOs. In this case, it is none of these obvious reasons. Something inside MySQL is slow. Since there are no monitoring tools (e.g. Percona Monitoring and Management) installed for the MySQL instance, we can only rely on queries like SHOW GLOBAL STATUS and SHOW ENGINE INNODB STATUS.

Observation

SHOW ENGINE INNODB STATUS shows the latest deadlocks around the parent and child tables. There are also reports in Sentry about the deadlocks. Apparently the INSERT and UPDATE queries above are related in some mysterious ways.

Hypothesis

According to MySQL documentation on InnoDB locks set,

UPDATE … WHERE … sets an exclusive next-key lock on every record the search encounters. However, only an index record lock is required for statements that lock rows using a unique index to search for a unique row.

If a FOREIGN KEY constraint is defined on a table, any insert, update, or delete that requires the constraint condition to be checked sets shared record-level locks on the records that it looks at to check the constraint. InnoDB also sets these locks in the case where the constraint fails.

and MySQL documentation on InnoDB locking,

InnoDB implements standard row-level locking where there are two types of locks, shared (S) locks and exclusive (X) locks. A shared (S) lock permits the transaction that holds the lock to read a row. An exclusive (X) lock permits the transaction that holds the lock to update or delete a row.

Therefore,

  • UPDATE parent SET child_count = child_count + 1 WHERE id = 1; acquires exclusive (X) lock on parent clustered index record id = 1.
  • INSERT INTO child (parent_id) VALUES (1); acquires shared (S) lock on parent clustered index record id = 1.

They are competing locks on the same record.

Validation

In connection A:

BEGIN
UPDATE parent SET child_count = child_count + 1 WHERE id = 1;

Returns:

Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

In connection B:

BEGIN
INSERT INTO child (parent_id) VALUES (1);

Connection B does not return and is waiting for the transaction of connection A to release the X lock.

In connection C:

SELECT * FROM information_schema.innodb_locks;

Returns:

+-------------+-------------+-----------+-----------+---------------+------------+------------+-----------+----------+-----------+
| lock_id     | lock_trx_id | lock_mode | lock_type | lock_table    | lock_index | lock_space | lock_page | lock_rec | lock_data |
+-------------+-------------+-----------+-----------+---------------+------------+------------+-----------+----------+-----------+
| 1812:24:3:2 | 1812        | S         | RECORD    | `db`.`parent` | PRIMARY    |         24 |         3 |        2 | 1         |
| 1814:24:3:2 | 1814        | X         | RECORD    | `db`.`parent` | PRIMARY    |         24 |         3 |        2 | 1         |
+-------------+-------------+-----------+-----------+---------------+------------+------------+-----------+----------+-----------+

This experiment shows that the UPDATE and INSERT queries are related because they compete for the same parent table row lock.

Full hypothesis about incident

When there are a lot of concurrent UPDATEs on parent and INSERTs on child for the same parent id, the parent row is hot and there will be a lock contention problem for that record. This will slow down the UPDATEs and INSERTs and block the backend application.

Assuming that the backend application uses a database connection pool and also a thread pool for the applcation itself, the slow queries will slowly exhaust the pools and bring down the whole backend application at last.

Solution

We can move parent.child_count to another table, or batch the UPDATEs, or remove the UPDATEs altogether. These will remove or reduce the need to acquire the X lock on parent record, and will therefore fix the lock contention issue.

On a side note, updating a child table’s counter on a parent table doesn’t sound right anyway. There are better ways to get statistics on child counter depending on business requirements.

Outcome

This is in fact the correct diagnosis and an effective fix for the MySQL slowness. The UPDATEs on parent and INSERTs on child are no longer an order of magnitude slower than other queries. Deadlocks are gone.

However, even with the fix, the backend application still goes down but this time with different symptoms. See Fixing Slow MySQL #2.

Tags// , ,