Fixing Slow MySQL #2: Query with Function on Column
10/Jun 2021Background
This is the 2nd part of the Fixing Slow MySQL series. If you haven’t seen the 1st part, here it is: Fixing Slow MySQL #1: Foreign Key & Row Lock Contention.
Despite the successful fix in part 1, the database is still slow and the application goes down as load goes up, but the symptoms are different from the previous time. It means we hit another bottleneck.
Database Schema
Here’s a minimal database schema to illustrate the issue.
CREATE TABLE user (
id INT NOT NULL AUTO_INCREMENT,
name VARCHAR(10) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
nickname VARCHAR(10),
PRIMARY KEY (id),
INDEX ix_name (name)
);
Add example data to database.
INSERT INTO user (name) VALUES ('a'),('b'),('c'),('d'),('e'),('f'),('g');
Symptoms
- There is a spike from 20% to 80% in CPU usage when load increases.
- Disk IO is low as usual
- Memory usage is stable
- New Relic shows that all queries are slow, even for Primary Key reads. e.g.
SELECT * FROM user WHERE id=1;
Thought process
Given that the working set is small and is entirely in memory, i.e. table size < buffer pool size, which is confirmed by the low IO, the slow PK (Primary Key) reads are extremely weird.
A possible explanation of all the slow queries is that the CPU is too busy that MySQL cannot even respond quickly to simple queries like PK reads.
Now we have to find out why is the CPU so busy.
Observation
SHOW ENGINE INNODB STATUS
is showing ~4,000,000 row reads per second, which is dangerously high.
This time in New Relic the slow queries are all over the place. It takes some time to find suspicious queries, but here’s one that catches my eye:
SELECT id,name FROM user WHERE LOWER(name)='e';
MySQL uses case insensitive collation by default. And in this case the name
column is clearly using a _ci
case insensitive collation. Therefore, the LOWER(name)
in WHERE
is redundant. Notice that there is an index ix_name
on name
, and while WHERE name='e'
can surely utilize the index, does the funny-looking WHERE LOWER(name)='e'
use the index?
Hypothesis
The query above is poorly written. The LOWER
call does not provide any value and does not properly utilize an index to speed up the search. This poorly written query is exhausting CPU resources.
Validation
Index usage
Let’s check if the query uses the index:
mysql> EXPLAIN SELECT id,name FROM user WHERE name='e';
+----+-------------+-------+------------+------+---------------+---------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+---------+---------+-------+------+----------+-------------+
| 1 | SIMPLE | user | NULL | ref | ix_name | ix_name | 32 | const | 1 | 100.00 | Using index |
+----+-------------+-------+------------+------+---------------+---------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
mysql> EXPLAIN SELECT id,name FROM user WHERE LOWER(name)='e';
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+--------------------------+
| 1 | SIMPLE | user | NULL | index | NULL | ix_name | 32 | NULL | 7 | 100.00 | Using where; Using index |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)
Surprisingly, the query with LOWER
is Using where; Using index
. But don’t be fooled.
According to MySQL Documentation EXPLAIN Output Format:
- rows
> Estimate of rows to be examined
- possible_keys
> If this column is NULL, there are no relevant indexes.
- key
> It is possible for key to name an index that is not present in the possible_keys value. This can happen if none of the possible_keys indexes are suitable for looking up rows, but all the columns selected by the query are columns of some other index. That is, the named index covers the selected columns, so although it is not used to determine which rows to retrieve, an index scan is more efficient than a data row scan.
Notice that the possible keys is NULL
while the key is ix_name
, and rows
is 7 instead of 1.
Although the query with LOWER
is Using where; Using index
, it does not use the index to aid the lookup. It is using the index as a covering index and performing a full scan on this secondary index instead of the primary clustered index. It is actually a full table scan equivalent but on a secondary index with one fewer column in this case, which is known as full index scan. MySQL thinks it will save IO by scanning a smaller index.
The fact that the bad query is running a full index scan can be confirmed by selecting a column that is not covered by the index.
mysql> EXPLAIN SELECT * FROM user WHERE LOWER(name)='e';
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | user | NULL | ALL | NULL | NULL | NULL | NULL | 7 | 100.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
This time key
is NULL and a full table scan is performed instead of a full index scan. Either way, with the LOWER
function on column, the lookup is not efficient. The query does not scale when the number of rows increases.
Redundant LOWER function
Is the LOWER
useful at all?
mysql> SELECT * FROM user WHERE name='e';
+----+------+----------+
| id | name | nickname |
+----+------+----------+
| 5 | e | NULL |
+----+------+----------+
1 row in set (0.00 sec)
mysql> SELECT * FROM user WHERE name='E';
+----+------+----------+
| id | name | nickname |
+----+------+----------+
| 5 | e | NULL |
+----+------+----------+
1 row in set (0.00 sec)
mysql> SELECT * FROM user WHERE LOWER(name)='E';
+----+------+----------+
| id | name | nickname |
+----+------+----------+
| 5 | e | NULL |
+----+------+----------+
1 row in set (0.00 sec)
It makes no difference.
Math about row read
There are 40k rows in user
table and there are 100 QPS (queries per second) of this problematic query, then there are 40000*100 = 4M rows examined every second, which matches the 4M row reads per second in InnoDB status.
Full hypothesis about incident
There are 4M rows examined per second and for each row the LOWER function is called.
The CPU is busy performing this wasteful work instead of serving other queries. All queries are slowed by the same extent due to busy CPU and they do not return to the backend application quickly enough. At last, the application is no longer available because all database connections are waiting for the slow queries.
Solution
Considering that the LOWER
function does nothing other than messing up the performance, a straightforward fix is to remove the function.
Outcome
After removing the LOWER
function on column in the query, the row reads/sec dropped by 95% and MySQL CPU usage reduced by 74%. It is an obvious win.