Contents

Fixing Slow MySQL #2: Query with Function on Column

¹⁰/Jun 2021

Background

This is the 2nd part of the Fixing Slow MySQL series. If you haven’t seen the 1st part, here it is: Fixing Slow MySQL #1: Foreign Key & Row Lock Contention.

Despite the successful fix in part 1, the database is still slow and the application goes down as load goes up, but the symptoms are different from the previous time. It means we hit another bottleneck.

Database Schema

Here’s a minimal database schema to illustrate the issue.

CREATE TABLE user (
    id INT NOT NULL AUTO_INCREMENT,
    name VARCHAR(10) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
    nickname VARCHAR(10),
    PRIMARY KEY (id),
    INDEX ix_name (name)
);

Add example data to database.

INSERT INTO user (name) VALUES ('a'),('b'),('c'),('d'),('e'),('f'),('g');

Symptoms

There is a spike from 20% to 80% in CPU usage when load increases.
Disk IO is low as usual
Memory usage is stable
New Relic shows that all queries are slow, even for Primary Key reads. e.g. SELECT * FROM user WHERE id=1;

Thought process

Given that the working set is small and is entirely in memory, i.e. table size < buffer pool size, which is confirmed by the low IO, the slow PK (Primary Key) reads are extremely weird.

A possible explanation of all the slow queries is that the CPU is too busy that MySQL cannot even respond quickly to simple queries like PK reads.

Now we have to find out why is the CPU so busy.

Observation

SHOW ENGINE INNODB STATUS is showing ~4,000,000 row reads per second, which is dangerously high.

This time in New Relic the slow queries are all over the place. It takes some time to find suspicious queries, but here’s one that catches my eye:

SELECT id,name FROM user WHERE LOWER(name)='e';

MySQL uses case insensitive collation by default. And in this case the name column is clearly using a _ci case insensitive collation. Therefore, the LOWER(name) in WHERE is redundant. Notice that there is an index ix_name on name, and while WHERE name='e' can surely utilize the index, does the funny-looking WHERE LOWER(name)='e' use the index?

Hypothesis

The query above is poorly written. The LOWER call does not provide any value and does not properly utilize an index to speed up the search. This poorly written query is exhausting CPU resources.

Validation

Index usage

Let’s check if the query uses the index:

mysql> EXPLAIN SELECT id,name FROM user WHERE name='e';
+----+-------------+-------+------------+------+---------------+---------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key     | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+---------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | user  | NULL       | ref  | ix_name       | ix_name | 32      | const |    1 |   100.00 | Using index |
+----+-------------+-------+------------+------+---------------+---------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql> EXPLAIN SELECT id,name FROM user WHERE LOWER(name)='e';
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+--------------------------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref  | rows | filtered | Extra                    |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+--------------------------+
|  1 | SIMPLE      | user  | NULL       | index | NULL          | ix_name | 32      | NULL |    7 |   100.00 | Using where; Using index |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)

Surprisingly, the query with LOWER is Using where; Using index. But don’t be fooled.

According to MySQL Documentation EXPLAIN Output Format: - rows > Estimate of rows to be examined - possible_keys > If this column is NULL, there are no relevant indexes. - key > It is possible for key to name an index that is not present in the possible_keys value. This can happen if none of the possible_keys indexes are suitable for looking up rows, but all the columns selected by the query are columns of some other index. That is, the named index covers the selected columns, so although it is not used to determine which rows to retrieve, an index scan is more efficient than a data row scan.

Notice that the possible keys is NULL while the key is ix_name, and rows is 7 instead of 1.

Although the query with LOWER is Using where; Using index, it does not use the index to aid the lookup. It is using the index as a covering index and performing a full scan on this secondary index instead of the primary clustered index. It is actually a full table scan equivalent but on a secondary index with one fewer column in this case, which is known as full index scan. MySQL thinks it will save IO by scanning a smaller index.

The fact that the bad query is running a full index scan can be confirmed by selecting a column that is not covered by the index.

mysql> EXPLAIN SELECT * FROM user WHERE LOWER(name)='e';
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | user  | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    7 |   100.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

This time key is NULL and a full table scan is performed instead of a full index scan. Either way, with the LOWER function on column, the lookup is not efficient. The query does not scale when the number of rows increases.

Redundant LOWER function

Is the LOWER useful at all?

mysql> SELECT * FROM user WHERE name='e';
+----+------+----------+
| id | name | nickname |
+----+------+----------+
|  5 | e    | NULL     |
+----+------+----------+
1 row in set (0.00 sec)

mysql> SELECT * FROM user WHERE name='E';
+----+------+----------+
| id | name | nickname |
+----+------+----------+
|  5 | e    | NULL     |
+----+------+----------+
1 row in set (0.00 sec)

mysql> SELECT * FROM user WHERE LOWER(name)='E';
+----+------+----------+
| id | name | nickname |
+----+------+----------+
|  5 | e    | NULL     |
+----+------+----------+
1 row in set (0.00 sec)

It makes no difference.

Math about row read

There are 40k rows in user table and there are 100 QPS (queries per second) of this problematic query, then there are 40000*100 = 4M rows examined every second, which matches the 4M row reads per second in InnoDB status.

Full hypothesis about incident

There are 4M rows examined per second and for each row the LOWER function is called.

The CPU is busy performing this wasteful work instead of serving other queries. All queries are slowed by the same extent due to busy CPU and they do not return to the backend application quickly enough. At last, the application is no longer available because all database connections are waiting for the slow queries.

Solution

Considering that the LOWER function does nothing other than messing up the performance, a straightforward fix is to remove the function.

Outcome

After removing the LOWER function on column in the query, the row reads/sec dropped by 95% and MySQL CPU usage reduced by 74%. It is an obvious win.

Tags// mysql, datebase,

Overthought

Carson Ip