I have spent some time researching on InnoDB row formats and InnoDB compression. This article is about things that I have learned.
InnoDB file format and row format There are 2 file formats, Antelope and Barracuda, with the latter being the newer. Antelope supports row formats Redundant and Compact, whereas Barracuda supports 2 more, Dynamic and Compressed.
Dynamic vs Compact Compact is the default row format in MySQL 5.6. The difference between Compact and Dynamic is the handling of BLOBs (including BLOB, TEXT, VARCHAR, etc).Background My gevent application hangs after issuing a query that yields 1000+ rows and then a COMMIT. After fixing mysqlclient (see Python mysqlclient Doesn’t Work Well with gevent), the COMMIT does not hang the whole application, but it still hangs the greenlet that issues the SQL query.
Setup: Ubuntu 16.04, ProxySQL v1.4.13, Amazon RDS MySQL 5.6 (ProxySQL backend), Python 2.7.12 application running on the same machine as ProxySQL
Symptoms It is always reproducible with the specific query, the backend, and on that machine.mysqlclient (MySQLdb fork) is a Python MySQL driver which uses libmysqlclient (or libmariadbclient). This means speed and stability. Does it work well with gevent? Not really.
In mysqlclient <= v1.3.13, there’s a sample called waiter_gevent.py which looks like this:
from __future__ import print_function """Demo using Gevent with mysqlclient.""" import gevent.hub import MySQLdb def gevent_waiter(fd, hub=gevent.hub.get_hub()): hub.wait(hub.loop.io(fd, 1)) def f(n): conn = MySQLdb.connect(user='root', waiter=gevent_waiter) cur = conn.cursor() cur.execute("SELECT SLEEP(%s)", (n,)) cur.execute("SELECT 1+%s", (n,)) print(cur.Update 21 Mar 2019 There’s a new post Fixing ProxySQL Idle Threads Epoll Hang Heisenbug about fixing a bug of ProxySQL when --idle-threads is enabled.
Problem ProxySQL does not scale with high numbers of connections by default. When there are tens of thousands of client connection to ProxySQL, the CPU usage of ProxySQL may go up to 300%+ because by default it uses 4 threads. Even with 3000 mostly idle connections, it hits ~70% CPU usage.Background: I have multiple 200GB+ MySQL dumps to import. It takes a lot of time.
pv shows you the progress so you have the idea how long will the import take. Pipe the dump.sql.gz file to gunzip instead of unzipping it then import. Saves much disk read and write, not to mention the time and effort. e.g. pv dump.sql.gz | gunzip -c | mysql grep is slow for a large file.I have recently migrated to Amazon RDS from MySQL on Azure VM.
This percona article about migrating to RDS is actually very helpful. I have a few things to add though.
Setup My setup is a self-hosted MySQL 5.6 server in VM on Azure and a new Amazon RDS for MySQL 5.6.
AWS DMS I have tried AWS Database Migration Service (DMS). It is reasonably fast. The main problem is it does not look like mature technology at all.Background My Python application cannot connect to MySQL. The error message looks like this:
Can’t connect to MySQL server on ‘127.0.0.1’ (4)
Error 4 means Interrupted system call.
I am using mysqlclient, the C wrapper MySQL connector. The error happens on both MySQL 5.6 and 5.7. It can be reproduced consistently. It seems that PyMySQL doesn’t have this problem. Also I am using gevent but it is not much related in this case.