After reading about the difference between MySQL Connector/Python and MySQLdb on this blog post I wondered how the C Extension option in Connector/Python would perform.
If you want to run the code yourself you'll need: Jupyter/IPython, Python 3, Requests, MySQLdb, Connector/Python, Matplotlib, Pandas and MySQL.
%matplotlib notebook
import random
import gzip
import time
import pandas as pd
import matplotlib.pyplot as plt
import requests
import mysql.connector
import MySQLdb
for imp in [mysql.connector, MySQLdb]:
print('Using {imp} {version}'.format(imp=imp.__name__, version=imp.__version__))
print('C Extension for MySQL Connector/Python available: %s' % mysql.connector.HAVE_CEXT)
Make sure the C Extension is available. This needs MySQL Connector/Python 2.1 or newer. On Fedora you might need to install this with dnf install mysql-connector-python3-cext
if you have the mysql-connectors-community repository installed. If you compile from source then make sure to use the --with-mysql-capi
option.
worlddb_url = 'https://downloads.mysql.com/docs/world.sql.gz'
worlddb_req = requests.get(worlddb_url)
if worlddb_req.status_code == 200:
worldsql = gzip.decompress(worlddb_req.content).decode('iso-8859-15')
config = {
'host': '127.0.0.1',
'port': 5710,
'user': 'msandbox',
'passwd': 'msandbox',
}
The above is my config to connect to a MySQL Sandbox running MySQL Server 5.7.10.
Note: you might hit MySQL Bug #79780 when loading the world database into MySQL with Connector/Python with the C Extension enabled.
c1 = mysql.connector.connect(use_pure=False, **config)
cur1 = c1.cursor()
cur1.execute('DROP SCHEMA IF EXISTS world')
cur1.execute('CREATE SCHEMA world DEFAULT CHARACTER SET latin1')
cur1.execute('USE world')
result = [x for x in cur1.execute(worldsql, multi=True)]
cur1.close()
c1.close()
config['db'] = 'world'
perfdata = pd.DataFrame(columns=['connpy','connpy_cext','MySQLdb'], index=range(10000))
Now we're going to run 10000 queries with a random primary key between 1 and 8000. This does not use the C Extension as use_pure is set to True.
c1 = mysql.connector.connect(use_pure=True, **config)
cur1 = c1.cursor()
for it in range(10000):
city_id = random.randint(1,8000)
start = time.perf_counter()
cur1.execute("SELECT * FROM City WHERE ID=%s", (city_id,))
cur1.fetchone()
perfdata.ix[it]['connpy'] = time.perf_counter() - start
Next up is Connector/Python with the C Extension (use_pure=False and HAVE_CEXT indicates we have the C Extension available)
c1 = mysql.connector.connect(use_pure=False, **config)
cur1 = c1.cursor()
for it in range(10000):
city_id = random.randint(1,8000)
start = time.perf_counter()
cur1.execute("SELECT * FROM City WHERE ID=%s", (city_id,))
cur1.fetchone()
perfdata.ix[it]['connpy_cext'] = time.perf_counter() - start
And last, but not least, MySQLdb.
c2 = MySQLdb.connect(**config)
cur2 = c2.cursor()
for it in range(10000):
city_id = random.randint(1,8000)
start = time.perf_counter()
cur2.execute("SELECT * FROM City WHERE ID=%s", (city_id,))
cur2.fetchone()
perfdata.ix[it]['MySQLdb'] = time.perf_counter() - start
Now let's have a look to what our data looks like
perfdata.head()
Now let's plot that
plt.style.use('ggplot')
plt.scatter(perfdata.index, perfdata.connpy, s=1, c='r',
label='Connector/Python Pure')
plt.scatter(perfdata.index, perfdata.connpy_cext, s=1, c='g',
label='Connector/Python C Ext')
plt.scatter(perfdata.index, perfdata.MySQLdb, s=1, c='b',
label='MySQLdb')
plt.ylim(ymin=0, ymax=0.001)
plt.xlim(xmin=0, xmax=10000)
plt.xlabel('Run #')
plt.ylabel('Runtime in seconds')
plt.legend()
The performance of MySQL Connector/Python 2.1 with the C Extension is much closer to MySQLdb.
There is one serious drawback of using the C Extension: Prepared statements are not yet supported.
But there is more that performance alone. I like the MySQL Connector/Python API over the MySQLdb API and this shows that Connector/Python is flexible and can be almost as fast.
The pure Python implementation also has advantages: Easier installation (no C compiler required) and the option to use alternative implementations like PyPy