Saturday, December 26, 2015

The performance of MySQL Connector/Python with C Extension

The source of this post is in this gist on nbviewer.

After reading about the difference between MySQL Connector/Python and MySQLdb on this blog post I wondered how the C Extension option in Connector/Python would perform.

If you want to run the code yourself you'll need: Jupyter/IPython, Python 3, Requests, MySQLdb, Connector/Python, Matplotlib, Pandas and MySQL.

In [1]:
%matplotlib notebook
In [2]:
import random
import gzip
import time

import pandas as pd
import matplotlib.pyplot as plt
import requests
import mysql.connector
import MySQLdb
for imp in [mysql.connector, MySQLdb]:
    print('Using {imp} {version}'.format(imp=imp.__name__, version=imp.__version__))
print('C Extension for MySQL Connector/Python available: %s' % mysql.connector.HAVE_CEXT)
Using mysql.connector 2.1.3
Using MySQLdb 1.3.7
C Extension for MySQL Connector/Python available: True

Make sure the C Extension is available. This needs MySQL Connector/Python 2.1 or newer. On Fedora you might need to install this with dnf install mysql-connector-python3-cext if you have the mysql-connectors-community repository installed. If you compile from source then make sure to use the --with-mysql-capi option.

In [3]:
worlddb_url = 'https://downloads.mysql.com/docs/world.sql.gz'
worlddb_req = requests.get(worlddb_url)
if worlddb_req.status_code == 200:
    worldsql = gzip.decompress(worlddb_req.content).decode('iso-8859-15')
In [4]:
config = {
    'host': '127.0.0.1',
    'port': 5710,
    'user': 'msandbox',
    'passwd': 'msandbox',
}

The above is my config to connect to a MySQL Sandbox running MySQL Server 5.7.10.

Note: you might hit MySQL Bug #79780 when loading the world database into MySQL with Connector/Python with the C Extension enabled.

In [5]:
c1 = mysql.connector.connect(use_pure=False, **config)
cur1 = c1.cursor()
cur1.execute('DROP SCHEMA IF EXISTS world')
cur1.execute('CREATE SCHEMA world DEFAULT CHARACTER SET latin1')
cur1.execute('USE world')
result = [x for x in cur1.execute(worldsql, multi=True)]
cur1.close()
c1.close()
In [6]:
config['db'] = 'world'
In [7]:
perfdata = pd.DataFrame(columns=['connpy','connpy_cext','MySQLdb'], index=range(10000))

Now we're going to run 10000 queries with a random primary key between 1 and 8000. This does not use the C Extension as use_pure is set to True.

In [8]:
c1 = mysql.connector.connect(use_pure=True, **config)
cur1 = c1.cursor()
for it in range(10000):
    city_id = random.randint(1,8000)
    start = time.perf_counter()
    cur1.execute("SELECT * FROM City WHERE ID=%s", (city_id,))
    cur1.fetchone()
    perfdata.ix[it]['connpy'] = time.perf_counter() - start

Next up is Connector/Python with the C Extension (use_pure=False and HAVE_CEXT indicates we have the C Extension available)

In [9]:
c1 = mysql.connector.connect(use_pure=False, **config)
cur1 = c1.cursor()
for it in range(10000):
    city_id = random.randint(1,8000)
    start = time.perf_counter()
    cur1.execute("SELECT * FROM City WHERE ID=%s", (city_id,))
    cur1.fetchone()
    perfdata.ix[it]['connpy_cext'] = time.perf_counter() - start

And last, but not least, MySQLdb.

In [10]:
c2 = MySQLdb.connect(**config)
cur2 = c2.cursor()
for it in range(10000):
    city_id = random.randint(1,8000)
    start = time.perf_counter()
    cur2.execute("SELECT * FROM City WHERE ID=%s", (city_id,))
    cur2.fetchone()
    perfdata.ix[it]['MySQLdb'] = time.perf_counter() - start

Now let's have a look to what our data looks like

In [11]:
perfdata.head()
Out[11]:
connpy connpy_cext MySQLdb
0 0.00145918 0.000354935 0.000353173
1 0.000907707 0.000243508 0.000249597
2 0.000468397 0.000277101 0.000207893
3 0.000595066 0.000241349 0.00020754
4 0.000641848 0.000258027 0.000193182

Now let's plot that

In [12]:
plt.style.use('ggplot')
plt.scatter(perfdata.index, perfdata.connpy, s=1, c='r',
            label='Connector/Python Pure')
plt.scatter(perfdata.index, perfdata.connpy_cext, s=1, c='g',
            label='Connector/Python C Ext')
plt.scatter(perfdata.index, perfdata.MySQLdb, s=1, c='b',
            label='MySQLdb')
plt.ylim(ymin=0, ymax=0.001)
plt.xlim(xmin=0, xmax=10000)
plt.xlabel('Run #')
plt.ylabel('Runtime in seconds')
plt.legend()
Out[12]:
<matplotlib.legend.Legend at 0x7f47bd8c2518>

The performance of MySQL Connector/Python 2.1 with the C Extension is much closer to MySQLdb.

There is one serious drawback of using the C Extension: Prepared statements are not yet supported.

But there is more that performance alone. I like the MySQL Connector/Python API over the MySQLdb API and this shows that Connector/Python is flexible and can be almost as fast.

The pure Python implementation also has advantages: Easier installation (no C compiler required) and the option to use alternative implementations like PyPy

2 comments:

  1. Our Google Dictionary and Google Translate extension can do no help on this situation but you can solve this problem by making your Flashblock extension not block the Google services.how to write a chrome extension

    ReplyDelete
  2. I couldn't locate the ideal words to value this. This is genuinely one in a millions.
    linkedin

    ReplyDelete