The source of this post is in this gist on nbviewer.
After reading about the difference between MySQL Connector/Python and MySQLdb on this blog post I wondered how the C Extension option in Connector/Python would perform.
If you want to run the code yourself you'll need: Jupyter/IPython, Python 3, Requests, MySQLdb, Connector/Python, Matplotlib, Pandas and MySQL.
In [1]:
%matplotlib notebook
In [2]:
importrandomimportgzipimporttimeimportpandasaspdimportmatplotlib.pyplotaspltimportrequestsimportmysql.connectorimportMySQLdbforimpin[mysql.connector,MySQLdb]:print('Using {imp} {version}'.format(imp=imp.__name__,version=imp.__version__))print('C Extension for MySQL Connector/Python available: %s'%mysql.connector.HAVE_CEXT)
Using mysql.connector 2.1.3
Using MySQLdb 1.3.7
C Extension for MySQL Connector/Python available: True
Make sure the C Extension is available. This needs MySQL Connector/Python 2.1 or newer. On Fedora you might need to install this with dnf install mysql-connector-python3-cext if you have the mysql-connectors-community repository installed. If you compile from source then make sure to use the --with-mysql-capi option.
The above is my config to connect to a MySQL Sandbox running MySQL Server 5.7.10.
Note: you might hit MySQL Bug #79780 when loading
the world database into MySQL with Connector/Python with the C Extension enabled.
In [5]:
c1=mysql.connector.connect(use_pure=False,**config)cur1=c1.cursor()cur1.execute('DROP SCHEMA IF EXISTS world')cur1.execute('CREATE SCHEMA world DEFAULT CHARACTER SET latin1')cur1.execute('USE world')result=[xforxincur1.execute(worldsql,multi=True)]cur1.close()c1.close()
Now we're going to run 10000 queries with a random primary key between 1 and 8000. This does not use the C Extension as use_pure is set to True.
In [8]:
c1=mysql.connector.connect(use_pure=True,**config)cur1=c1.cursor()foritinrange(10000):city_id=random.randint(1,8000)start=time.perf_counter()cur1.execute("SELECT * FROM City WHERE ID=%s",(city_id,))cur1.fetchone()perfdata.ix[it]['connpy']=time.perf_counter()-start
Next up is Connector/Python with the C Extension (use_pure=False and HAVE_CEXT indicates we have the C Extension available)
In [9]:
c1=mysql.connector.connect(use_pure=False,**config)cur1=c1.cursor()foritinrange(10000):city_id=random.randint(1,8000)start=time.perf_counter()cur1.execute("SELECT * FROM City WHERE ID=%s",(city_id,))cur1.fetchone()perfdata.ix[it]['connpy_cext']=time.perf_counter()-start
And last, but not least, MySQLdb.
In [10]:
c2=MySQLdb.connect(**config)cur2=c2.cursor()foritinrange(10000):city_id=random.randint(1,8000)start=time.perf_counter()cur2.execute("SELECT * FROM City WHERE ID=%s",(city_id,))cur2.fetchone()perfdata.ix[it]['MySQLdb']=time.perf_counter()-start
Now let's have a look to what our data looks like
In [11]:
perfdata.head()
Out[11]:
connpy
connpy_cext
MySQLdb
0
0.00145918
0.000354935
0.000353173
1
0.000907707
0.000243508
0.000249597
2
0.000468397
0.000277101
0.000207893
3
0.000595066
0.000241349
0.00020754
4
0.000641848
0.000258027
0.000193182
Now let's plot that
In [12]:
plt.style.use('ggplot')plt.scatter(perfdata.index,perfdata.connpy,s=1,c='r',label='Connector/Python Pure')plt.scatter(perfdata.index,perfdata.connpy_cext,s=1,c='g',label='Connector/Python C Ext')plt.scatter(perfdata.index,perfdata.MySQLdb,s=1,c='b',label='MySQLdb')plt.ylim(ymin=0,ymax=0.001)plt.xlim(xmin=0,xmax=10000)plt.xlabel('Run #')plt.ylabel('Runtime in seconds')plt.legend()
Out[12]:
<matplotlib.legend.Legend at 0x7f47bd8c2518>
The performance of MySQL Connector/Python 2.1 with the C Extension is much closer to MySQLdb.
There is one serious drawback of using the C Extension: Prepared statements are not yet supported.
But there is more that performance alone. I like the MySQL Connector/Python API over the MySQLdb API and this shows that Connector/Python is flexible and can be almost as fast.
The pure Python implementation also has advantages: Easier installation (no C compiler required) and the option to use alternative implementations like PyPy