Showing posts with label python. Show all posts
Showing posts with label python. Show all posts

Saturday, January 9, 2016

Using Connector/J with Python

With Python you would normally use MySQL Connector/Python or the older MySQLdb to connect from Python to MySQL, but there are more options.

There are also multiple Python implementations: CPython (the main implementation), PyPy, Jython and IronPython. PyPy tries to be faster than CPython by using a Just-in-Time compiler. Jython runs on the JVM and IronPython runs on the .NET CLR.

Connector/Python by default (Without the C Extension) is a pure Python implementation and can work with most if not all implementations. And for MySQLdb there is a drop-in replacement called PyMySQL, which is a pure python implementation.

So there are many options already. But for at least Jython it is also possible to use a Java (JDBC) driver.

But why would you use a different Python implementation? There are multiple reasons for that:

  • Speed. PyPy can be faster and Jython has no Global Interpreter Lock (GIL) , which can allow for more concurrent execution.
  • To access 'native' code. e.g. call Java code from Jython or C# from IronPython.
  • Use existing infrastructure. You can deploy a Jython application on Tomcat.
  • Create testcases, healthchecks etc. which uses the same settings and infrastucture as your Java application with the benefits of a scripting language.

I wanted to test how Connector/J behaves with regards to TLS (the successor of SSL).

Setup

The first step is to get Jython, and Connector/J on your system. On Fedora 23 this is easily done with a dnf install jython mysql-connector-java.

Then I used MySQL Sandbox to setup a MySQL 5.7.10 sandbox. To enable TLS I did a ./my sql_ssl_rsa_setup, which is the Sandbox version of mysql_ssl_rsa_setup. If you have a pre-5.7 version then you can use mysslgen instead.

To convert the CA certificate from the PEM format to the Java Key Store (JKS) format I used keytool.

$ keytool -importcert -trustcacerts -file ca.pem -keystore /tmp/mysql57keystore.jks
Enter keystore password:  
Re-enter new password: 
Owner: CN=MySQL_Server_5.7.10_Auto_Generated_CA_Certificate
Issuer: CN=MySQL_Server_5.7.10_Auto_Generated_CA_Certificate
Serial number: 1
Valid from: Fri Jan 08 16:23:16 CET 2016 until: Mon Jan 05 16:23:16 CET 2026
Certificate fingerprints:
     MD5:  B5:B5:2B:53:5C:91:A2:6A:64:B5:C9:12:85:A0:CE:CC
     SHA1: 85:F1:AB:14:15:33:65:A8:71:4D:00:A6:C6:FC:8F:7F:BE:95:BA:B0
     SHA256: CB:B9:D5:BC:26:76:37:3A:66:67:99:95:5B:3B:8E:95:84:6C:A4:5F:52:39:EF:2A:23:36:6E:AB:B0:3E:81:E0
     Signature algorithm name: SHA256withRSA
     Version: 1
Trust this certificate? [no]:  yes
Certificate was added to keystore

Then I had to set my CLASSPATH in order for Jython to find Connector/J.

$ export CLASSPATH=/usr/share/java/mysql-connector-java.jar

Running the test

I used this to test the database connetion:

#!/usr/bin/jython
from __future__ import with_statement
from com.ziclix.python.sql import zxJDBC
from java.lang import System

System.setProperty("javax.net.ssl.trustStore","/tmp/mysql57keystore.jks");
System.setProperty("javax.net.ssl.trustStorePassword","msandbox");

jdbc_url = 'jdbc:mysql://127.0.0.1:18785/test?useSSL=true'
with zxJDBC.connect(jdbc_url, 'msandbox', 'msandbox', 'com.mysql.jdbc.Driver') as c:
    with c.cursor() as cur:
        cur.execute('SHOW SESSION STATUS LIKE \'Ssl_%\'')
        for result in cur:
            print('%-40s: %s' % result)
    raw_input('Press any key to continue...')

This resulted in a working connection. From the database side it looks like this:

mysql> SELECT ATTR_NAME, ATTR_VALUE FROM
    -> performance_schema.session_connect_attrs WHERE PROCESSLIST_ID=40;
+------------------+----------------------+
| ATTR_NAME        | ATTR_VALUE           |
+------------------+----------------------+
| _runtime_version | 1.8.0_65             |
| _client_version  | 5.1.36-SNAPSHOT      |
| _client_name     | MySQL Connector Java |
| _client_license  | GPL                  |
| _runtime_vendor  | Oracle Corporation   |
+------------------+----------------------+
5 rows in set (0.00 sec)

mysql> SELECT * FROM performance_schema.status_by_thread WHERE 
    -> THREAD_ID=(SELECT THREAD_ID FROM performance_schema.threads
    -> WHERE PROCESSLIST_ID=40) and VARIABLE_NAME LIKE 'Ssl_version';
+-----------+---------------+----------------+
| THREAD_ID | VARIABLE_NAME | VARIABLE_VALUE |
+-----------+---------------+----------------+
|        65 | Ssl_version   | TLSv1          |
+-----------+---------------+----------------+
1 row in set (0.00 sec)

I wanted to see if upgrading Connector/J would change anythong. So I downloaded the latest release and change my CLASSPATH to only include that.

mysql> SELECT ATTR_NAME, ATTR_VALUE FROM
    -> performance_schema.session_connect_attrs WHERE PROCESSLIST_ID=45;
+------------------+----------------------+
| ATTR_NAME        | ATTR_VALUE           |
+------------------+----------------------+
| _runtime_version | 1.8.0_65             |
| _client_version  | 5.1.38               |
| _client_name     | MySQL Connector Java |
| _client_license  | GPL                  |
| _runtime_vendor  | Oracle Corporation   |
+------------------+----------------------+
5 rows in set (0.00 sec)

mysql> SELECT * FROM performance_schema.status_by_thread WHERE
    -> THREAD_ID=(SELECT THREAD_ID FROM performance_schema.threads
    -> WHERE PROCESSLIST_ID=45) and VARIABLE_NAME LIKE 'Ssl_version';
+-----------+---------------+----------------+
| THREAD_ID | VARIABLE_NAME | VARIABLE_VALUE |
+-----------+---------------+----------------+
|        70 | Ssl_version   | TLSv1.1        |
+-----------+---------------+----------------+
1 row in set (0.00 sec)

And it did. Connector/J 5.1.38 uses TLSv1.1 instead of TLSv1.0

Sunday, January 3, 2016

The performance of TLS with MySQL Connector/Python

I've ran a simple test to see the performance impact of TLS on MySQL connections with MySQL Connector/Python

The test results are in this Jupyter notebook.

TL;DR:
  • Try to reuse connections if you use TLS
  • Establishing TLS connections is expensive (server & client)
  • Improved performance might be possible in the future by using TLS Tickets
Not tested:
  • Difference between YaSSL and OpenSSL
  • Difference between Ciphersuites
  • Performance of larger resultsets and queries

Saturday, December 26, 2015

The performance of MySQL Connector/Python with C Extension

The source of this post is in this gist on nbviewer.

After reading about the difference between MySQL Connector/Python and MySQLdb on this blog post I wondered how the C Extension option in Connector/Python would perform.

If you want to run the code yourself you'll need: Jupyter/IPython, Python 3, Requests, MySQLdb, Connector/Python, Matplotlib, Pandas and MySQL.

In [1]:
%matplotlib notebook
In [2]:
import random
import gzip
import time

import pandas as pd
import matplotlib.pyplot as plt
import requests
import mysql.connector
import MySQLdb
for imp in [mysql.connector, MySQLdb]:
    print('Using {imp} {version}'.format(imp=imp.__name__, version=imp.__version__))
print('C Extension for MySQL Connector/Python available: %s' % mysql.connector.HAVE_CEXT)
Using mysql.connector 2.1.3
Using MySQLdb 1.3.7
C Extension for MySQL Connector/Python available: True

Make sure the C Extension is available. This needs MySQL Connector/Python 2.1 or newer. On Fedora you might need to install this with dnf install mysql-connector-python3-cext if you have the mysql-connectors-community repository installed. If you compile from source then make sure to use the --with-mysql-capi option.

In [3]:
worlddb_url = 'https://downloads.mysql.com/docs/world.sql.gz'
worlddb_req = requests.get(worlddb_url)
if worlddb_req.status_code == 200:
    worldsql = gzip.decompress(worlddb_req.content).decode('iso-8859-15')
In [4]:
config = {
    'host': '127.0.0.1',
    'port': 5710,
    'user': 'msandbox',
    'passwd': 'msandbox',
}

The above is my config to connect to a MySQL Sandbox running MySQL Server 5.7.10.

Note: you might hit MySQL Bug #79780 when loading the world database into MySQL with Connector/Python with the C Extension enabled.

In [5]:
c1 = mysql.connector.connect(use_pure=False, **config)
cur1 = c1.cursor()
cur1.execute('DROP SCHEMA IF EXISTS world')
cur1.execute('CREATE SCHEMA world DEFAULT CHARACTER SET latin1')
cur1.execute('USE world')
result = [x for x in cur1.execute(worldsql, multi=True)]
cur1.close()
c1.close()
In [6]:
config['db'] = 'world'
In [7]:
perfdata = pd.DataFrame(columns=['connpy','connpy_cext','MySQLdb'], index=range(10000))

Now we're going to run 10000 queries with a random primary key between 1 and 8000. This does not use the C Extension as use_pure is set to True.

In [8]:
c1 = mysql.connector.connect(use_pure=True, **config)
cur1 = c1.cursor()
for it in range(10000):
    city_id = random.randint(1,8000)
    start = time.perf_counter()
    cur1.execute("SELECT * FROM City WHERE ID=%s", (city_id,))
    cur1.fetchone()
    perfdata.ix[it]['connpy'] = time.perf_counter() - start

Next up is Connector/Python with the C Extension (use_pure=False and HAVE_CEXT indicates we have the C Extension available)

In [9]:
c1 = mysql.connector.connect(use_pure=False, **config)
cur1 = c1.cursor()
for it in range(10000):
    city_id = random.randint(1,8000)
    start = time.perf_counter()
    cur1.execute("SELECT * FROM City WHERE ID=%s", (city_id,))
    cur1.fetchone()
    perfdata.ix[it]['connpy_cext'] = time.perf_counter() - start

And last, but not least, MySQLdb.

In [10]:
c2 = MySQLdb.connect(**config)
cur2 = c2.cursor()
for it in range(10000):
    city_id = random.randint(1,8000)
    start = time.perf_counter()
    cur2.execute("SELECT * FROM City WHERE ID=%s", (city_id,))
    cur2.fetchone()
    perfdata.ix[it]['MySQLdb'] = time.perf_counter() - start

Now let's have a look to what our data looks like

In [11]:
perfdata.head()
Out[11]:
connpy connpy_cext MySQLdb
0 0.00145918 0.000354935 0.000353173
1 0.000907707 0.000243508 0.000249597
2 0.000468397 0.000277101 0.000207893
3 0.000595066 0.000241349 0.00020754
4 0.000641848 0.000258027 0.000193182

Now let's plot that

In [12]:
plt.style.use('ggplot')
plt.scatter(perfdata.index, perfdata.connpy, s=1, c='r',
            label='Connector/Python Pure')
plt.scatter(perfdata.index, perfdata.connpy_cext, s=1, c='g',
            label='Connector/Python C Ext')
plt.scatter(perfdata.index, perfdata.MySQLdb, s=1, c='b',
            label='MySQLdb')
plt.ylim(ymin=0, ymax=0.001)
plt.xlim(xmin=0, xmax=10000)
plt.xlabel('Run #')
plt.ylabel('Runtime in seconds')
plt.legend()
Out[12]:
<matplotlib.legend.Legend at 0x7f47bd8c2518>

The performance of MySQL Connector/Python 2.1 with the C Extension is much closer to MySQLdb.

There is one serious drawback of using the C Extension: Prepared statements are not yet supported.

But there is more that performance alone. I like the MySQL Connector/Python API over the MySQLdb API and this shows that Connector/Python is flexible and can be almost as fast.

The pure Python implementation also has advantages: Easier installation (no C compiler required) and the option to use alternative implementations like PyPy