Tuesday, January 26, 2016

When simple SQL can be complex

I think SQL is a very simple language, but ofcourse I'm biased.

But even a simple statement might have more complexity to it than you might think.

Do you know what the result is of this statement?
SELECT FALSE = FALSE = TRUE;
scroll down for the answer.



























The answer is: it depends.

You might expect it to return false because the 3 items in the comparison are not equal. But that's not the case.

In PostgreSQL this is the result:
postgres=# SELECT FALSE = FALSE = TRUE;
 ?column? 
----------
 t
(1 row)
So it compares FALSE against FALSE, which results in TRUE and then That is compared against TRUE, which results in TRUE. PostgreSQL has proper boolean literals.

Next up is MySQL:
mysql> SELECT FALSE = FALSE = TRUE;
+----------------------+
| FALSE = FALSE = TRUE |
+----------------------+
|                    1 |
+----------------------+
1 row in set (0.00 sec)
This is similar but it's slightly different. The result is 1 because in MySQL TRUE and FALSE evalueate to 0 and 1. If you use BOOLEAN in your DDL this will be changed to tinyint(1). But note that the (1) is only the display width and doesn't change the storage space (tinyint is 1 byte).

And SQLite has yet another result:
sqlite> SELECT FALSE = FALSE = TRUE;
Error: no such column: FALSE
This is because SQLite doesn't have a boolean type and you're expected to use 0 and 1.
If we use the suggested solution we get the same result as with MySQL.
sqlite> SELECT 0 = 0 = 1;
1

What about the SQL standard?

There is a boolean literal in the SQL:1999 standard according to this Wikipedia article. Note that 1999 is 17 years ago. It is an optional feature so it isn't required. Note that a boolean can have 3 values according to the standard. It can be TRUE, FALSE or UNKNOWN. It suggests that the UNKNOWN literal may evaluate to NULL. Neither MySQL, PostgreSQL or SQLite implements the UNKNOWN literal.

What about commercial databases?

DB2, Oracle and SQL Server don't have a boolean type according to this webpage. For DB2 this has changed, according to this page from IBM BOOLEAN support was added in DB2 9.7.0. It supports TRUE, FALSE and NULL, but not UNKNOWN for what I can see.
Ingres 10.0 has full standards complient support for BOOLEAN according to their wiki.

Interestingly enough there are multiple suggestions about what to use when there is no boolean type: BIT, CHAR(1), NUMBER(1). This blogpost from Peter Zaitsev also lists another option: CHAR(0).

So even something simple as a boolean might be less portable than you might have thought it was.

But what about doing a real three-way compare in SQL?

One solution would be to use the & operator:
postgres=# SELECT FALSE::int & FALSE::int & TRUE::int;
 ?column? 
----------
        0
(1 row)
 
mysql [(none)] > SELECT FALSE & FALSE & TRUE;
+----------------------+
| FALSE & FALSE & TRUE |
+----------------------+
|                    0 |
+----------------------+
1 row in set (0.00 sec)
 
sqlite> SELECT 0 & 0 & 1;
0

Saturday, January 9, 2016

Using Connector/J with Python

With Python you would normally use MySQL Connector/Python or the older MySQLdb to connect from Python to MySQL, but there are more options.

There are also multiple Python implementations: CPython (the main implementation), PyPy, Jython and IronPython. PyPy tries to be faster than CPython by using a Just-in-Time compiler. Jython runs on the JVM and IronPython runs on the .NET CLR.

Connector/Python by default (Without the C Extension) is a pure Python implementation and can work with most if not all implementations. And for MySQLdb there is a drop-in replacement called PyMySQL, which is a pure python implementation.

So there are many options already. But for at least Jython it is also possible to use a Java (JDBC) driver.

But why would you use a different Python implementation? There are multiple reasons for that:

  • Speed. PyPy can be faster and Jython has no Global Interpreter Lock (GIL) , which can allow for more concurrent execution.
  • To access 'native' code. e.g. call Java code from Jython or C# from IronPython.
  • Use existing infrastructure. You can deploy a Jython application on Tomcat.
  • Create testcases, healthchecks etc. which uses the same settings and infrastucture as your Java application with the benefits of a scripting language.

I wanted to test how Connector/J behaves with regards to TLS (the successor of SSL).

Setup

The first step is to get Jython, and Connector/J on your system. On Fedora 23 this is easily done with a dnf install jython mysql-connector-java.

Then I used MySQL Sandbox to setup a MySQL 5.7.10 sandbox. To enable TLS I did a ./my sql_ssl_rsa_setup, which is the Sandbox version of mysql_ssl_rsa_setup. If you have a pre-5.7 version then you can use mysslgen instead.

To convert the CA certificate from the PEM format to the Java Key Store (JKS) format I used keytool.

$ keytool -importcert -trustcacerts -file ca.pem -keystore /tmp/mysql57keystore.jks
Enter keystore password:  
Re-enter new password: 
Owner: CN=MySQL_Server_5.7.10_Auto_Generated_CA_Certificate
Issuer: CN=MySQL_Server_5.7.10_Auto_Generated_CA_Certificate
Serial number: 1
Valid from: Fri Jan 08 16:23:16 CET 2016 until: Mon Jan 05 16:23:16 CET 2026
Certificate fingerprints:
     MD5:  B5:B5:2B:53:5C:91:A2:6A:64:B5:C9:12:85:A0:CE:CC
     SHA1: 85:F1:AB:14:15:33:65:A8:71:4D:00:A6:C6:FC:8F:7F:BE:95:BA:B0
     SHA256: CB:B9:D5:BC:26:76:37:3A:66:67:99:95:5B:3B:8E:95:84:6C:A4:5F:52:39:EF:2A:23:36:6E:AB:B0:3E:81:E0
     Signature algorithm name: SHA256withRSA
     Version: 1
Trust this certificate? [no]:  yes
Certificate was added to keystore

Then I had to set my CLASSPATH in order for Jython to find Connector/J.

$ export CLASSPATH=/usr/share/java/mysql-connector-java.jar

Running the test

I used this to test the database connetion:

#!/usr/bin/jython
from __future__ import with_statement
from com.ziclix.python.sql import zxJDBC
from java.lang import System

System.setProperty("javax.net.ssl.trustStore","/tmp/mysql57keystore.jks");
System.setProperty("javax.net.ssl.trustStorePassword","msandbox");

jdbc_url = 'jdbc:mysql://127.0.0.1:18785/test?useSSL=true'
with zxJDBC.connect(jdbc_url, 'msandbox', 'msandbox', 'com.mysql.jdbc.Driver') as c:
    with c.cursor() as cur:
        cur.execute('SHOW SESSION STATUS LIKE \'Ssl_%\'')
        for result in cur:
            print('%-40s: %s' % result)
    raw_input('Press any key to continue...')

This resulted in a working connection. From the database side it looks like this:

mysql> SELECT ATTR_NAME, ATTR_VALUE FROM
    -> performance_schema.session_connect_attrs WHERE PROCESSLIST_ID=40;
+------------------+----------------------+
| ATTR_NAME        | ATTR_VALUE           |
+------------------+----------------------+
| _runtime_version | 1.8.0_65             |
| _client_version  | 5.1.36-SNAPSHOT      |
| _client_name     | MySQL Connector Java |
| _client_license  | GPL                  |
| _runtime_vendor  | Oracle Corporation   |
+------------------+----------------------+
5 rows in set (0.00 sec)

mysql> SELECT * FROM performance_schema.status_by_thread WHERE 
    -> THREAD_ID=(SELECT THREAD_ID FROM performance_schema.threads
    -> WHERE PROCESSLIST_ID=40) and VARIABLE_NAME LIKE 'Ssl_version';
+-----------+---------------+----------------+
| THREAD_ID | VARIABLE_NAME | VARIABLE_VALUE |
+-----------+---------------+----------------+
|        65 | Ssl_version   | TLSv1          |
+-----------+---------------+----------------+
1 row in set (0.00 sec)

I wanted to see if upgrading Connector/J would change anythong. So I downloaded the latest release and change my CLASSPATH to only include that.

mysql> SELECT ATTR_NAME, ATTR_VALUE FROM
    -> performance_schema.session_connect_attrs WHERE PROCESSLIST_ID=45;
+------------------+----------------------+
| ATTR_NAME        | ATTR_VALUE           |
+------------------+----------------------+
| _runtime_version | 1.8.0_65             |
| _client_version  | 5.1.38               |
| _client_name     | MySQL Connector Java |
| _client_license  | GPL                  |
| _runtime_vendor  | Oracle Corporation   |
+------------------+----------------------+
5 rows in set (0.00 sec)

mysql> SELECT * FROM performance_schema.status_by_thread WHERE
    -> THREAD_ID=(SELECT THREAD_ID FROM performance_schema.threads
    -> WHERE PROCESSLIST_ID=45) and VARIABLE_NAME LIKE 'Ssl_version';
+-----------+---------------+----------------+
| THREAD_ID | VARIABLE_NAME | VARIABLE_VALUE |
+-----------+---------------+----------------+
|        70 | Ssl_version   | TLSv1.1        |
+-----------+---------------+----------------+
1 row in set (0.00 sec)

And it did. Connector/J 5.1.38 uses TLSv1.1 instead of TLSv1.0

Sunday, January 3, 2016

The performance of TLS with MySQL Connector/Python

I've ran a simple test to see the performance impact of TLS on MySQL connections with MySQL Connector/Python

The test results are in this Jupyter notebook.

TL;DR:
  • Try to reuse connections if you use TLS
  • Establishing TLS connections is expensive (server & client)
  • Improved performance might be possible in the future by using TLS Tickets
Not tested:
  • Difference between YaSSL and OpenSSL
  • Difference between Ciphersuites
  • Performance of larger resultsets and queries

Saturday, December 26, 2015

The performance of MySQL Connector/Python with C Extension

The source of this post is in this gist on nbviewer.

After reading about the difference between MySQL Connector/Python and MySQLdb on this blog post I wondered how the C Extension option in Connector/Python would perform.

If you want to run the code yourself you'll need: Jupyter/IPython, Python 3, Requests, MySQLdb, Connector/Python, Matplotlib, Pandas and MySQL.

In [1]:
%matplotlib notebook
In [2]:
import random
import gzip
import time

import pandas as pd
import matplotlib.pyplot as plt
import requests
import mysql.connector
import MySQLdb
for imp in [mysql.connector, MySQLdb]:
    print('Using {imp} {version}'.format(imp=imp.__name__, version=imp.__version__))
print('C Extension for MySQL Connector/Python available: %s' % mysql.connector.HAVE_CEXT)
Using mysql.connector 2.1.3
Using MySQLdb 1.3.7
C Extension for MySQL Connector/Python available: True

Make sure the C Extension is available. This needs MySQL Connector/Python 2.1 or newer. On Fedora you might need to install this with dnf install mysql-connector-python3-cext if you have the mysql-connectors-community repository installed. If you compile from source then make sure to use the --with-mysql-capi option.

In [3]:
worlddb_url = 'https://downloads.mysql.com/docs/world.sql.gz'
worlddb_req = requests.get(worlddb_url)
if worlddb_req.status_code == 200:
    worldsql = gzip.decompress(worlddb_req.content).decode('iso-8859-15')
In [4]:
config = {
    'host': '127.0.0.1',
    'port': 5710,
    'user': 'msandbox',
    'passwd': 'msandbox',
}

The above is my config to connect to a MySQL Sandbox running MySQL Server 5.7.10.

Note: you might hit MySQL Bug #79780 when loading the world database into MySQL with Connector/Python with the C Extension enabled.

In [5]:
c1 = mysql.connector.connect(use_pure=False, **config)
cur1 = c1.cursor()
cur1.execute('DROP SCHEMA IF EXISTS world')
cur1.execute('CREATE SCHEMA world DEFAULT CHARACTER SET latin1')
cur1.execute('USE world')
result = [x for x in cur1.execute(worldsql, multi=True)]
cur1.close()
c1.close()
In [6]:
config['db'] = 'world'
In [7]:
perfdata = pd.DataFrame(columns=['connpy','connpy_cext','MySQLdb'], index=range(10000))

Now we're going to run 10000 queries with a random primary key between 1 and 8000. This does not use the C Extension as use_pure is set to True.

In [8]:
c1 = mysql.connector.connect(use_pure=True, **config)
cur1 = c1.cursor()
for it in range(10000):
    city_id = random.randint(1,8000)
    start = time.perf_counter()
    cur1.execute("SELECT * FROM City WHERE ID=%s", (city_id,))
    cur1.fetchone()
    perfdata.ix[it]['connpy'] = time.perf_counter() - start

Next up is Connector/Python with the C Extension (use_pure=False and HAVE_CEXT indicates we have the C Extension available)

In [9]:
c1 = mysql.connector.connect(use_pure=False, **config)
cur1 = c1.cursor()
for it in range(10000):
    city_id = random.randint(1,8000)
    start = time.perf_counter()
    cur1.execute("SELECT * FROM City WHERE ID=%s", (city_id,))
    cur1.fetchone()
    perfdata.ix[it]['connpy_cext'] = time.perf_counter() - start

And last, but not least, MySQLdb.

In [10]:
c2 = MySQLdb.connect(**config)
cur2 = c2.cursor()
for it in range(10000):
    city_id = random.randint(1,8000)
    start = time.perf_counter()
    cur2.execute("SELECT * FROM City WHERE ID=%s", (city_id,))
    cur2.fetchone()
    perfdata.ix[it]['MySQLdb'] = time.perf_counter() - start

Now let's have a look to what our data looks like

In [11]:
perfdata.head()
Out[11]:
connpy connpy_cext MySQLdb
0 0.00145918 0.000354935 0.000353173
1 0.000907707 0.000243508 0.000249597
2 0.000468397 0.000277101 0.000207893
3 0.000595066 0.000241349 0.00020754
4 0.000641848 0.000258027 0.000193182

Now let's plot that

In [12]:
plt.style.use('ggplot')
plt.scatter(perfdata.index, perfdata.connpy, s=1, c='r',
            label='Connector/Python Pure')
plt.scatter(perfdata.index, perfdata.connpy_cext, s=1, c='g',
            label='Connector/Python C Ext')
plt.scatter(perfdata.index, perfdata.MySQLdb, s=1, c='b',
            label='MySQLdb')
plt.ylim(ymin=0, ymax=0.001)
plt.xlim(xmin=0, xmax=10000)
plt.xlabel('Run #')
plt.ylabel('Runtime in seconds')
plt.legend()
Out[12]:
<matplotlib.legend.Legend at 0x7f47bd8c2518>

The performance of MySQL Connector/Python 2.1 with the C Extension is much closer to MySQLdb.

There is one serious drawback of using the C Extension: Prepared statements are not yet supported.

But there is more that performance alone. I like the MySQL Connector/Python API over the MySQLdb API and this shows that Connector/Python is flexible and can be almost as fast.

The pure Python implementation also has advantages: Easier installation (no C compiler required) and the option to use alternative implementations like PyPy