Sunday, August 17, 2014

The new cloud backup option of MySQL Enterprise Backup

MySQL Enterprise Backup 3.10 supports backups to the cloud. The only supported cloud service is Amazon S3.

When a cloud destination is used, mysqlbackup uploads the backup as an image file.

You can specify all options on the command line:
mysqlbackup --cloud-service=s3 --cloud-aws-region=eu-west-1 \
--cloud-access-key-id=AKIAJLGCPXEGVHCQD27B \
--cloud-secret-access-key=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 \
--cloud-bucket=meb_myserver --cloud-object-key=firstbackup --cloud-trace=0 \
--backup-dir=/tmp/firstbackup --backup-image=- --with-timestamp backup-to-image

But you can also put the settings in the my.cnf:
[mysqlbackup_cloud]
cloud-service=s3
cloud-aws-region=eu-west-1
cloud-access-key-id=AKIAJLGCPXEGVHCQD27B
cloud-secret-access-key=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
cloud-bucket=meb_myserver
cloud-trace=0
backup-dir=/data/cloudbackup
backup-image=-
with-timestamp

The with-timestamp option is important, as the backup won't start if the backup-dir already exists. This is because mysqlbackup leaves the backup directory in place after uploading the backup. The backup directory only contains meta information and the log file, not the actual backup.

By using a group suffix like _cloud you can put settings for multiple types of backups in one cnf file.

mysqlbackup --defaults-group-suffix='_cloud' \
--cloud-object-key=backup_2014081701 backup-to-image

The account you're using should have a policy like this to be allowed to read from and write to the S3 bucket (note that bucket-level actions such as s3:ListBucket apply to the bucket ARN itself, not to the objects in it):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1408302840000",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket", 
        "s3:ListBucketMultipartUploads",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::meb_myserver/*"
      ]
    }
  ]
}

This looks like a good option to me if you're already using mysqlbackup and Amazon. It would be nice if the next version supported other cloud providers (e.g. OpenStack Swift, Ceph). Implementing this should be easy for those with an S3 compatibility layer, but will probably take more time for others.

I did find some bugs (just search for tag=cloud on http://bugs.mysql.com if you're interested).

Thursday, August 7, 2014

MySQL User Group Meetup in Amsterdam

This Tuesday Markus Winand will talk at the MySQL User Group NL meetup about "Indexes: The neglected performance all-rounder".
Markus is known for the http://use-the-index-luke.com website and the SQL Performance Explained book.

Date: Tuesday August 12
Location: Marktplaats/eBay Office Amsterdam
Registration: http://www.meetup.com/MySQL-User-Group-NL/events/196440532/

Sunday, July 20, 2014

Decoding (encrypted) MySQL traffic with Wireshark

In a comment on my post about Using SSL with MySQL xiaochong zhang asked if it is possible to decode SSL/TLS encrypted MySQL traffic. The short answer is: It depends.

To test this we need a MySQL server which is SSL enabled. I used MySQL Sandbox to create a sandboxed 5.6.19 server. Then I used mysslgen to create the config and the certificates.

$ make_sandbox 5.6.19
$ ./mysslgen.py --config=sandboxes/msb_5_6_19/my.sandbox.cnf --ssldir=sandboxes/msb_5_6_19/ssl

This assumes there already is an extracted tarball of MySQL 5.6.19 in ~/mysql/5.6.19

The mysslgen.py script will print the changes you should make to the mysqld and client sections of the my.sandbox.cnf file. Then restart the server to make the changes active.

For SSL to work we need to connect using TCP/IP instead of over a UNIX socket. So we connect with "./my sql -h 127.0.0.1". Now execute "\s" or "status" to see if we're indeed using SSL.

It probably looks like this:

mysql [127.0.0.1] {msandbox} ((none)) > \s
--------------
/home/dveeden/opt/mysql/5.6.19/bin/mysql  Ver 14.14 Distrib 5.6.19, for linux-glibc2.5 (x86_64) using  EditLine wrapper

Connection id:  3
Current database: 
Current user:  msandbox@localhost
SSL:   Cipher in use is DHE-RSA-AES256-SHA
Current pager:  stdout
Using outfile:  ''
Using delimiter: ;
Server version:  5.6.19 MySQL Community Server (GPL)
Protocol version: 10
Connection:  127.0.0.1 via TCP/IP
Server characterset: latin1
Db     characterset: latin1
Client characterset: utf8
Conn.  characterset: utf8
TCP port:  5619
Uptime:   1 hour 32 min 48 sec

Threads: 1  Questions: 18  Slow queries: 0  Opens: 67  Flush tables: 1  Open tables: 60  Queries per second avg: 0.003
--------------

Now disconnect and start the trace.

sudo tcpdump -i lo -s 65535 port 5619 -w /tmp/mysql.pcap

First connect without SSL: "./my sql -h 127.0.0.1 --skip-ssl". And then with SSL: "./my sql -h 127.0.0.1".
Stop the tcpdump session, start Wireshark and open the mysql.pcap file. Now we can inspect the protocol. If MySQL were using the default port (3306) Wireshark would decode the traffic automatically, but now we have to use 'Decode As...' to tell Wireshark this is MySQL traffic. The server greeting packet and the login request should now be visible. The login request contains the client capability flags; one of the flags indicates 'Switch to SSL after handshake' and should be set for the SSL session.
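If you prefer the command line, tshark (the terminal version of Wireshark) can do the same 'Decode As...' step. A minimal sketch, using the sandbox port from above; -V prints the full packet details, including the capability flags in the login request:

tshark -r /tmp/mysql.pcap -d tcp.port==5619,mysql -V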



Both SSL and non-SSL sessions will use the same port and start an unencrypted session. The encrypted session will switch to SSL after the handshake. This is a bit like STARTTLS for IMAP. The current version of the MySQL protocol dissector is not (yet) aware of some of the new information in the initial handshake. So the information for the authentication plugins and connection attributes is not decoded yet. The documentation about the protocol can be found in the MySQL Internals manual.

So that's the plaintext part. Now we get to the SSL part. In my setup the default cipher suite used for SSL is DHE-RSA-AES256-SHA. With OpenSSL's ciphers command we can get some more details:

$ openssl ciphers -v 'DHE-RSA-AES256-SHA'
DHE-RSA-AES256-SHA      SSLv3 Kx=DH       Au=RSA  Enc=AES(256)  Mac=SHA1

This means that SHA-1 is used for the MAC, AES-256 is used for encryption and the key exchange is done with DH (Diffie-Hellman). This poses a problem: DH generates a session key, and we don't have that key in the traffic dump as it's never sent over the network. We could use gdb (and maybe a debug trace?) to get the DH keys out, but for now there is an easier solution: use a different cipher suite.
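To see which cipher suites use plain RSA key exchange (and therefore don't need a per-session DH key), you can ask OpenSSL; the kRSA keyword selects RSA key exchange. The exact list depends on your OpenSSL build:

$ openssl ciphers -v 'kRSA+AES'
AES256-SHA              SSLv3 Kx=RSA      Au=RSA  Enc=AES(256)  Mac=SHA1
(other suites omitted)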

So start tcpdump again and run "./my sql -h 127.0.0.1 --ssl-cipher=AES256-SHA". This cipher suite uses RSA for the key exchange instead of DH, which means everything we need is either sent over the network or present in the SSL certificate and/or key.

Now start Wireshark again, use 'Decode As...' and choose SSL. Then go to Edit→Preferences→Protocols→SSL→'RSA key list' and add the server's SSL key. Now you should see the decoded traffic.
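The decryption can also be scripted with tshark by pointing the (Wireshark 1.x style) ssl.keys_list preference at the server key. A sketch; the key file name depends on what mysslgen generated, and the decode-as choice may need the same tweaking as in the GUI:

tshark -r /tmp/mysql.pcap -d tcp.port==5619,mysql \
  -o "ssl.keys_list:127.0.0.1,5619,mysql,sandboxes/msb_5_6_19/ssl/server-key.pem" -V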


So decoding SSL/TLS encrypted MySQL traffic is possible. But you need to have:
  • All traffic since the beginning of the TCP/IP connection
  • The server's SSL key
  • The DH session key if DH is used (you might want to read about forward secrecy (PFS) if you're interested in the details).

Wednesday, July 16, 2014

Oracle Critical Patch Update for MySQL

Oracle has released the Critical Patch Update (CPU) for July 2014. The Oracle MySQL Risk Matrix lists 10 security fixes.

It took me some time to understand the subcomponent names. So here is the list with the full name of each subcomponent:

Subcomponent   Full name
SRFTS          Server: Full Text Search
SRINFOSC       Server: INFORMATION_SCHEMA
SRCHAR         Server: Character sets
ENARC          Engine: Archive
SROPTZR        Server: Optimizer
SRREP          Server: Replication
SRSP           Server: Stored Procedure
ENFED          Engine: Federated

I don't think there is anything really important in the list, but it might be a good trigger to update to the latest release.

Upgrading should be easy especially if you're using the APT or YUM repositories from Oracle. If you're upgrading to a new major release (e.g. from 5.5 to 5.6) then you should read the instructions in the reference manual.
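For example, with the Oracle repositories configured, a minor-version upgrade is just a package update (the package names below are what I'd expect for the community packages and may differ per platform):

# YUM-based distributions
sudo yum update mysql-community-server

# APT-based distributions
sudo apt-get update && sudo apt-get install mysql-community-server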

Sunday, July 6, 2014

The MySQL 6.0 goody bag

After MySQL 5.1 was released, work started on MySQL 5.2, which was later renamed to MySQL 6.0. Many new features were introduced in 6.0, but stabilizing this branch became a huge task. Eventually the decision was made to start off from a stable branch and backport the new features from 6.0. This is how many of the 6.0 features landed in 5.5 and 5.6.

So let's see which features 6.0 brought and where they landed. I'll use the What Is New in MySQL 6.0 section of the MySQL 6.0 Reference Manual for this.

  • The Falcon storage engine. This never landed anywhere as far as I know. It's not even included in the list of storage engines in the MariaDB knowledge base. As both InnoDB and MySQL are now part of Oracle, I don't see any reason for Falcon to exist anymore.
  • 4-byte utf8 and support for utf16 and utf32. This is included in MySQL 5.5 together with many other Unicode enhancements.
  • Database backup with SQL. This allows you to make backups by executing 'BACKUP DATABASE' SQL statements. This has not landed anywhere as far as I know, but some of the code might have made it into MySQL Enterprise Backup (both use the backup_history and backup_progress tables in the mysql database). This might be an interesting thing to have, but with MEB there is no real need for it.
  • Subquery enhancements, BKA and MRR. This all made it into MySQL 5.6.
  • LOAD XML. This made it into MySQL 5.5, but I don't think it is used often.
  • The Maria storage engine. This is a transactional version of the MyISAM storage engine. It is not used in Oracle MySQL. It is included in MariaDB, but renamed to Aria as the name might otherwise cause confusion. As far as I know there is not much development being done on Aria.
  • Foreign Keys. Of course MySQL supports foreign keys (now even with NDB!), but those are implemented in the storage engine, not in the server. This was on the roadmap for 6.x but I don't know if it was actually implemented. Implementing this in the server could make writing storage engines easier and would probably also make it easier to combine partitioning and foreign keys.

Did I forget any 6.x features? Did you ever try 6.x? Let me know in the comments!



Wednesday, July 2, 2014

Single database backup and restore with MEB

I was recently asked whether MySQL Enterprise Backup is able to restore single databases.

My initial answer was that this was complicated, but might be doable with the Transportable Tablespaces (TTS) option.

But first let's go back to the basics. A common way of working with mysqldump is to get a list of databases and then loop through them, dumping the data and schema of each to a SQL file, as sketched below. But both backups and restores will take a lot of time as the databases grow. And it's a lukewarm backup at best instead of a hot backup. So that's why we have MySQL Enterprise Backup.
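A minimal sketch of such a loop (the dump options and the target directory are just an example):

for db in $(mysql -N -B -e 'SHOW DATABASES' | grep -vE '^(information_schema|performance_schema)$'); do
    mysqldump --single-transaction --routines --events "$db" > "/backups/${db}.sql"
done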

MySQL Enterprise Backup allows you to make a hot backup of InnoDB tables by copying the datafiles while watching the InnoDB redo log files.

On disk the data of the InnoDB storage engine consists of a system tablespace (one or more ibdataX files), the redo log files (ib_logfileX) and zero or more per-table tablespace files (*.ibd).

The data dictionary is located in the system tablespace. This is where the tablespace ID is stored for each ibd file. The redo log files are where changes are written before they are written to the data files.

This all works fine if you use MEB to back up and restore a whole instance.

Even if you stop MySQL you can't just copy an ibd file to another server and expect it to work. This is because there might still be changes for that file in the redo log, and the tablespace ID in the system tablespace might not match.

But you can copy an ibd file to another server if you follow the right procedure. This (partly) works with 5.5, and is greatly enhanced in 5.6.

This is how it works: a FLUSH TABLES ... FOR EXPORT statement is issued for certain tables. The tables are then read-only. The changes in the redo log are written to the ibd file, and the information from the system tablespace is written to a .cfg file. The .ibd and .cfg files can then be copied to some other location, after which the table can be unlocked.
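A minimal sketch of that procedure for a single table, assuming a table test.t1 and illustrative paths (the files must be copied while the session holding the export lock is still open; if the \! client command isn't processed in your setup, run the copy from a second shell instead):

# on the source server
mysql test <<'EOF'
FLUSH TABLES t1 FOR EXPORT;
\! cp /path/to/src/datadir/test/t1.ibd /path/to/src/datadir/test/t1.cfg /tmp/export/
UNLOCK TABLES;
EOF

# on the destination server (the table definition must already exist there)
mysql test -e 'ALTER TABLE t1 DISCARD TABLESPACE'
cp /tmp/export/t1.ibd /tmp/export/t1.cfg /path/to/dst/datadir/test/
# make sure the copied files are owned by the mysql user before importing
mysql test -e 'ALTER TABLE t1 IMPORT TABLESPACE'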

You could create a per-database backup with MEB by using the --include-tables option. It will then copy the specified tables' .ibd files, the system tablespace and the redo logs.

This works, but the system tablespace might get big for many reasons. This is where TTS comes into play: it allows you to make a backup of only the tables, without copying the system tablespace.

With the --use-tts and --include-tables options I can back up all tables of one database and then restore one or more of those tables on another instance, without stopping the destination instance.
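As a sketch (the database name, paths and image name are illustrative):

# back up only the tables of one database, without the system tablespace
mysqlbackup --use-tts --include-tables='^mydb\.' \
  --backup-dir=/data/backups/mydb --backup-image=/data/backups/mydb.mbi \
  --with-timestamp backup-to-image

# restoring selected tables into a running instance is then done with the
# copy-back-and-apply-log operation, using the same kind of --include-tables filter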

Some per-database defaults like the default character set are stored in the db.opt file, but this file is not copied. So be aware!

But what if we want to do a point-in-time recovery for just one database? That should work. First restore the database you're concerned about and then use mysqlbinlog with the --database option to replay the binlog entries for that specific database.
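The binlog part of such a restore could look like this (the binlog names, start position and stop time are illustrative; note that for statement-based events --database filters on the default database):

mysqlbinlog --database=mydb --start-position=120 --stop-datetime='2014-07-02 12:00:00' \
  mysql-bin.000042 mysql-bin.000043 | mysql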

I haven't used per-database (point-in-time) restores with MEB in production, but it all seems to work fine in a test setup.

I've always used physical backups on a per-instance basis and mysqldump for single-table and single-database backups. But this might be faster, and the data only needs to be backed up once. Also, the mysqldumps I took were not prepared for point-in-time restores, as the locking required for that would block the server for too long.

Please let me know your experiences in the comments.

Monday, June 23, 2014

On Dolphins, Pandas and Bugs

Like any good open source project, the MySQL bugs website is open for anyone to search through. This of course doesn't include the security bugs.

There is a second collection of bugs in My Oracle Support (MOS), and these bugs are only accessible to customers with a support contract. Even though I have access to MOS, I still prefer to use the community bugs site. For service requests etc. I would use MOS.

The openness of the bugs database is one of the topics the IOUG MySQL Council discusses with Oracle.

The bugs database has more to offer than just information about individual bug reports:

  • Bugs Statistics: This has a big matrix of components and states, some per-developer stats and some totals (did you know the fastest bug closure was 9 seconds?).
  • Bugs Tide: This gives you per-month statistics.

For both there are filter options for the version and whether to include feature requests.

You might want to read life cycle of a MySQL bug by Valeriy Kravchuk if you're not familiar with the different states of MySQL bugs.

The bugs website offers search results not only as the HTML reports shown on the website, but also in RSS and CSV format, so they can be consumed by computer programs.

To demonstrate this I wrote this script. It fetches the CSV for my bugs and generates some graphs. As you might (or might not) notice, this doesn't include my older bug reports: I have two different reporter IDs, because I used different email addresses for my MySQL account and my My Oracle account before MySQL/Sun was merged with Oracle.

This page is made with IPython Notebook, a tool which allows you to combine code, output, graphs and text. You can also view this notebook here.

In [1]:
%pylab inline
Populating the interactive namespace from numpy and matplotlib

In [3]:
import requests
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from StringIO import StringIO
from datetime import datetime

This is the URL as found in the bugs website. Feel free to modify the parameters.

In [4]:
bugs_url = "http://bugs.mysql.com/search-csv.php?status=all&severity=all&mine=9242646"
  1. Fetch the data from the URL with the Requests library
  2. Convert the CSV data to a Pandas DataFrame (a DataFrame is a special kind of table)
In [5]:
bugs_req = requests.get(bugs_url)
csv_file = StringIO()
csv_file.write(bugs_req.text)
csv_file.seek(0)
bugsdf = pd.read_csv(csv_file)

Let's have a look at the data.

In [6]:
bugsdf.head()
Out[6]:
ID Entered Modified Type Status Severity Version OS Summary Assign First Assign Last
0 68132 2013-01-21 13:53:28 2013-03-26 16:14:20 Server: Docs Closed S3 5.6, 5.5.29 Any Documentation about verification with GnuPG is... Philip Olson
1 68199 2013-01-28 08:23:59 2013-05-13 13:10:00 bugs.mysql.com Closed S3 5.7 Any Month-by-month (tide) stats for 5.7 absent Sveta Smirnova
2 68200 2013-01-28 08:54:53 2013-03-10 16:03:34 Server: Privileges Closed S3 5.6.9-rc Any Password logging doesn't work as documented NaN NaN
3 68536 2013-03-01 10:29:46 2013-07-31 13:24:26 MySQL Workbench Closed S3 5.2.47 Any Better support IPv6 addresses for new connections NaN NaN
4 68571 2013-03-05 09:08:23 2014-02-26 09:41:24 Monitoring: Advisors/Rules Verified S3 NaN Microsoft Windows (Win7 SP1) False positive for 32-binary check on win64 NaN NaN

Now we have the data, let's make a horizontal barplot for the number of bugs per category (the Type column).

We change the size of the figure as the default is too small to be readable.

In [7]:
fig = plt.figure(figsize=(8,10), dpi=100)
bugsdf.Type.value_counts(ascending=True).plot(kind='barh')
Out[7]:
<matplotlib.axes.AxesSubplot at 0x394de90>

The Version column contains the version as text. Let's grab the first three characters to get the major version for the bug. This is not really perfect, as it will only return 5.1 if the string is '5.1.30, 5.5.16', but it's good enough for now.

The function is mapped to the Version column of the dataframe, and the result is saved in a new column called major_version.

In [8]:
def getversion(inputver):
    if isinstance(inputver, float):
        return inputver
    return inputver[:3]
bugsdf['major_version'] = bugsdf.Version.map(getversion)

Besides Pandas we can use matplotlib's pyplot, which is a bit like MATLAB.

Let's group the bugs by major version and category (the Type column) and plot this as a heatmap.

In [9]:
compstat = bugsdf.groupby('major_version').Type.value_counts().unstack().T
fig = plt.figure(figsize=(15,10), dpi=100)
plt.pcolor(compstat, vmin=0, vmax=5, cmap='Blues')
plt.yticks(np.arange(0.5, len(compstat.index), 1), compstat.index)
plt.xticks(np.arange(0.5, len(compstat.columns), 1), compstat.columns)
Out[9]:
([<matplotlib.axis.XTick at 0x3c067d0>,
  <matplotlib.axis.XTick at 0x3d87410>,
  <matplotlib.axis.XTick at 0x3f2d550>,
  <matplotlib.axis.XTick at 0x3f2da50>,
  <matplotlib.axis.XTick at 0x3f2df50>,
  <matplotlib.axis.XTick at 0x3f15490>,
  <matplotlib.axis.XTick at 0x3f15990>,
  <matplotlib.axis.XTick at 0x3f15e90>,
  <matplotlib.axis.XTick at 0x40983d0>,
  <matplotlib.axis.XTick at 0x4098b10>,
  <matplotlib.axis.XTick at 0x3d8f2d0>],
 <a list of 11 Text xticklabel objects>)

Now we can create a heatmap which compares major versions and components.

In [10]:
fig = plt.figure(figsize=(8,10), dpi=100)
plt.pcolor(compstat, vmin=0, vmax=10, cmap='Blues')
plt.yticks(np.arange(0.5, len(compstat.index), 1), compstat.index)
plt.xticks(np.arange(0.5, len(compstat.columns), 1), compstat.columns)
Out[10]:
([<matplotlib.axis.XTick at 0x3f15bd0>,
  <matplotlib.axis.XTick at 0x3f176d0>,
  <matplotlib.axis.XTick at 0x42bcad0>,
  <matplotlib.axis.XTick at 0x42bcfd0>,
  <matplotlib.axis.XTick at 0x4459d50>,
  <matplotlib.axis.XTick at 0x4455710>,
  <matplotlib.axis.XTick at 0x4451310>,
  <matplotlib.axis.XTick at 0x42cdc50>,
  <matplotlib.axis.XTick at 0x42c9810>,
  <matplotlib.axis.XTick at 0x42c51d0>,
  <matplotlib.axis.XTick at 0x42bdb10>],
 <a list of 11 Text xticklabel objects>)

Now that we have the major version, let's filter on MySQL 5.6 and graph the different values of the Status field.

In [11]:
bugsdf[bugsdf.major_version == '5.6'].Status.value_counts().plot(kind='bar')
Out[11]:
<matplotlib.axes.AxesSubplot at 0x4451e10>

Or maybe check the status for all versions.

In [12]:
bugsdf.groupby('major_version').Status.value_counts().unstack().plot(kind='barh', stacked=True)
Out[12]:
<matplotlib.axes.AxesSubplot at 0x446f790>

That's not really helpful, let's remove the Closed ones.

In [13]:
bugsdf[bugsdf.Status != 'Closed'].groupby('major_version').Status.value_counts().unstack().plot(kind='bar', stacked=True)
Out[13]:
<matplotlib.axes.AxesSubplot at 0x4464650>

The Entered and Modified fields are not yet in a 'real' date format, so these must be converted before we can use them.

In [14]:
bugsdf['Entered'] = pd.to_datetime(bugsdf.Entered)
bugsdf['Modified'] = pd.to_datetime(bugsdf.Modified)

Which are the oldest open bugs?

The - before bugsdf.Status.isin reverses the result, so it behaves like 'is not in'.

In [15]:
bugsdf[-bugsdf.Status.isin(['Closed', 'Duplicate', 'Won\'t fix', 'Can\'t repeat'])].sort(columns='Entered')[:5]
Out[15]:
ID Entered Modified Type Status Severity Version OS Summary Assign First Assign Last major_version
4 68571 2013-03-05 09:08:23 2014-02-26 09:41:24 Monitoring: Advisors/Rules Verified S3 NaN Microsoft Windows (Win7 SP1) False positive for 32-binary check on win64 NaN NaN NaN
5 68574 2013-03-05 13:42:26 2014-02-23 11:25:48 Server: InnoDB Verified S3 5.5.30, 5.6.16 Microsoft Windows (win7 sp1 ent) No instrumentation for InnoDB files in P_S on ... Marc Alff 5.5
19 68925 2013-04-11 05:57:29 2013-04-11 06:15:40 Client Verified S3 5.6.10, 5.5.29 Any Compatibility issue with mysql history ("\040"... NaN NaN 5.6
22 69147 2013-05-05 10:07:16 2013-07-28 14:43:57 Server: I_S Verified S4 5.7 Any Variable defaults in information_schema NaN NaN 5.7
33 69223 2013-05-14 06:22:48 2013-06-24 20:03:16 Server: DDL Verified S4 5.6.10 Any Give a warning on CREATE TABLE without PRIMARY... NaN NaN 5.6

Now let's find bugs which are open and haven't been modified for some time.

In [17]:
bugsdf[-bugsdf.Status.isin(['Closed', 'Duplicate', 'Won\'t fix', 'Can\'t repeat'])] \
      [bugsdf.Modified < datetime(2013, 6, 1)] \
      [bugsdf.Modified > datetime(1970, 1, 1)] 
Out[17]:
ID Entered Modified Type Status Severity Version OS Summary Assign First Assign Last major_version
19 68925 2013-04-11 05:57:29 2013-04-11 06:15:40 Client Verified S3 5.6.10, 5.5.29 Any Compatibility issue with mysql history ("\040"... NaN NaN 5.6
35 69226 2013-05-14 10:31:09 2013-05-14 16:09:09 Server: Options Verified S3 5.6.11 Any Status variable for SSL/TLS implementation NaN NaN 5.6
36 69314 2013-05-24 21:33:48 2013-05-25 07:55:24 Server: Options Verified S3 5.7.1 Any ignore-db-dir option doesn't work for database... NaN NaN 5.7

Besides searching for bugs by reporter you can also use this to search on other conditions.