Sunday, June 24, 2012

That's not my name! A story about character sets

When computers still used large black text-oriented screens, or no screens at all, a computer only knew how to store a limited set of characters. Back then it was normal to store a name with the more complicated characters replaced by more basic ones. The ASCII standard was used to make communication between multiple systems (or applications) easier. Storing characters as ASCII needs little space and is quite straightforward.

Then DOS used CP850, CP437 and so on to make it possible to use language- and location-specific characters.
Later, ISO 8859-1, ISO 8859-15 and more of these character sets were defined as standards.

And now there is Unicode, with encodings like UTF-8, UTF-16 and UCS-2, which allow you to store many different kinds of characters in the same character set.

But all those character sets only work correctly if you configure all applications correctly. Many of the character sets are very similar and seem to work correctly even if one of the systems is not configured correctly. When this happens, most characters will be correct, except the special ones. And my name does contain a 'special' character, the 'ë'.

Below is a picture of two letters I received in the mail recently:

So what went wrong?

This is called Mojibake.

The first one:

mysql> CREATE TABLE t1(v1 VARCHAR(200)) ENGINE=InnoDB DEFAULT CHARACTER SET utf8;
Query OK, 0 rows affected (0.01 sec)

mysql> SHOW SESSION VARIABLES LIKE '%char%';
+--------------------------+----------------------------------------------------+
| Variable_name            | Value                                              |
+--------------------------+----------------------------------------------------+
| character_set_client     | utf8                                               |
| character_set_connection | utf8                                               |
| character_set_database   | latin1                                             |
| character_set_filesystem | binary                                             |
| character_set_results    | utf8                                               |
| character_set_server     | latin1                                             |
| character_set_system     | utf8                                               |
| character_sets_dir       | /home/dveeden/mysql/5.5.22-mariadb/share/charsets/ |
+--------------------------+----------------------------------------------------+
8 rows in set (0.00 sec)

mysql> INSERT INTO t1 VALUES('Daniël van Eeden');
Query OK, 1 row affected (0.01 sec)

mysql> SELECT * FROM t1;
+-------------------+
| v1                |
+-------------------+
| Daniël van Eeden  |
+-------------------+
1 row in set (0.00 sec)

mysql> set session character_set_client=latin1;
Query OK, 0 rows affected (0.00 sec)

mysql> set session character_set_connection=latin1;
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO t1 VALUES('Daniël van Eeden');
Query OK, 1 row affected (0.00 sec)

mysql> SELECT * FROM t1;
+---------------------+
| v1                  |
+---------------------+
| Daniël van Eeden    |
| Daniël van Eeden   |
+---------------------+
2 rows in set (0.00 sec)

mysql> INSERT INTO t1 VALUES('Daniël van Eeden');
Query OK, 1 row affected (0.00 sec)

mysql> SELECT * FROM t1;
+---------------------+
| v1                  |
+---------------------+
| Daniël van Eeden    |
| Daniël van Eeden   |
| Daniël van Eeden   |
+---------------------+
3 rows in set (0.00 sec)

So we can reproduce this issue by setting the client and connection charset to latin1.

mysql> SELECT v1,HEX(v1) FROM t1;
+---------------------+----------------------------------------+
| v1                  | HEX(v1)                                |
+---------------------+----------------------------------------+
| Daniël van Eeden    | 44616E69C3AB6C2076616E20456564656E     |
| Daniël van Eeden   | 44616E69C383C2AB6C2076616E20456564656E |
| Daniël van Eeden   | 44616E69C383C2AB6C2076616E20456564656E |
+---------------------+----------------------------------------+
3 rows in set (0.00 sec)

mysql> SELECT CONVERT(X'C3AB' USING utf8),CONVERT(X'C3AB' USING latin1);
+-----------------------------+-------------------------------+
| CONVERT(X'C3AB' USING utf8) | CONVERT(X'C3AB' USING latin1) |
+-----------------------------+-------------------------------+
| ë                           | ë                            |
+-----------------------------+-------------------------------+
1 row in set (0.00 sec)

The ë is stored as C3 83 C2 AB and is rendered as two latin1 characters (Ã and «).

UTF-8 C3 83 (Ã) is latin1 C3
UTF-8 C2 AB («) is latin1 AB
C3 AB is the UTF-8 encoding of ë
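
This double encoding can be reproduced in the client: interpreting the correct UTF-8 bytes as latin1 and re-encoding the result as UTF-8 should give exactly the bytes seen in the double-encoded rows of the HEX() output above (C383C2AB):

mysql> SELECT HEX(CONVERT(CONVERT(X'C3AB' USING latin1) USING utf8));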

+------+-------+--------+-------+
| char | utf8  | latin1 | cp850 |
+------+-------+--------+-------+
| ë    | C3 AB | EB     | 89    |
| Ã    | C3 83 | C3     | C7    |
| «    | C2 AB | AB     | AE    |
| Ù    | C3 99 | D9     | EB    |
+------+-------+--------+-------+
mysql> SELECT CONVERT(X'C3AB' USING latin1);
+-------------------------------+
| CONVERT(X'C3AB' USING latin1) |
+-------------------------------+
| ë                            |
+-------------------------------+
1 row in set (0.00 sec)

The sender of the first letter is probably storing UTF-8 from the web application in a database, but uses Latin1 when sending the letters.

The second letter renders the ë as Ù.

After some trial and error it seems the sender is storing the ë as EB (latin1) and rendering it as CP850.

mysql> SELECT CONVERT(X'EB' USING cp850);
+----------------------------+
| CONVERT(X'EB' USING cp850) |
+----------------------------+
| Ù                          |
+----------------------------+
1 row in set (0.00 sec)

Anders Karlsson recently wrote a nice introduction about Unicode. And Ronald Bradford wrote a blog post about UTF-8 With MySQL and LAMP.

Sunday, June 3, 2012

XA Transactions between TokuDB and InnoDB

The recently released TokuDB brings many features. One of those features is support for XA Transactions. InnoDB already has support for XA Transactions.

XA Transactions are transactions which span multiple databases and/or applications. XA Transactions use two-phase commit, which is the same method MySQL Cluster uses internally.

Internal XA Transactions are used to keep the binary log and InnoDB in sync.

Demo 1: XA Transaction on 1 node:
mysql55-tokudb6> XA START 'demo01';
Query OK, 0 rows affected (0.00 sec)

mysql55-tokudb6> INSERT INTO xatest(name) VALUES('demo01');
Query OK, 1 row affected (0.01 sec)

mysql55-tokudb6> SELECT * FROM xatest;
+----+--------+
| id | name   |
+----+--------+
|  3 | demo01 |
+----+--------+
1 row in set (0.00 sec)

mysql55-tokudb6> XA END 'demo01';
Query OK, 0 rows affected (0.00 sec)

mysql55-tokudb6> XA PREPARE 'demo01';
Query OK, 0 rows affected (0.00 sec)

mysql55-tokudb6> XA COMMIT 'demo01';
Query OK, 0 rows affected (0.00 sec) 
This shows a transaction with a transaction ID (xid) of 'demo01'.

XA START starts the transaction and puts it in the ACTIVE state.
XA END ends the transaction and puts it in the IDLE state.
XA PREPARE prepares the transaction and puts it in the PREPARED state.
Then XA COMMIT can be used to commit the transaction, or XA ROLLBACK to roll it back.
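
While a transaction is in the PREPARED state it can be listed with XA RECOVER, which is a quick way to see what is still pending before deciding to commit or roll back:

mysql55-tokudb6> XA RECOVER;

The output contains one row per prepared transaction, with the format ID, the gtrid and bqual lengths and the xid data.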

Demo 2: XA Transaction between 2 nodes:
mysql55-tokudb6> XA START 'tr01';
Query OK, 0 rows affected (0.00 sec)
 
mysql56-innodb> XA START 'tr01';
Query OK, 0 rows affected (0.01 sec)

mysql55-tokudb6> INSERT INTO xatest(name) VALUES('tr01');
Query OK, 1 row affected (0.00 sec)
 
mysql56-innodb> INSERT INTO xatest(name) VALUES('tr01');
Query OK, 1 row affected (0.00 sec)

mysql55-tokudb6> XA END 'tr01';
Query OK, 0 rows affected (0.00 sec)
 
mysql56-innodb> XA END 'tr01';
Query OK, 0 rows affected (0.00 sec)

mysql55-tokudb6> XA PREPARE 'tr01';
Query OK, 0 rows affected (0.00 sec)
 
mysql56-innodb> XA PREPARE 'tr01';
Query OK, 0 rows affected (0.00 sec)

mysql55-tokudb6> XA COMMIT 'tr01';
Query OK, 0 rows affected (0.00 sec)
 
mysql56-innodb> XA COMMIT 'tr01';
Query OK, 0 rows affected (0.00 sec)

mysql55-tokudb6> SELECT * FROM xatest;
+----+------+
| id | name |
+----+------+
|  1 | tr01 |
+----+------+
1 row in set (0.00 sec)

mysql56-innodb> SELECT * FROM xatest;
+----+------+
| id | name |
+----+------+
|  1 | tr01 |
+----+------+
1 row in set (0.00 sec) 
 
Demo 3: XA Transaction with rollback:
mysql55-tokudb6> XA START 'tr02';
Query OK, 0 rows affected (0.00 sec)
 
mysql56-innodb> XA START 'tr02';
Query OK, 0 rows affected (0.00 sec)

mysql55-tokudb6> INSERT INTO xatest(name) VALUES('tr02');
Query OK, 1 row affected (0.00 sec)
 
mysql56-innodb> INSERT INTO xatest(name) VALUES('tr02');
Query OK, 1 row affected (0.00 sec)

mysql55-tokudb6> XA END 'tr02';
Query OK, 0 rows affected (0.00 sec)
 
mysql56-innodb> XA END 'tr02';
Query OK, 0 rows affected (0.00 sec)

mysql55-tokudb6> XA PREPARE 'tr02';
Query OK, 0 rows affected (0.00 sec)
 
mysql56-innodb> XA PREPARE 'tr02';
Query OK, 0 rows affected (0.00 sec)

mysql55-tokudb6> XA ROLLBACK 'tr02';
Query OK, 0 rows affected (0.00 sec)

mysql56-innodb> XA ROLLBACK 'tr02';
Query OK, 0 rows affected (0.00 sec)

mysql55-tokudb6> SELECT * FROM xatest;
+----+------+
| id | name |
+----+------+
|  1 | tr01 |
+----+------+
1 row in set (0.00 sec)

mysql56-innodb> SELECT * FROM xatest;
+----+------+
| id | name |
+----+------+
|  1 | tr01 |
+----+------+
1 row in set (0.00 sec)

These transactions are between two MySQL instances: one with InnoDB and one with TokuDB. It's possible to run TokuDB and InnoDB together in one database instance, but separating them over multiple instances (and hosts) might be needed for performance or other reasons.

It's also possible to run a transaction between TokuDB and a PostgreSQL database:
mysql55-tokudb6> XA START 'tr03';
Query OK, 0 rows affected (0.00 sec)

xatest=# BEGIN;
BEGIN

mysql55-tokudb6> INSERT INTO xatest(name) VALUES('tr03');
Query OK, 1 row affected (0.00 sec)

mysql55-tokudb6> DELETE FROM xatest WHERE name='tr02';
Query OK, 0 rows affected (0.00 sec)

xatest=# INSERT INTO xatest(name) VALUES('tr03');
INSERT 0 1
mysql55-tokudb6> XA END 'tr03';
Query OK, 0 rows affected (0.00 sec)

mysql55-tokudb6> XA PREPARE 'tr03';
Query OK, 0 rows affected (0.00 sec)
xatest=# PREPARE TRANSACTION 'tr03';
PREPARE TRANSACTION
mysql55-tokudb6> XA COMMIT 'tr03';
Query OK, 0 rows affected (0.00 sec)

xatest=# COMMIT PREPARED 'tr03';
COMMIT PREPARED

For PostgreSQL this only works if max_prepared_transactions is set to a non-zero value.
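
The current value can be checked from psql; the parameter is set in postgresql.conf and changing it requires a server restart:

xatest=# SHOW max_prepared_transactions;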

Transactions cannot only run between databases: applications, filesystems and many other components can also take part in a transaction, as long as they support two-phase commit.

An XA Transaction is coordinated by a transaction coordinator, which can be an application on an application server.

Monday, May 28, 2012

IPv6 on database websites

After reading www.postgresql.org now active over IPv6 by default I quickly tried some other hosts to see what the current state of IPv6 is for some well-known database websites.

$ getent hosts mysql.com percona.com askmonty.org postgresql.org oracle.com sqlite.org code.openark.org skysql.com drizzle.org
156.151.63.6    mysql.com
74.121.199.234  percona.com
173.203.202.13  askmonty.org
2a02:16a8:dc51::50 postgresql.org
137.254.16.101  oracle.com
67.18.92.124    sqlite.org
69.89.31.240    code.openark.org
94.143.114.49   skysql.com
173.203.110.72  drizzle.org
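
getent only shows what the resolver returns; to query a host explicitly for an AAAA (IPv6) record you can use dig, assuming it is installed (e.g. from the dnsutils/bind-utils package):

$ dig AAAA postgresql.org +short
$ dig AAAA mysql.com +short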

So only postgresql.org supports IPv6 right now. On the MySQL side Facebook is one of the known IPv6 users.

Are there any IPv6-enabled database websites I forgot to include?



The World IPv6 Launch is on June 6, 2012, so there are still a few days left to enable IPv6!

Thursday, April 26, 2012

Books vs. e-Books for DBA's

As most people still do I learned to read using books. WhooHoo!

Books are nice. Besides reading them they are also a nice decoration on your shelf. There is a brilliant TED talk by Chip Kidd on this subject.

But sometimes books have drawbacks. This is where I have to start the comparison with vinyl records (yes, you're still reading a database-oriented blog). Vinyl records look nice, are still being sold and yes, I also still use them. The drawback is that car dealers start to look puzzled if you ask them whether the new multimedia system in your car can play your old Led Zeppelin records. The market for portable record players is small, and that's for a good reason.

The problem with books about databases is that they get outdated very quickly. The MySQL 5.1 Cluster Certification Study Guide was printed by lulu.com, which made it possible to quickly update the material. This made sure the material wasn't already outdated when you bought it.

I like to use books as reference material, but I tend to use Google more often, so the books stay on the bookshelf and get old and dusty. One of the reasons for this is that taking books with me just for reference is not an option, judging by their weight.

At Percona Live UK I got a voucher from O'Reilly for a free e-Book, so I chose 'SQL and Relational Theory'. I started to read it on my laptop with FBReader and on my iPhone using Stanza. Neither my phone nor my laptop is really made for reading, so I bought a Sony Reader, which is made for reading.

Reading 'SQL and Relational Theory' on the Sony Reader is nice. The only annoyance is that the examples are like this:
SELECT COUNT(*) | SELECT COUNT(col1)
FROM tbl1       | FROM tbl1
And with line wrapping it looks like this:
SELECT COUNT(*) | 
SELECT COUNT(col1)
FROM tbl1       | 
FROM tbl1
Which is not very readable.
The book is very theoretical as you might expect, but nonetheless it's a very good read.

The Sony Reader is not very suitable for reading whitepapers in PDF format, as most whitepapers are in A4 or Letter format, which is too big for the device. Of course software like Calibre can convert some of those.
(Oracle, Percona, others… please also publish your whitepapers in a format more suitable for an eReader)

The device itself is very nice. The battery life and e-Ink display are good (especially if you compare them with a tablet).

Unfortunately it doesn't increase my reading speed and it doesn't give me more time to read.

I'm looking forward to reading some other database books in e-Book format. I think the next one on my list is High Performance MySQL.

I planned to publish this post when I finished reading SQL and Relational Theory, but I thought now might be a better time as O'Reilly has a discount on that book and other books by C.J. Date.

The Sony Reader runs a modified Android (yes, it's possible to root it and play Angry Birds on it). It also has a web browser, but it isn't well suited for reading Planet MySQL or Planet MariaDB. Using the web browser to download the MP3 for the OurSQL podcast and then playing it works flawlessly. I tried to download the EPUB file for the MySQL Reference Manual, but that failed, so I used USB for that.

Sunday, April 22, 2012

SQL Injections, Again…

Last Friday the Dutch TV program Zembla aired part two of the "verzuimpolitie" series. The first part was mainly about how employers could access medical information about employees. There is a news article about the second part here (with Google Translate).

The second part is about the security of the IT system which is used to record medical information about employees. Employees give this information to the company to which their employer outsources everything related to workplace absenteeism.

After the first part of the series a viewer reported that the website contained SQL injections. The creators of the program verified this and tried to report it to VCD (the company which offers the software as a service). VCD then called the police to have them removed from the VCD office.


Zembla then contacted the Radboud University and asked them to assist with this issue. The university verified the SQL injection and confirmed that this was a serious security flaw. A VCD executive then told Zembla that there wasn't a SQL injection, someone had just stolen the passwords. This is strange, because VCD had reported to the university that they recorded a SQL injection attack coming from the university.


The users of the VCD Humannet software were not informed. And when some of the companies using this SaaS service became aware of the security incident, it took a lot of effort before the service was temporarily shut down to prevent further harm.


This whole story reminded me of the situation around Comodo and DigiNotar. Comodo was hacked, stopped the issuing process, reported the issue and fixed it. DigiNotar was hacked, did not stop the issuing process and did not report the issue either. It eventually went bankrupt.


The lessons learned about SQL injections for DBAs and application developers:
1. Input validation. This is obvious.

2. Use prepared statements if possible (see the first sketch after this list).

3. Prepare for a security incident: make it easy to disable applications or parts of applications.
If all client companies are in the same database then it's very hard to shut down the application for just one company. Using one database instance per client company might be a solution.


4. Use isolation.
If there are 10 client companies and they all use different databases as separation, then you should also use 10 application users with the correct permissions (see the second sketch after this list). Then a SQL injection for one customer won't affect the other customers.


5. Use a database firewall.
This is not very common yet. You could use GreenSQL or McAfee (partly open source). There are more solutions available, but these are at least partly open source.

6. Use two-factor authentication when dealing with sensitive data.
You don't have to buy expensive tokens; there are enough free or almost-free solutions available. YubiKey is one possible solution.

7. Do not store passwords, store hashes.



8. Use encryption, for example a function like AES_ENCRYPT(), to encrypt sensitive data.
This can guard your data against 'curious' DBAs and other administrative users.
Do not use a hardcoded password for this! Make sure the AES key doesn't end up in your binlogs: use a variable (see the third sketch after this list), and only use TLS-secured connections. It might be better to encrypt the data in the application instead of in the database. It could even be possible to use client-side encryption to encrypt the data in the browser.


9. Remove old authentication methods, login screens, etc.
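
A minimal sketch of point 2 using MySQL's server-side prepared statements; the users table and the name column are hypothetical, and from application code you would normally use the placeholder API of your client library instead:

mysql> PREPARE stmt FROM 'SELECT * FROM users WHERE name = ?';
mysql> SET @name = 'Daniël van Eeden';
mysql> EXECUTE stmt USING @name;
mysql> DEALLOCATE PREPARE stmt;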

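A minimal sketch of point 4, with one application account per customer database; all names and the host are hypothetical:

mysql> CREATE USER 'app_customer1'@'app.example.com' IDENTIFIED BY 'choose-a-strong-password';
mysql> GRANT SELECT, INSERT, UPDATE, DELETE ON customer1_db.* TO 'app_customer1'@'app.example.com';

A SQL injection through the customer1 application can then only touch customer1_db.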

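A minimal sketch of point 8, keeping the key in a session variable instead of hardcoding it in the SQL; the table and column names are hypothetical:

mysql> SET @aes_key = 'key-supplied-by-the-application'; -- never hardcode or commit this key
mysql> INSERT INTO notes (note_enc) VALUES (AES_ENCRYPT('sensitive text', @aes_key));
mysql> SELECT CAST(AES_DECRYPT(note_enc, @aes_key) AS CHAR) AS note FROM notes;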
The lessons learned about SQL injections for management:
1. Security scans are mandatory. Companies like Madison Gurkha and Fox-IT can offer this.

2. Don't only include your own services in security scans, but also the external services you use.

3. Make sure that there is a security breach notification requirement in the contracts for security sensitive services.

4. Make it easy to report security incidents.

5. Do shut down the service if needed for security.

6. Do inform your customers about the security incident.

Sunday, April 15, 2012

Backup your sandbox with XtraBackup

Today I tried to make incremental backups of a MariaDB instance in a MySQL sandbox with Percona XtraBackup.
I used the recently released XtraBackup 2.0. And of course there is documentation about making incremental backups. 

MySQL sandbox makes it easy to run many different MySQL versions on one machine. It does this by changing the port number, data directory, UNIX socket location and a whole lot more.

So I first started with a full backup, and after that I used that backup as the base for the incremental backups. To do that I had to specify the port number, which is 5522, and the username and password for the msandbox account. As MySQL uses a UNIX socket instead of a TCP connection when the hostname is localhost, I specified 127.0.0.1 as the hostname to force a TCP connection. That worked!
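
The full backup command looked roughly like this (a sketch: the backup directory is a placeholder, and 127.0.0.1 forces a TCP connection to the sandbox on port 5522):

$ innobackupex --host=127.0.0.1 --port=5522 \
    --user=msandbox --password=msandbox \
    /path/to/backups/full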

Then I created the incremental backup by using the --incremental option and the --incremental-basedir option to specify the location of the full backup. That also worked!
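
The incremental run then pointed --incremental-basedir at the timestamped directory created by the full backup (again a sketch with placeholder paths):

$ innobackupex --host=127.0.0.1 --port=5522 \
    --user=msandbox --password=msandbox \
    --incremental --incremental-basedir=/path/to/backups/full/TIMESTAMP \
    /path/to/backups/inc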

Then I tried to make a backup while putting some load on the database. I used "INSERT INTO test1(col1) SELECT col1 FROM test1" to do this.

The full and incremental backups still worked, or at least that's what the backup script told me. But the size of the incremental backups was quite small, and I noticed that the LSN was very small and not increasing. The xtrabackup_checkpoints file also told me that the backups were all for exactly the same LSN. As the LSN is only for InnoDB, I verified the table type of my test tables, and they were in fact InnoDB. A "SHOW ENGINE INNODB STATUS\G" told me that the LSN was in fact increasing.

It turned out that XtraBackup was making backups of /var/lib/mysql instead of ~/sandboxes/msb_5_5_22-mariadb/data/. Adding "--defaults-file=~/sandboxes/msb_5_5_22-mariadb/my.sandbox.cnf" to the innobackupex command corrected this.
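
So the corrected command looked something like this (the backup directory is still a placeholder; --defaults-file usually has to be passed as the first option):

$ innobackupex --defaults-file=~/sandboxes/msb_5_5_22-mariadb/my.sandbox.cnf \
    --host=127.0.0.1 --port=5522 --user=msandbox --password=msandbox \
    /path/to/backups/full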

After specifying the correct config file I tried to make backups under load again. It failed because the InnoDB log files were too small, so I stopped the database, removed the ib_logfile* files and started the database with a larger InnoDB log file size.
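
A minimal sketch of the log file change in the sandbox's my.sandbox.cnf (the value is just an example, and the old ib_logfile* files should only be removed after a clean shutdown):

[mysqld]
innodb_log_file_size = 128M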

Then it all worked flawlessly!

So you should make sure that your backup completes without errors AND that your backup is of the right database. Of course, testing restores regularly would also detect this.

MySQL DoS

There is a nice demo of MySQL Bug 13510739 on Eric Romang's blog.

I've published this blog post to make this content available on planet.mysql.com.