Daniël's Database Blog

Posts

Using a parser plugin for improved search results with MySQL 5.7 and InnoDB.

With Unicode it is possible for strings to look the same, but with slight differences in which codepoints are used. For example the é in Café can be <U+0065 U+0301> or <U+00E9>. The solution is to use Unicode normalization, which is supported in every major programming language. Both versions of Café will be normalized to use U+00E9. In the best situation the application inserting data into the database will do the normalization, but that often not the case. This gives the following issue: If you search for Café in the normalized form it won't return non-normalized entries. I made a proof-of-concept parser plugin which indexes the normalized version of words. A very short demo: mysql> CREATE TABLE test1 (id int auto_increment primary key, -> txt TEXT CHARACTER SET utf8mb4, fulltext (txt)); Query OK, 0 rows affected (0.30 sec) mysql> CREATE TABLE test2 (id int auto_increment primary key, -> txt TEXT CHARACTER SET utf8mb4, fulltext (txt) WITH P...

A MySQL UDF written in Go

I was wondering if it is possible to write a MySQL User Defined Function (UDF) in Go. So I tried and I got a very basic UDF working. mysql> SELECT udf_fileexists_go("/etc/hosts"); +---------------------------------+ | udf_fileexists_go("/etc/hosts") | +---------------------------------+ | 1 | +---------------------------------+ 1 row in set (0.00 sec) mysql> SELECT udf_fileexists_go("/nonexistend"); +-----------------------------------+ | udf_fileexists_go("/nonexistend") | +-----------------------------------+ | 0 | +-----------------------------------+ 1 row in set (0.00 sec) This is nowhere near production quality, so be careful. The code is here: https://github.com/dveeden/udf_fileexists_go/blob/master/udf_fileexists_go.go .

Importing the Unicode Character Database in MySQL

In Python it is easily possible to findout the name of a Unicode character and findout some properties about that character. The module which does that is called unicodedata . An example: >>> import unicodedata >>> unicodedata.name( '☺' ) 'WHITE SMILING FACE' This module uses the data as released in the UnicodeData.txt file from the unicode.org website. So if UnicodeData.txt is a 'database', then we should be able to import it into MySQL and use it! I wrote a small Python script to automate this. The basic steps are: Download UnicodeData.txt Create a unicodedata.ucd table Use LOAD DATA LOCAL INFILE to load the data This isn't difficult especially because the file doesn't have the actual characters in it. It is just an ASCII file with codepoints and the data related to it. But it would be useful to have those characters in the database. So what I did is to create those with CONVERT(UNHEX(value) USING utf32 . What that does...

Inserting large rows in MySQL and MariaDB

As the maximum storage size for a LONGBLOB in MySQL is 4GB and the maximum max_allowed_packet size is 1GB I was wondering how it is possible to use the full LONGBLOB. So I started testing this. I wrote a Python script with MySQL Connector/Python and used MySQL Sandbox to bring up an instance of MySQl 5.6.25. One of the first settings I had to change was max_allowed_packet, which was expected. I set it to 1GB. The next setting was less expected, it was innodb_log_file_size. The server enforces that the transaction has to fit in 10% of the InnoDB log files. So I had to set it to 2 files of 5G to be able to insert one record of (almost) 1GB. So that worked for a row of a bit less that 1GB, this is because there is some overhead in the packet and the total has to fit in 1GB. For the next step (>1GB) I switched from Python to C so I could use mysql_stmt_send_long_data() which allows you to upload data in multiple chunks. I expected that to work, but it didn't. This is bec...

CIS Oracle MySQL Community Server 5.6 Benchmark v1.0.0

Today we released The CIS Oracle MySQL Community Server 5.6 Benchmark v1.0.0 This is a benchmark which can be used to secure your MySQL installation.

MySQL meetup in Amsterdam on January 15

The next meetup of the MySQL User Group NL is in Amsterdam in one of the offices of Booking.com . Schedule: • 18:30 - 19:00: Welcome • 19:00 - 19:30: Easy Replication Hierarchy Management with Pseudo-GTID (by Shlomi Noach) • 19:30 - 20:00: Food & Discussion • 20:00 - 20:30: Entry level Sphinx Search (by Art van Scheppingen) • 20:30 - 22:00: Discussion & Drinks For more information and to RSVP you can visit the event page of MySQL User Group NL website

Different SSL setups for MySQL

In this blog post I will describe different ways of using SSL with the MySQL database server. What does SSL give you? You might use MySQL replication over the internet or connect to MySQL over the internet. Another posibility is that you connect over an enterprise network to which just too many people have access. This is especially an issue if you use an BYOD network. SSL helps here by encrypting the network traffic to prevent against evesdropping. It also validates that you're talking to the correct server to prevent man-in-the-middle attacks. And you can also use SSL client certificates together with an password as two factor authentication. SSL is not the only option, you could use SSH and many MySQL GUI clients like MySQL Workbench support this. But you can't easily use SSH with a python script or mysqldump. Things that could go wrong Using SSL is almost always better than not using SSL at all. So there is not much you could do wrong. But there are a few things t...