Thursday, February 27, 2025

Why TLS for MySQL is difficult

The internet has changed: most protocols, HTTP included, now use secure connections with TLS by default.

While both HTTP and the MySQL protocol can use TLS for secure connections, there are still many differences which make it difficult for MySQL to benefit from the advancements that HTTP has seen in recent years.

What is TLS?

TLS stands for Transport Layer Security and is the successor of SSL (Secure Sockets Layer). SSL and TLS are often used interchangeably, although strictly speaking this isn’t correct. What TLS provides is a standardized way to encrypt traffic in transit and to authenticate the other end of the connection. TLS used together with HTTP is known as HTTPS, for secure HTTP.

How TLS works in HTTPS

The client (web browser) connects to a server on port 443. A negotiation is then done to agree on which encryption method is to be used. The server presents the client with a certificate, which the client verifies against the system trust store.

For this to work the server needs a certificate that is signed by a certificate authority (CA) that is known to the client.
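As a concrete illustration, Python’s ssl module gives you exactly this browser-style behavior out of the box (a sketch, not MySQL-specific):

```python
import ssl

# A default client context behaves like a browser: it loads the system
# trust store, requires a valid certificate chain, and checks that the
# hostname matches the certificate.
ctx = ssl.create_default_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED)  # chain must verify against the trust store
print(ctx.check_hostname)                    # hostname must match the certificate
```

Both checks are on by default; a client would then wrap its socket with this context before speaking HTTP.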

The CA will sign the certificate after doing some checks. Nowadays this is often a fully automated process. There are multiple ways to prove ownership of the domain to the CA. You can put a special file on the webserver or add a DNS record. See ACME for details.

What this provides is that the server authenticates itself to the client and that the traffic on the connection is encrypted.

How TLS works with MySQL

The client (MySQL Client, etc.) connects to the server. When both the client and the server set the CLIENT_SSL flag, the connection switches to TLS. So the TLS part doesn’t start at the beginning of the conversation, as is the case with HTTPS. This allows the MySQL protocol to do both TLS and non-TLS connections on the same port, whereas HTTP and HTTPS use two ports (80 and 443).

Then the usual negotiation etc. happens, but the server certificate is likely not signed by one of the CAs in the system truststore, so the user might have to specify the certificate of the CA when setting up the connection. And when the user doesn’t specify a CA certificate, the client might not verify the server’s certificate against a CA at all.

So for the server part there are multiple options:

  • MySQL can now generate a certificate on its own. But it won’t be signed by a proper CA; instead it is self-signed.
  • You could set up your own CA and have that sign the certificate. For example with Lemur (from Netflix) or with a more bare-bones OpenSSL setup as described here.
  • Get certificates from a proper CA, just like you would with HTTPS.
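A minimal sketch of the bare-bones OpenSSL route (file names and subjects are hypothetical):

```shell
# Create a CA key and a self-signed CA certificate
openssl genrsa -out ca-key.pem 2048
openssl req -new -x509 -days 3650 -subj "/CN=Example Internal CA" \
    -key ca-key.pem -out ca-cert.pem

# Create the server key and a certificate signing request for the DB host
openssl genrsa -out server-key.pem 2048
openssl req -new -subj "/CN=proddb1.example.com" \
    -key server-key.pem -out server-req.pem

# Sign the server certificate with the internal CA
openssl x509 -req -days 365 -set_serial 01 \
    -in server-req.pem -CA ca-cert.pem -CAkey ca-key.pem -out server-cert.pem
```

The resulting files map to the server’s ssl_ca, ssl_cert and ssl_key options, and clients that should verify the server pass ca-cert.pem as their --ssl-ca.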

Both HTTPS and the MySQL protocol allow mutual TLS (mTLS), where not only the server authenticates to the client, but the client also presents a certificate to the server. However, this is much more common with MySQL.

Getting certificates for internal hostnames from a proper CA is hard

The common challenge types for Let’s Encrypt all depend on having a server with a public DNS record. And while Let’s Encrypt supports wildcard certificates, these are usually less secure as they are (obviously) less restrictive on the hostname and are more valuable when they end up in the wrong hands. So the best practice is to not use wildcard certificates. Other CAs often have similar issues.

You could have a public DNS entry for your database server while not allowing any direct access from the internet, but this does expose the hostname of your server to the internet, which might be a problem for some.

And even if you’re ok with a public DNS entry, automating this is more difficult than setting up certbot with Nginx, Apache, etc., as those often rely on publishing a challenge token on an HTTP endpoint. One option is to use certbot --standalone, which provides its own built-in webserver for the challenge token, but this requires the server to have port 80 open to the internet.

It is common to have database servers with only internal IPs (192.168.x.x, etc.), which also makes this harder. And the limited global supply of IPv4 addresses doesn’t help either; however, the “easy” solution for that is IPv6.

My opinion on this:

  • Small organizations should rely on Let’s Encrypt or something similar, give database servers a public DNS entry and a public IP, and then rely on strict firewall rules for security.
  • Bigger organizations should consider setting up their own CA, including automated certificate renewal.

This results in insecure defaults

The MySQL client, MySQL Shell and most connectors have a default ssl-mode that is set to PREFERRED. This means they will use TLS if available, but won’t validate the certificate or the hostname.

SSL Mode        | Encrypted communication | Prevents downgrade | Prevents MitM
DISABLED        | No                      | N/A                | No
PREFERRED       | If available            | No                 | No
REQUIRED        | Yes                     | Yes                | No
VERIFY_CA       | Yes                     | Yes                | Mostly
VERIFY_IDENTITY | Yes                     | Yes                | Yes

With VERIFY_CA the server needs a valid certificate, but the hostname of the server and the one in the certificate don’t have to match. This means that a stolen certificate and key for dev.example.com could be used to do a MitM attack on proddb1.example.com.
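The difference between the two verifying modes can be illustrated with Python’s ssl module; this is only an analogy for the ssl-mode semantics, not how MySQL clients are implemented:

```python
import ssl

# VERIFY_CA-like: the certificate chain must verify against the CA, but
# the hostname is not compared to the certificate. (check_hostname must
# be disabled before it can differ from the default context.)
verify_ca = ssl.create_default_context()
verify_ca.check_hostname = False
verify_ca.verify_mode = ssl.CERT_REQUIRED

# VERIFY_IDENTITY-like: the default context already checks both the
# chain and the hostname.
verify_identity = ssl.create_default_context()
print(verify_identity.check_hostname)  # True
```

The stolen-certificate attack above works precisely because the first context never compares the peer’s name to the certificate.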

Note that curl comes with a more secure default, which is similar to VERIFY_IDENTITY. It allows you to connect to an insecure server with --insecure. This is only possible because webservers are expected to have CA-signed certificates.

Note that with caching_sha2_password authentication, which is now the default, the password is sent in plain text over the TLS connection. So a MitM attack could reveal this password to an attacker.

As MySQL connections are less likely to go over the public internet, and as servers are less likely to have a certificate that can be verified without additional configuration, this behavior seems reasonable to me. However, I think making this more secure by default would be a good step to take.

The handshake makes it difficult to use some TLS offloading products

The MySQL protocol first needs to exchange some packets before the TLS connection is set up. This makes it more difficult for proxies, routers, load balancers, etc. to support MySQL, as they also need to implement a small subset of the MySQL protocol.

Note that OpenSSL can do this with openssl s_client -starttls mysql ....

Long lived and stateful connections

With HTTPS you may have a persistent connection, but if it fails you can just retry the connection and things are usually fine as the session is not directly linked to the connection. And connections usually don’t stay open for long.

With MySQL the connection is linked to the session. And there may be local variables, ongoing transactions etc that are linked to this session. So if the connection is gone, these are all gone and you have to re-authenticate and retry your transaction and set any variables that you modified before. Also it isn’t uncommon to have long running sessions for things like ALTER TABLE or heavy reporting etc.

This makes it more difficult to change the X.509 certificate and key that the server uses, as just restarting the server would disrupt users.

How MySQL made TLS easier to use

There used to be many more issues with TLS but Oracle MySQL has been working hard over the last few years to improve things.

So what things improved? I’ll list a few:

  • Clients and connectors were changed to use TLS by default. This isn’t a perfect situation yet, but it is better for security than not using TLS at all.
  • MySQL Community Edition was changed to use OpenSSL instead of YaSSL. The switch was mostly a licensing thing, but it also helped performance a lot and brought support for TLSv1.2 and TLSv1.3.
  • First mysql_ssl_rsa_setup was introduced to generate TLS certificates and an RSA keypair. And eventually this was integrated into the server. Especially for development setups etc. this makes things a lot easier.
  • Support for reloading certificates, keys, etc without server restart.

Tuesday, February 18, 2025

Decoding MySQL's GTID_TAGGED_LOG_EVENT

This is a follow-up for MySQL GTID tags and binlog events, but you don’t need to read that first.

One of the recent innovations in MySQL was the addition of tagged GTIDs. These tagged GTIDs take the format of <uuid>:<tag>:<transaction_id>.

And this change means that the GTID_LOG_EVENT entries in the binary logs needed to change. The MySQL team at Oracle decided not to change the existing format, but to introduce a new event: GTID_TAGGED_LOG_EVENT.

Initially I assumed that decoding the new event would be mostly identical to the original event, with just a single field added. But this isn’t the case, as Oracle MySQL decided to use a new serialization format (yes, more innovation) for this new event. The new serialization format is documented here.

Oracle MySQL did their best to document the library and the format, but I guess the target audience is people using this library to encode and decode the format and not people working on implementations in different languages.

In this blog I’ll try to share the knowledge of the format that I gained by experimenting with it.

The first useful tool is mysqlbinlog --hexdump -vvv, which can read files directly or read from a remote server. This will decode the events and also print a hexdump of the event.

# at 158
#250217 21:14:49 server id 1  end_log_pos 240 CRC32 0xce6c39ef
# Position  Timestamp   Type   Source ID        Size      Source Pos    Flags
# 0000009e b9 98 b3 67   2a   01 00 00 00   52 00 00 00   f0 00 00 00   00 00
# 000000b1 02 76 00 00 02 02 25 02  dc f0 09 02 30 f9 03 22 |.v..........0...|
# 000000c1 bd 03 ad 02 21 02 44 44  5a 68 51 03 22 04 04 06 |......DDZhQ.....|
# 000000d1 0c 66 6f 6f 62 61 7a 08  00 0a 04 0c 7f 15 83 22 |.foobaz.........|
# 000000e1 2d 5c 2e 06 10 49 03 12  c3 02 0b ef 39 6c ce    |.....I......9l.|
#   GTID    last_committed=0    sequence_number=1   rbr_only=no original_committed_timestamp=1739823289369365   immediate_commit_timestamp=1739823289369365 transaction_length=210
# original_commit_timestamp=1739823289369365 (2025-02-17 21:14:49.369365 CET)
# immediate_commit_timestamp=1739823289369365 (2025-02-17 21:14:49.369365 CET)
/*!80001 SET @@session.original_commit_timestamp=1739823289369365*//*!*/;
/*!80014 SET @@session.original_server_version=90200*//*!*/;
/*!80014 SET @@session.immediate_server_version=90200*//*!*/;
SET @@SESSION.GTID_NEXT= '896e7882-18fe-11ef-ab88-22222d34d411:foobaz:1'/*!*/;

This event has the same header as all binlog events and on the end it has a CRC32 checksum.

The header:

# Position  Timestamp   Type   Source ID        Size      Source Pos    Flags
# 0000009e b9 98 b3 67   2a   01 00 00 00   52 00 00 00   f0 00 00 00   00 00

The checksum:

                                              CRC32 0xce6c39ef
...
# 000000e1                                   ef 39 6c ce    |.9l.|
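The trailing checksum can be recomputed over the header plus body. binlog_checksum=CRC32 uses the standard ISO-3309 CRC-32, which is the same algorithm as Python’s zlib.crc32; a sketch over the bytes quoted above:

```python
import zlib

# The full event from the hexdump: 19-byte header, 59-byte body, 4-byte CRC32
event = bytes.fromhex(
    "b998b3672a0100000052000000f00000000000"  # header
    "0276000002022502dcf0090230f90322"        # body
    "bd03ad02210244445a68510322040406"
    "0c666f6f62617a08000a040c7f158322"
    "2d5c2e0610490312c3020b"
    "ef396cce"                                # CRC32, stored little-endian
)

stored = int.from_bytes(event[-4:], "little")  # 0xce6c39ef
computed = zlib.crc32(event[:-4])
print(hex(stored), hex(computed))
```

The size field in the header (52 00 00 00, i.e. 82) also matches: 19 + 59 + 4 bytes.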

If we strip the header and checksum away we end up with the event body:

# 000000b1 02 76 00 00 02 02 25 02  dc f0 09 02 30 f9 03 22 |.v..........0...|
# 000000c1 bd 03 ad 02 21 02 44 44  5a 68 51 03 22 04 04 06 |......DDZhQ.....|
# 000000d1 0c 66 6f 6f 62 61 7a 08  00 0a 04 0c 7f 15 83 22 |.foobaz.........|
# 000000e1 2d 5c 2e 06 10 49 03 12  c3 02 0b                |.....I.....|

The mysql::serialization format is only used for the body.

The message in the body is encoded like this:

Value | Decoded | Reference | Meaning / Notes
02 | 1 | serialization_version_number |
76 | 59 | serializable_field_size | The length of the body (the event body has 3 lines of 16 bytes and one line of 11 bytes)
00 | 0 | last_non_ignorable_field_id | Id of the last required field
00 | 0 | field id |
02 | 1 | gtid_tags |
02 | 1 | field id |
25 02 dc f0 09 02 30 f9 03 22 bd 03 ad 02 21 02 44 44 5a 68 51 03 22 | 896e7882-18fe-11ef-ab88-22222d34d411 | uuid | 16 bytes of a UUID
04 | 2 | field id |
04 | 2 | rpl_gtid_gno | This is the transaction id
06 | 3 | field id |
0c 66 6f 6f 62 61 7a | foobaz | tag | First byte stores the length of the string
08 | 4 | field id |
00 | 0 | last_committed |
0a | 5 | field id |
04 | 2 | sequence_number |
0c | 6 | field id |
7f 15 83 22 2d 5c 2e 06 | 1739823289369365 | immediate_commit_timestamp | 2025-02-17 21:14:49.369365 CET
10 | 8 | field id |
49 03 | 210 | transaction_length |
12 | 9 | field id |
c3 02 0b | 90200 | immediate_server_version |

The first thing to notice here is that fields are numbered. And as you can see field 7 is skipped.

For both encoding and decoding you need to know the field definition, which for this event can be found here. The data types and names of the fields are not encoded in the message.

Most numbers can be decoded with a shift operation: 0x04>>1 = 2.

Note that you have to take care of byte order, so to decode c3 02 0b, you need to reverse them first. And then 0x0b02c3>>3 = 90200.

But how do you know by how many bytes you have to shift? You need to look at the first byte, so for c3 02 0b, that’s c3. Its bit representation is 11000011. Counting the number of consecutive 1’s starting from the least significant bit gives you 2. That tells you that two more bytes follow (02 0b) and that you have to shift by 2+1=3.
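The rule above can be sketched in Python (this helper is my own, not part of the mysql::serialization library):

```python
def decode_field(data):
    """Decode one variable-length integer from the start of `data`.

    The count of consecutive 1-bits at the least significant end of the
    first byte tells you how many extra bytes follow; the value is the
    little-endian integer shifted right by that count plus one.
    Returns (value, bytes_consumed).
    """
    extra = 0
    while (data[0] >> extra) & 1:
        extra += 1
    raw = int.from_bytes(data[:extra + 1], "little")
    return raw >> (extra + 1), extra + 1

print(decode_field(bytes([0x04])))             # (2, 1)
print(decode_field(bytes([0x49, 0x03])))       # (210, 2)
print(decode_field(bytes([0xc3, 0x02, 0x0b]))) # (90200, 3)
```

The three examples are the rpl_gtid_gno, transaction_length and immediate_server_version values from the table above.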

And for the decoding of the UUID:

Encoded Decoded
25 02 89
dc 6e
f0 78
09 02 82
30 18
f9 03 fe
22 11
bd 03 ef
ad 02 ab
21 02 88
44 22
44 22
5a 2d
68 34
51 03 d4
22 11

This gives you 896e788218fe11efab8822222d34d411, which then formatted as a UUID looks like this: 896e7882-18fe-11ef-ab88-22222d34d411.

The second byte is 02 if the decoded value is in the range of 0x80 - 0xbf and 03 if it is in the range of 0xc0 - 0xff.

So for the first value: 25 02: (0x25>>2)+0x80 = 89.

And the second value: dc: 0xdc>>1 = 6e.
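This per-byte encoding looks like the same variable-length integer scheme used for the other fields, applied to each UUID byte. A sketch that decodes the full sequence (the helper is my own, not part of the library):

```python
def decode_field(data):
    """Decode one variable-length integer; returns (value, bytes_used)."""
    # The run of 1-bits at the least significant end of the first byte
    # gives the number of extra bytes that follow.
    extra = 0
    while (data[0] >> extra) & 1:
        extra += 1
    raw = int.from_bytes(data[:extra + 1], "little")
    return raw >> (extra + 1), extra + 1

# The encoded UUID bytes quoted above
encoded = bytes.fromhex("2502dcf0090230f90322bd03ad02210244445a68510322")

raw = bytearray()
pos = 0
while pos < len(encoded):
    value, used = decode_field(encoded[pos:])
    raw.append(value)
    pos += used

print(raw.hex())  # 896e788218fe11efab8822222d34d411
```

Note how 25 02 decodes as (0x0225 >> 2) = 0x89 and dc as (0xdc >> 1) = 0x6e, matching the table.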

I hope this helps others that want to decode or encode these formats and can’t use the library that MySQL provides.

Monday, February 17, 2025

The Potential of Query Attributes in MySQL

Introduction

Query Attributes are a relatively new feature of the MySQL Protocol, available since MySQL 8.0.23.

Before Query Attributes were introduced there was already another, similar feature: Connection Attributes, which provides per-connection metadata in the form of key/value pairs.

This is what connection attributes look like:

mysql> SELECT ATTR_NAME, ATTR_VALUE
    -> FROM performance_schema.session_connect_attrs
    -> WHERE PROCESSLIST_ID=64;
+------------------+---------------------+
| ATTR_NAME        | ATTR_VALUE          |
+------------------+---------------------+
| _platform        | amd64               |
| _runtime_version | go1.23.4            |
| _client_version  | (devel)             |
| _client_role     | binary_log_listener |
| _client_name     | go-mysql            |
| _os              | linux               |
+------------------+---------------------+
6 rows in set (0.00 sec)

Connection attributes are set by connectors (with a name that is prefixed by _) and applications.

These connection attributes are very useful to get insight into what is connecting to your server. However, there are some limitations due to these being linked to the connection and being sent during connection establishment. So if a connection is used by a connection pool, then an attribute about which page is being rendered doesn’t work, as the same connection is used by the application to render multiple pages.

And this is where Query Attributes come into play. When client and server set the CLIENT_QUERY_ATTRIBUTES flag then some of the MySQL network packets are slightly changed. This allows connectors and applications to set key/value pairs on a per-query basis. These don’t replace connection attributes.

Changes in network packets

COM_STMT_EXECUTE

The changes for COM_STMT_EXECUTE are small.

This command is used to execute a previously prepared statement.

The first change is that COM_STMT_EXECUTE can now send a parameter_count to override the number of expected parameters. And the second change is that parameters can now have a parameter_name.

Basically the query attributes are just regular parameters for prepared statements, but they have a name.

COM_QUERY

For COM_QUERY the changes look much more extensive. This is because COM_QUERY used to be very simple, as it only sent a query string, and now it has much more structure.

The changes basically let you send parameters with COM_QUERY, just as you would do with COM_STMT_EXECUTE.

Example

This is how a query with two query attributes looks in Wireshark and in the client.



mysql-9.2.0> query_attributes traceid 12345 workload batch
mysql-9.2.0> SELECT mysql_query_attribute_string('traceid') TraceID,
    -> mysql_query_attribute_string('workload') Workload;
+---------+----------+
| TraceID | Workload |
+---------+----------+
| 12345   | batch    |
+---------+----------+
1 row in set (0.00 sec)

Changes in the C API

mysql_bind_param() was introduced with MySQL 8.0.23 to allow you to send query attributes with regular queries.

mysql_stmt_bind_named_param() was introduced with MySQL 8.2.0 (but with a bug that made it unusable) and this allows you to set the query attributes for prepared statements.

Changes in other connectors

For MySQL Connector/Python MySQLCursor.add_attribute() and a few other functions were added.

And MySQL Connector/J got the ability to use setAttribute() on statements.

Changes in the MySQL Client

You can now use query_attributes <key> <value> [<key> <value> ...] to set query attributes for the queries that you’re executing with the MySQL Client.

Current use

So for what kind of things could you use Query Attributes?

  • You might want to send a “Trace ID” if you’re using OpenTelemetry or a similar system to allow you to more easily link traces between systems.
  • If you’re doing any kind of proxying you may want to set the user the request is for. Just like the Forwarded header in HTTP.
  • Maybe include the page name, request ID, or session ID if you’re rendering web pages.
  • If you’re a connector, you may want to add the filename and line the request is coming from.

Also it seems that MySQL Router can use query attributes in the configuration for read/write splitting.

Another common way to send metadata along with a query is to use comments. However, this makes it difficult to extract, parse and handle the metadata programmatically. And depending on whether you add the comments at the beginning or the end of your query, it might make the queries harder to read in dashboards etc.
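To illustrate why comment-based metadata is clunky: every consumer has to parse it back out of the SQL text. A sketch, assuming a hypothetical traceid/workload comment convention:

```python
import re

def comment_metadata(sql):
    """Extract key:value pairs from a leading /* ... */ comment."""
    m = re.match(r"\s*/\*(.*?)\*/", sql)
    if not m:
        return {}
    # Split "traceid:12345 workload:batch" into a dict
    return dict(pair.split(":", 1) for pair in m.group(1).split())

query = "/* traceid:12345 workload:batch */ SELECT 1"
print(comment_metadata(query))  # {'traceid': '12345', 'workload': 'batch'}
```

With query attributes the server hands you the same values via mysql_query_attribute_string(), with no parsing and no convention to agree on.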

Limitations

The query attributes are now visible in the protocol and can be queried with the mysql_query_attribute_string() function.

However they don’t yet end up in the binlogs, even with binlog_rows_query_log_events set.

They are also not yet in the general log or the slow query log.

Future possibilities

Building a query string in your application and executing it directly opens up the risk of SQL injection. The safer way is to use prepared statements and bind the parameters that you want to execute. But the drawback of prepared statements is the extra network roundtrip, which adds latency. For this reason many connectors emulate prepared statements client-side to avoid the extra roundtrip. However, that adds complexity, and it has to be done in each of the connectors.

With the enhancements for COM_QUERY it is now possible to send a query with a set of parameters in a single command. This might make it possible in the future to have a safe way to execute queries without having complexity in the connector or having to accept the added latency of an extra roundtrip.

Other resources