Thursday, February 27, 2025

Why TLS for MySQL is difficult

The internet has changed to a place where most protocols like HTTP etc now use secure connections with TLS by default.

While both HTTP and the MySQL Protocol use TLS for secure connections there are still many differences which make it difficult for MySQL to benefit from the same advancements as HTTP has seen in the last so many years.

What is TLS?

TLS stands for Transport Layer Security and is the successor of SSL (Socket Layer Security). SSL and TLS are often used interchangably, while this isn’t correct strictly speaking. What TLS provides is a standardized way to encrypt in transit traffic and authenticate the other end of the connection. TLS when used together with HTTP is known as HTTPS, for secure HTTP.

How TLS works in HTTPS

The client (webbrowser) connects to a server on port 443. Then negitiation is done to agree on what encryption method is to be used. The server presents the client with a certificate, which the client then verifies against the system trust store.

For this to work the server needs to have a certificate that is signed by a cerfificate authority (CA) that is known to the client.

The CA will sign the certificate after doing some checks. Nowadays this is often a fully automated process. There are multiple ways to prove ownership of the domain to the CA. You can put a special file on the webserver or add a DNS record. See ACME for details.

What this provides is that the server authenticates itself to the client and that the connection is secure.

How TLS works with MySQL

The client (MySQL Client, etc) connects to the server. When both the client and server set the CLIENT_SSL flag then the connection switches to TLS. So the TLS part doesn’t start at the beginning of the conversation as is the case with HTTPS. This allows the MySQL protocol to do both TLS and non-TLS connections on the same port, where HTTP and HTTPS use two ports (80 and 443).

Then the usual negitioation etc happens, but the server certificate is likely not signed by one of the CA’s in the system truststore, so the user might have to specify the certificate of the CA when setting up the connection. And when the user doesn’t specify a CA certificate the client might not verify the certificate of the server against a CA.

So for the server part there are multiple options: - MySQL can now generate a certificate on it’s own. But it won’t be signed by a proper CA, instead it is self signed. - You could setup your own CA and have that sign the certificate. For example with Lemur (From Netflix) or with a more bare bones OpenSSL setup as described here. - Get certificates from a proper CA, just like you would with HTTPS.

Both HTTPS and the MySQL protocol allow mutual-TLS or mTLS where not only the server authenticates to the client, but also the clients presents a certificate to the server. However this is much more common with MySQL.

Getting certificates for internal hostnames from a proper CA is hard

The common challenge types for Let’s Encrypt are all dependent on having a server that has a public DNS record. And while Let’s Encrypt supports wildcard certifictes, these are usually less secure as they are (obviously) less restrictive on hostname and are more valuable when they end up in the wrong hands. So the best practice is to not use wildcard certificates. Other CA often have similar issues.

You could have a public DNS entry for your database server, but not allow any direct access from the internet, but this does expose the hostname of your server to the internet, which might be a problem for some.

And even if you’re ok with a public DNS entry, this means that automating this is more difficult than setting up certbot with Nginx, Apache, etc. as those often rely on publishing a challenge token on a HTTP endpoint. One option is to use certbot --standalone, which provides it’s own builtin webserver for the challenge token, but this requires the server to have port 80 open to the internet.

It is common to have database servers with only internal IP’s (192.168.x.x, etc), which also makes this harder. And the limited supply of IPv4 addresses globally also doesn’t help, however the “easy” solution for that is IPv6.

My opinion on this: - Small organizations should rely on Let’s Encrypt or something similar and have database servers with a public DNS entry and a public IP and then rely on strict firewall rules for security. - Bigger organizations should consider to setup their own CA, including automated certificate renewal.

This resuls in insecure defaults

The MySQL client, MySQL Shell and most connecteors have a default ssl-mode that is set to PREFERRED. This means it will use TLS if available, but it won’t validate the certificate or the hostname.

SSL Mode	Encrypted communication	Prevents against downgrade	Prevents against MitM
DISABLED	❌	N/A	❌
PREFERRED	If available	❌	❌
REQUIRED	✅	✅	❌
VERIFY_CA	✅	✅	Mostly
VERIFY_IDENTITY	✅	✅	✅

With VERIFY_CA the server needs a valid certificate, but the hostname of the server and that of the certificate doesn’t have to match. This means that a stolen certificate and key for dev.example.com could be used to do a MitM attack on proddb1.example.com.

Note that with curl comes with a more secrue default, which is similar to VERIFY_IDENTITY. It allows you to go connect to an insecure server with --insecure. This is only possible because webservers are expected to have CA signed certificates.

Note that with the caching_sha2_password authentiction, which is now the default the password is send in plain text over TLS. So a MitM attack could reveal this password to an attacker.

As the MySQL connections are less likely to go over the public internet and as servers are less likely to have a certificate that can be easily verified without additional configuration this behavor seems reasonable to me. However I think making this more secure by default would be a good step to take.

The handshake makes it difficult to use some TLS offloading products

As the MySQL Protocol first needs the exchange of some packets before the TLS connection is setup.

This makes it more difficult for proxies/routers/loadbalancers/etc to support MySQL as they also need to implement a small subset of the MySQL protocol.

Note that OpenSSL can do this with openssl s_client --starttls mysql ....

Long lived and stateful connections

With HTTPS you may have a persistent connection, but if it fails you can just retry the connection and things are usually fine as the session is not directly linked to the connection. And connections usually don’t stay open for long.

With MySQL the connection is linked to the session. And there may be local variables, ongoing transactions etc that are linked to this session. So if the connection is gone, these are all gone and you have to re-authenticate and retry your transaction and set any variables that you modified before. Also it isn’t uncommon to have long running sessions for things like ALTER TABLE or heavy reporting etc.

This makes it more difficult to change the X.509 certificate and key that the server uses as just restarting would distrupt users.

How MySQL made TLS easier to use

There used to be many more issues with TLS but Oracle MySQL has been working hard over the last few years to improve things.

So what things improved? I’ll list a few:

Clients and connectors where changed to use TLS by default. This isn’t a perfect situation yet, but it is better for security than not using TLS at all.
MySQL Community Edition was changed to use OpenSSL instead of YaSSL. This is mostly a licensing thing. This helps a lot for performance and it also brought support for TLSv1.2 and TLSv1.3.
First mysql_ssl_rsa_setup was introduced to generate TLS certificates and a RSA keypair. And eventually this was integrated into the server. Especially for development setups etc this makes this a lot easier.
Support for reloading certificates, keys, etc without server restart.

Tuesday, February 18, 2025

Decoding MySQL's GTID_TAGGED_LOG_EVENT

This is a follow-up for MySQL GTID tags and binlog events, but you don’t need to read that first.

One of the recent innovations in MySQL was the addition of Tagged GTID’s. These tagged GTID’s take the format of <uuid>:<tag>:<transaction_id>.

And this change means that the GTID_LOG_EVENT’s in the binary logs needed to be changed. The MySQL team at Oracle decided to not change the existing format, but introduce a new event: GTID_TAGGED_LOG_EVENT.

Initially I assumed that decoding the new event would me mostly identical to the original event, but with just a single field added. But this isn’t the case as Oracle MySQL deciced to use a new serialization format (Yes, more innovation) and use it for this new event. The new serialization format is documented here.

Oracle MySQL did their best to document the library and the format, but I guess the target audience is people using this library to encode and decode the format and not people working on implementations in different languages.

In this blog I’ll try to demonstrate my knowledge of the format that I gained by experimenting with it.

The first useful tool is mysqlbinlog --hexdump -vvv, which can read files directly or read from a remote server. This will decode the events and also print a hexdump of the event.

# at 158
#250217 21:14:49 server id 1  end_log_pos 240 CRC32 0xce6c39ef
# Position  Timestamp   Type   Source ID        Size      Source Pos    Flags
# 0000009e b9 98 b3 67   2a   01 00 00 00   52 00 00 00   f0 00 00 00   00 00
# 000000b1 02 76 00 00 02 02 25 02  dc f0 09 02 30 f9 03 22 |.v..........0...|
# 000000c1 bd 03 ad 02 21 02 44 44  5a 68 51 03 22 04 04 06 |......DDZhQ.....|
# 000000d1 0c 66 6f 6f 62 61 7a 08  00 0a 04 0c 7f 15 83 22 |.foobaz.........|
# 000000e1 2d 5c 2e 06 10 49 03 12  c3 02 0b ef 39 6c ce    |.....I......9l.|
#   GTID    last_committed=0    sequence_number=1   rbr_only=no original_committed_timestamp=1739823289369365   immediate_commit_timestamp=1739823289369365 transaction_length=210
# original_commit_timestamp=1739823289369365 (2025-02-17 21:14:49.369365 CET)
# immediate_commit_timestamp=1739823289369365 (2025-02-17 21:14:49.369365 CET)
/*!80001 SET @@session.original_commit_timestamp=1739823289369365*//*!*/;
/*!80014 SET @@session.original_server_version=90200*//*!*/;
/*!80014 SET @@session.immediate_server_version=90200*//*!*/;
SET @@SESSION.GTID_NEXT= '896e7882-18fe-11ef-ab88-22222d34d411:foobaz:1'/*!*/;

This event has the same header as all binlog events and on the end it has a CRC32 checksum.

The header:

# Position  Timestamp   Type   Source ID        Size      Source Pos    Flags
# 0000009e b9 98 b3 67   2a   01 00 00 00   52 00 00 00   f0 00 00 00   00 00

The checksum:

                                              CRC32 0xce6c39ef
...
# 000000e1                                   ef 39 6c ce    |.....I......9l.|

If we strip the header and checkum away we end up with the event body:

# 000000b1 02 76 00 00 02 02 25 02  dc f0 09 02 30 f9 03 22 |.v..........0...|
# 000000c1 bd 03 ad 02 21 02 44 44  5a 68 51 03 22 04 04 06 |......DDZhQ.....|
# 000000d1 0c 66 6f 6f 62 61 7a 08  00 0a 04 0c 7f 15 83 22 |.foobaz.........|
# 000000e1 2d 5c 2e 06 10 49 03 12  c3 02 0b                |.....I......9l.|

The mysql::serialization format is only used for the body.

The message in the body is encoded like this:

Value	Decoded	Reference	Meaning / Notes
02	1	serialization_version_number
76	59	serializable_field_size	This is the length of the body (the event body has 3 lines of 16 bytes and one line of 11 bytes)
00	0	last_non_ignorable_field_id	id of the last required field
00	0	field id
02	1	gtid_tags
02	1	field id
25 02 dc f0 09 02 30 f9 03 22 bd 03 ad 02 21 02 44 44 5a 68 51 03 22	896e7882-18fe-11ef-ab88-22222d34d411	uuid	16 bytes of a UUID
04	2	field id
04	2	rpl_gtid_gno	This is the transaction id
06	3	field id
0c 66 6f 6f 62 61 7a	foobaz	tag	first byte stores the length of the string
08	4	field id
00	0	last_committed
0a	5	field id
04	2	sequence_number
0c	6	field id
7f 15 83 22 2d 2d 5c 2e 06	1739823289369365	immediate_commit_timestamp	2025-02-17 21:14:49.369365 CET
10	8	field id
49 03	210	transaction_length
12	9	field id
c3 02 0b	90200	immediate_server_version

The first thing to notice here is that fields are numbered. And as you can see field 7 is skipped.

For both encoding and decoding you need to know the field definition, which for this event can be found here. The data types and names of the fields are not encoded in the message.

Most numbers can be decoded with a shift operation: 0x04>>1 = 2.

Note that you have to take care of byte order, so to decode c3 02 0b, you need to reverse them first. And then 0x0b02c3>>3 = 90200.

But how do you know by how many bytes you have to shift? You need to look at the first byte, so for c3 02 0b, that’s c3. And then you look at the bit representation: 11000011. If you count the number of consecutive 1’s from right to left this gives you 2. And then you know that there are two more bits following (02 0b) and that you have to shift by 2+1=3.

And for the decoding of the UUID:

Encoded	Decoded
25 02	89
dc	6e
f0	78
09 02	82
30	18
f9 03	fe
22	11
bd 03	ef
ad 02	ab
21 02	88
44	22
44	22
5a	2d
68	34
51 03	d4
22	11

This gives you 896e788218fe11efab8822222d34d411, which then formatted as a UUID looks like this: 896e7882-18fe-11ef-ab88-22222d34d411.

The second byte is 02 if the value is in the range of 0x80 - 0xc and 03 if it is in the range of 0xc - 0xff.

So for the first value: 25 02: (0x25>>2)+0x80 = 89.

And the second value: dc: 0xdc>>1 = 6e.

I hope this helps others that want to decode or encode these formats and can’t use the library that MySQL provides.

Monday, February 17, 2025

The Potential of Query Attributes in MySQL

Introduction

Query Attributes are a relatively new feature of the MySQL Protocol. This is availble since MySQL 8.0.23.

Before Query Attributes were introduced there already was already another similar feature: Connetion Attributes, which provides per-connection metadata in the form of key/value pairs.

This is what connection attributes look like:

mysql> SELECT ATTR_NAME, ATTR_VALUE
    -> FROM performance_schema.session_connect_attrs
    -> WHERE PROCESSLIST_ID=64;

+------------------+---------------------+
| ATTR_NAME        | ATTR_VALUE          |
+------------------+---------------------+
| _platform        | amd64               |
| _runtime_version | go1.23.4            |
| _client_version  | (devel)             |
| _client_role     | binary_log_listener |
| _client_name     | go-mysql            |
| _os              | linux               |
+------------------+---------------------+
6 rows in set (0.00 sec)

Connection attributes are set by connectors (with a name that is prefixed by _) and applications.

These connection attributes are very useful to get insight into what is connecting to your server. However there are some limitations due to these being linked to the connection and being send during connection establishment. So if a connection is used by a connection pool then an attribute about which page is being rendered doesn’t work as the same connection is used by the application to render multiple pages.

And this is where Query Attributes come into play. When client and server set the CLIENT_QUERY_ATTRIBUTES flag then some of the MySQL network packets are slightly changed. This allows connectors and applications to set key/value pairs on a per-query basis. These don’t replace connection attributes.

Changes in network packets

COM_STMT_EXECUTE

The changes for COM_STMT_EXECUTE are small.

This command is used to execute a previously prepared statement.

The first change is that COM_STMT_EXECUTE can now send a parameter_count to override the number of expected parameters. And the second change is that parameters can now have a parameter_name.

Basically the query attributes are just regular parameters for prepared statements, but they have a name.

COM_QUERY

For COM_QUERY the changes look much more extesive. This is because COM_QUERY used to be very simple as it only was sending a query string and now it has much more structure.

The changes basically let you send parameters with COM_QUERY, just as you would do with COM_STMT_EXECUTE.

Example

This is how a query with two query attributes look like in Wireshark and in the client.

mysql-9.2.0> query_attributes traceid 12345 workload batch
mysql-9.2.0> SELECT mysql_query_attribute_string('traceid') TraceID,
    -> mysql_query_attribute_string('workload') Workload;

+---------+----------+
| TraceID | Workload |
+---------+----------+
| 12345   | batch    |
+---------+----------+
1 row in set (0.00 sec)

Changes in the C API

mysql_bind_param() was introduced with MySQL 8.0.23 to allow you to send query attributes with regular queries.

mysql_stmt_bind_named_param() was introduced with MySQL 8.2.0 (but with a bug that made it unusable) and this allows you to set the query attributes for prepared statements.

Changes in other conectors

For MySQL Connector/Python MySQLCursor.add_attribute() and a few other functions were added.

And MySQL Conector/J got the ability to use setAttribute() on statements.

Changes in the MySQL Client

You can now use query_attributes <key> <value> [<key> <value> ...] to set query attributes for the queries that you’re executing with the MySQL Client.

Current use

So for what kind of things could you use Query Attributes?

You might want to send a “Trace ID” if you’re using OpenTelemetry or a similar system to allow you to more easily link traces between systems.
If you’re doing any kind of proxying you may want to set the user the request is for. Just like the Forwarded header in HTTP.
Maybe include the page name, request ID, or session ID if you’re rendering web pages.
If you’re a connector, you may want to add the filename and line the request is coming from.

Also it seems that MySQL Router can use query attributes in the configuration for read/write splitting.

Another common way to send metadata along with a query is to use comments. However this makes it difficult to extract, parse and handle them programatically. And depending on whether you add them in the beginning or the end of your query it might make it difficult to read the queries in dashboards etc.

Limitations

The query attributes are now visible in the protocol and can be queried with the mysql_query_attribute_string() function.

However they don’t yet end up in the binlogs, even with binlog_rows_query_log_events set.

They are also not yet in the general log or the slow query log.

Future possibilities

Building a string in your application and executing that directly opens up the risk of SQL Injection. The safer way is to use prepared statements and bind the parameters that you want to execute. But the drawback of prepared statements is that it adds a network roundtrip, which adds latency. So for this many connectors emulate the prepared statement in the connector to avoid this extra roundtrip. However that adds complexity in the connectors, and this has to be done for each of the connectors.

With the enhancements for COM_QUERY it is now possible to send a query with a set of parameters in a single command. This might make it possible in the future to have a safe way to execute queries without having complexity in the connector or having to accept the added latency of an extra roundtrip.

Other resources

Passing query attributes through the MySQL protocol at FOSDEM 2021

Monday, December 2, 2024

MySQL GTID tags and binlog events

MySQL 8.4 and newer have extended the Global Transaction ID (GTID) functionality with a new “tag” option.

Refresher on GTID

A GTID is a unique ID that is assigned to a transaction. This is used if gtid_mode is set to ON. The benefit of this is that a transaction can be uniquely identified in a MySQL replication setup with multiple levels. Among others this makes it easier to refactor a replication tree as a MySQL replica knows which transactions it has seen and can use this to find the right position to start replicating from a new source.

The format of GTIDs is documented here.

Before GTID was used replication worked based on a file and offset (e.g. file=binlog.000001,offset=4), which is unique to every server.

A GTID without tag looks like this:

896e7882-18fe-11ef-ab88-22222d34d411:1

This is in the format of <server_uuid>:<txid>. The UUID of the server is in the server_uuid global variable and the txid is the transaction id, which is an increasing number.

And for a range of transactions it looks like this:

896e7882-18fe-11ef-ab88-22222d34d411:1-12

Here there is a range of transactions, starting at 1 and ending at 12.

There can also be multiple ranges:

896e7882-18fe-11ef-ab88-22222d34d411:1-12,14-20

Here transaction 13 is missing in the range from 1 to 20.

And then there can be transactions from multiple servers:

896e7882-18fe-11ef-ab88-22222d34d411:1-12,14-20,fb0b950b-acb3-11ef-8f70-3674a2d2348e:1-10

GTID tags

If you run SET gtid_next='AUTOMATIC:abc then the next transaction will have a GTID that looks like this:

896e7882-18fe-11ef-ab88-22222d34d411:abc:1

There can be multiple tags in a GTID set and there can be multiple tags, for example:

896e7882-18fe-11ef-ab88-22222d34d411:1-3:abc:1:foo:1-3,5-7

This value is a compact way of representing this:

896e7882-18fe-11ef-ab88-22222d34d411:1
896e7882-18fe-11ef-ab88-22222d34d411:2
896e7882-18fe-11ef-ab88-22222d34d411:3
896e7882-18fe-11ef-ab88-22222d34d411:abc:1
896e7882-18fe-11ef-ab88-22222d34d411:foo:1
896e7882-18fe-11ef-ab88-22222d34d411:foo:2
896e7882-18fe-11ef-ab88-22222d34d411:foo:3
896e7882-18fe-11ef-ab88-22222d34d411:foo:5
896e7882-18fe-11ef-ab88-22222d34d411:foo:6
896e7882-18fe-11ef-ab88-22222d34d411:foo:7

So what are reason why you want to use GTID tags? One senario I can think of is heartbeats. So if you create a heartbeat table with one row and update that every second then you can use this to test replication delay and write availability.

CREATE TABLE heartbeat (
    id INT PRIMARY KEY,
    t TIMESTAMP(6)
);

-- Monitoring service updates heartbeat, if writes fail: alert, use for write availability metric
SET gtid_next='AUTOMATIC:heartbeat';
REPLACE INTO heartbeat VALUES (1, CURRENT_TIMESTAMP(6));

-- On replicas, check the difference between the heartbeat and the current timestamp to measure replicaton delay
SELECT TIMESTAMPDIFF(MICROSECOND, t, CURRENT_TIMESTAMP(6)) FROM heartbeat;

There are tools like pt-heartbeat from Percona Toolkit that can help you with this.

Why having a heartbeat tag can help here is that losing the data of a heartbeat transaction is fine, while losing a transaction from an application isn’t.

Other senarios are data imports and data archival.

Binlog events

Binlogs (short for binary logs) are logs of changes made to the database. These can be used for replication between a primary database server and replicas and be used together with backups to do point-in-time recovery. Another usecase is sending events to other systems via a Change Data Capture (CDC) tool.

Adding tags impacts a few of the types of events that are sent in the binlog stream.

`PREVIOUS_GTIDS_LOG_EVENT`

This event is used to send which GTIDs were in previous binlog files after switching to a new file.

With mysqlbinlog ... --read-from-remote-server --hexdump -vv $binlogfile you can see what these events look like. Here is an example:

# at 127
#241126 17:24:31 server id 1  end_log_pos 158 CRC32 0x1b54f2b2
# Position  Timestamp   Type   Source ID        Size      Source Pos    Flags
# 0000007f 3f f6 45 67   23   01 00 00 00   1f 00 00 00   9e 00 00 00   80 00
# 00000092 00 00 00 00 00 00 00 00  b2 f2 54 1b             |..........T.|
#   Previous-GTIDs
# [empty]

The first line is a header with a timestamp, type, server_id, size, position and flags. The second line ends with the CRC32 checksum. The actual body for this event is 00 00 00 00 00 00 00 00, which means that there were no previous GTIDs. If there are GTIDs it would change into 01 00 00 00 00 00 00 00, etc. and then for each GTID set it has the UUID and then one or more ranges of transactions.

When MySQL sees the first tagged GTID it changes this event and keeps using a new format until the server is restarted.

Then the events look like this:

# at 127
#241126 17:57:35 server id 1  end_log_pos 158 CRC32 0x3b12f980
# Position  Timestamp   Type   Source ID        Size      Source Pos    Flags
# 0000007f ff fd 45 67   23   01 00 00 00   1f 00 00 00   9e 00 00 00   80 00
# 00000092 01 00 00 00 00 00 00 01  80 f9 12 3b             |............|
#   Previous-GTIDs
# [empty]

Here the body is 01 00 00 00 00 00 00 01. here the last 01 means the GTID format where 00 is classic and 01 means tagged. For tagged GTIDs the number of GTIDs is encoded slightly differently. First a mask of 0xFF000000000000FF is used and then the resulting value is shifted by 1 byte. And then the GTIDs are encoded with a field for the tag. If there are multiple tags, then these are send as separate GTID sets in this event.

Here is an example:

# at 127
#241129  9:39:26 server id 1  end_log_pos 291 CRC32 0x248c7bd2
# Position  Timestamp   Type   Source ID        Size      Source Pos    Flags
# 0000007f ae 6f 49 67   23   01 00 00 00   a4 00 00 00   23 01 00 00   80 00
# 00000092 01 03 00 00 00 00 00 01  89 6e 78 82 18 fe 11 ef |.........nx.....|
# 000000a2 ab 88 22 22 2d 34 d4 11  00 01 00 00 00 00 00 00 |.....4..........|
# 000000b2 00 01 00 00 00 00 00 00  00 04 00 00 00 00 00 00 |................|
# 000000c2 00 89 6e 78 82 18 fe 11  ef ab 88 22 22 2d 34 d4 |..nx..........4.|
# 000000d2 11 08 64 65 6d 6f 01 00  00 00 00 00 00 00 01 00 |..demo..........|
# 000000e2 00 00 00 00 00 00 04 00  00 00 00 00 00 00 89 6e |...............n|
# 000000f2 78 82 18 fe 11 ef ab 88  22 22 2d 34 d4 11 0c 66 |x..........4...f|
# 00000102 6f 6f 62 61 72 01 00 00  00 00 00 00 00 01 00 00 |oobar...........|
# 00000112 00 00 00 00 00 03 00 00  00 00 00 00 00 d2 7b 8c |................|
# 00000122 24                                               |.|
#   Previous-GTIDs
# 896e7882-18fe-11ef-ab88-22222d34d411:1-3:demo:1-3:foobar:1-2

So 01 03 00 00 00 00 00 01 means it is a tagged GTID format (ends with 01). And that there are 3 GTIDs.

The first one:

# 00000092                          89 6e 78 82 18 fe 11 ef |.........nx.....|
# 000000a2 ab 88 22 22 2d 34 d4 11  00 01 00 00 00 00 00 00 |.....4..........|
# 000000b2 00 01 00 00 00 00 00 00  00 04 00 00 00 00 00 00 |................|
# 000000c2 00                                               |..nx..........4.|

Description	Value	Decoded
UUID	`89 6e 78 82 18 fe 11 ef ab 88 22 22 2d 34 d4 11`	`896e7882-18fe-11ef-ab88-22222d34d411`
Tag	`00 01 00 00 00 00 00 00`	empty
Start	`01 00 00 00 00 00 00`	1
End	`00 04 00 00 00 00 00 00`	4

The second one:

# 000000c2    89 6e 78 82 18 fe 11  ef ab 88 22 22 2d 34 d4 |..nx..........4.|
# 000000d2 11 08 64 65 6d 6f 01 00  00 00 00 00 00 00 01 00 |..demo..........|
# 000000e2 00 00 00 00 00 00 04 00  00 00 00 00 00 00       |...............n|

`GTID_TAGGED_LOG_EVENT`

There is a new event called GTID_TAGGED_LOG_EVENT (type 2a). I expected this to be basically the same as the GTID_LOG_EVENT (type 21), but with tags.

This is what a GTID_LOG_EVENT looks like:

# at 158
#241202 20:02:25 server id 1  end_log_pos 235 CRC32 0xd3cf038f
# Position  Timestamp   Type   Source ID        Size      Source Pos    Flags
# 0000009e 41 04 4e 67   21   01 00 00 00   4d 00 00 00   eb 00 00 00   00 00
# 000000b1 01 89 6e 78 82 18 fe 11  ef ab 88 22 22 2d 34 d4 |..nx..........4.|
# 000000c1 11 01 00 00 00 00 00 00  00 02 00 00 00 00 00 00 |................|
# 000000d1 00 00 01 00 00 00 00 00  00 00 fb 20 6c 30 4e 28 |............l0N.|
# 000000e1 06 ca f4 5f 01 00 8f 03  cf d3                   |..........|
#   GTID    last_committed=0    sequence_number=1   rbr_only=no original_committed_timestamp=1733166145216763   immediate_commit_timestamp=1733166145216763 transaction_length=202
# original_commit_timestamp=1733166145216763 (2024-12-02 20:02:25.216763 CET)
# immediate_commit_timestamp=1733166145216763 (2024-12-02 20:02:25.216763 CET)
/*!80001 SET @@session.original_commit_timestamp=1733166145216763*//*!*/;
/*!80014 SET @@session.original_server_version=90100*//*!*/;
/*!80014 SET @@session.immediate_server_version=90100*//*!*/;
SET @@SESSION.GTID_NEXT= '896e7882-18fe-11ef-ab88-22222d34d411:1'/*!*/;

This is what a GTID_TAGGED_LOG_EVENT looks like:

# at 158
#241202 19:53:13 server id 1  end_log_pos 240 CRC32 0x71f11620
# Position  Timestamp   Type   Source ID        Size      Source Pos    Flags
# 0000009e 19 02 4e 67   2a   01 00 00 00   52 00 00 00   f0 00 00 00   00 00
# 000000b1 02 76 00 00 02 02 25 02  dc f0 09 02 30 f9 03 22 |.v..........0...|
# 000000c1 bd 03 ad 02 21 02 44 44  5a 68 51 03 22 04 04 06 |......DDZhQ.....|
# 000000d1 0c 66 6f 6f 62 61 72 08  00 0a 04 0c 7f e2 74 90 |.foobar.......t.|
# 000000e1 0f 4e 28 06 10 3d 03 12  a3 ff 0a 20 16 f1 71    |.N............q|
#   GTID    last_committed=0    sequence_number=1   rbr_only=no original_committed_timestamp=1733165593949410   immediate_commit_timestamp=1733165593949410 transaction_length=207
# original_commit_timestamp=1733165593949410 (2024-12-02 19:53:13.949410 CET)
# immediate_commit_timestamp=1733165593949410 (2024-12-02 19:53:13.949410 CET)
/*!80001 SET @@session.original_commit_timestamp=1733165593949410*//*!*/;
/*!80014 SET @@session.original_server_version=90100*//*!*/;
/*!80014 SET @@session.immediate_server_version=90100*//*!*/;
SET @@SESSION.GTID_NEXT= '896e7882-18fe-11ef-ab88-22222d34d411:foobar:1'/*!*/;

The output looks almost identical, but if you look at the hexdump you can see that there are quite some differences, for example you don’t see the 89 6e 78 82 that is the start of the UUID of the server. It looks like a new interal library (mysql::serialization) is used for serialization. Someone wrote took the time to write some documentation for this new library, which is great. However with this documentation I haven’t been able to decode this event yet.

Note that replication protocol is documented here.

Impact

I have been doing some work on the MySQL protocol dissector in Wireshark. As wireshark only decodes the binlog into events and doesn’t decode the events themselve it was enough to just add the event name and code for the GTID_TAGGED_LOG_EVENT.

For go-mysql, which is a Go library that amongst other things parses the binlog stream, the work is still ongoing. The code needs to handle the new format of the PREVIOUS_GTIDS_LOG_EVENT, the new GTID_TAGGED_LOG_EVENT event and also be able to parse the text format of the tagged GTIDs.

There are many other tools that might use the binlog stream and face the same issues.

Also, using tagged GTIDs is a sure way to break replication between 8.4 (or 8.3?) and earlier versions of MySQL.

These are the bugs that are related to tagged GTID that I have filed:

Monday, November 11, 2024

Release and version management is hard

In this post I will talk about release management with MySQL as example. Release management has to do with what to put in a release, how often to do a release, how to long to support a release and how to number or name the releases.

The early years

Today software is almost exclusively delivered over the internet, but this didn’t use to be the case. Software used to be delivered in a box that you had to buy at a physical store. These were often shrink wrapped and forced you to agree with the EULA by opening the box. The box then contained floppies, CD’s, DVD’s or any other suitable medium.

There were multiple reasons why physical media were used, even when the internet arrived. One of the main limitations was bandwidth, this is also why FreeBSD, Linux, etc were sold on physical media. And one of the other reasons was that online payment wasn’t as ubiquitous as it is today. Then there were also ways of copy protection that relied on physical media and/or dongles. For operating systems and such there was also a chicken-and-the-egg situation that you needed an OS to be able to download anything. And a nice box with a printed manual probably felt more valuable than a downloaded file.

This all put some constraints on release management. The time between releases was quite large as it would need new physical media to be produced, with a newly designed box, manual, etc. And as many software versions were sold per release it needed to have enough new features to convince people to pay for an upgrade. There were sometimes discounts for upgrades, but then the software needed some way to only work as upgrade or have a way to validate that the user was in possession of the previous version. Bugfixes and patches might be delivered online or not at all.

As people were either using dial-up and only connected to the internet for a short amount of time or not at all, this made security a lot less important than it is today. It also meant that people used manuals a lot more as web search might not have been available.

Release management today

Today with new releases being made available over the internet this has changed the constraints it puts on release management a lot.

We now have package managers that might download and upgrade with minimal or no user intervention. And we now have Puppet, Kubernetes, etc to manage software on a set of machines. As package managers take care of dependencies this makes it easier for software packages to have more and more complex dependencies. And with most systems being connected to the internet all the time, security and security updates become more important.

How this applies to MySQL

MySQL has had a interesting history when it comes to releases and versioning.

Source: https://github.com/dveeden/mysql-history-graph

Early versions

MySQL 1.0 was never released to a wide audience. Then version 2 was skipped, probably because calling something version 3 was better for marketing. The first widely used version was probably MySQL 3.23.x. Note that MySQL isn’t following semantic versioning where the version has a major, a minor and a patch version. For MySQL the first two numbers (e.g. 5.6) make the major version. After 3.23 followed 4.0 and 4.1

The 5.x versions.

Version 5.0 of MySQL was filled with a large number of both features and bugs. I think this was driven by the partnership with SAP. This is a good example of what you get when you prioritize features over stability.

Version 5.1 brought partitioning.

And then MySQL 5,2, 5.3 and 5.4 never made it to production. Only 5.4 was released as development preview release.

With MySQL 5.5 the build system switched to a unified cmake based build system. Besides a big change for developers this also impacted Linux distros as they had to change how they build MySQL packages. This is the first version that had InnoDB as the default storage engine. Eerlier versions also supported InnoDB, but not as default, probably in part because at that time the InnoDB storage engine was in the hands of InnoBase and/or Oracle.

The 5.6 version brought GTID, which was a major step forward for replication.

Then 5.7 brought native JSON support and generated columns, among many other features.

What happend to 6.x and 7.x

MySQL 6.0 work started shortly after 5.1 was released, but never made it into a production release. It had many interesting features, most of which have been backported to later 5.x versions of MySQL. For what I understand this was due to the 6.0 codebase being unreliable due to a mix of badly tested and/or incomplete features.

Then the 7.x versions had been used for MySQL Cluster, which is basically a modified MySQL version that included the NDB storage engine. In 8.0 the changes for the server were merged with the regular MySQL code allowing a unified version.

The 8.x timeframe

MySQL 8.0 brought many new features but also had a big change in the foundation of MySQL. MySQL 8.0 used InnoDB to store the data dictionary instead of FRM files. This allows for better crash safety and atomic DDL and opens up future possibilities like transactional DDL.

As MySQL made its way to Oracle via Sun it now has to adhere to the release policies of Oracle. This means that each release has to be supported for a long time. So sticking with 8.0, and adding new features in updates of 8.0 was what they did. This worked well, but it had one big drawback: MySQL could deprecate features, API’s etc, but wasn’t able to remove them until a new major version.

Then MySQL 8.1, 8.2 and 8.3 were released as Innovation releases which ten lead to a 8.4 long term support release. In these releases MySQL cleaned up many of the previously deprecated features. I think cleaning up these things is good as it makes it easier to develop new features that don’t have to take these deprecated things into account and it makes for cleaner code. But it also means that third party software might have to test and update their code to avoid these now removed features. And a new major version also means a new training and certification program and all the other things related to a new major version.

The 9.x timeframe

MySQL has released 9.0 and 9.1 as innovation releases. However MySQL is more than just the server and for some components like MySQL Connector/J (The database driver for Java) the 9.1 release isn’t an innovation release but a regular GA release. And the MySQL Workbench GUI tool has been stuck at 8.0.x. So where 8.0 was a coordinated release for all components with more or less the same version, this is no longer true.

Learnings

Don’t sacrifice stability for features

I think MySQL 5.0 and MySQL 6.0 have been examples of this. Adding features first and then later making the codebase stable again doesn’t seem to be a good strategy. I think taking features from 6.0 and then adding them in 5.x releases was a smart decision.

Don’t stay on the same major version for too long

It seems that many people only upgraded to 8.0 recently and found it quite challenging to do. I think this is mostly because the large set of changes between two major versions. More often releases with slightly less big and/or breaking changes makes it easier to upgrade.

Cleanup is needed

I think it is critical for a software project to be able to deprecate and cleanup features and APIs. I do like how Go keeps compatibility between 1.x version but I don’t think this would work for MySQL.

Innovation and LTS releases

I think the innovation releases that are released often, but don’t get the long term support are helpful and make it easier for people to test and try features before they land in a long term support release.

Then a small number of LTS releases make it easier for developers to find a stable base for their application. As many bugfixes have to be backported this reduces the load on the MySQL developers.

Continues release

Some software projects now use a continues release model where there are no real versions, but just an ever moving set of updates. This is used for Debian unstable/sid. CentOS is using something that is somewhat similar with CentOS Stream, which still has versions, but not the minor versions that CentOS Linux used to have.

I don’t think this would make sense for MySQL, however a nightly build like what TiDB offers whould be nice, but this doesn’t seem to align with the procedures from Oracle.

Disclaimers

I don’t work for Oracle/MySQL so this is purely an outsiders view on things.

My main point is that managing release version numbers and contents is difficult. I used MySQL as an example as it has a rich history, but I could have used something else instead.

I work for PingCAP, but the content of this blog is only mine and not that of my employer.

Monday, May 6, 2024

MySQL Protocol: Collations

This story starts with a pull request for go-mysql to allow setting the collation in auth handshake that I was reviewing. The reason why the author wanted to do this is to speedup the connection setup as he has a latency sensitive application and a lot of connection setups and tear downs.

While looking at this I noticed that the collation would be stored in a single byte. However the list of supported collations shows collations with an ID that’s more than 255.

mysql> SELECT MIN(ID),MAX(ID) FROM information_schema.collations;
+---------+---------+
| MIN(ID) | MAX(ID) |
+---------+---------+
|       1 |     323 |
+---------+---------+
1 row in set (0.00 sec)

The protocol documentation for Protocol::HandshakeResponse41 says that the value sent here is only the lower 8-bits.

So I was wondering how do other connectors send this to the server? Are the other 8-bits sent elsewhere in the protocol?

So I used MySQL Connector/Python to try this out.

import mysql.connector
c = mysql.connector.connect(
    host='127.0.0.1',
    port=3306,
    user="test",
    password="test",
    collation="utf8mb4_ja_0900_as_cs",
    ssl_disabled=True
)
cur = c.cursor()
cur.execute("SHOW SESSION VARIABLES LIKE '%collation%'");
for col in cur:
    print(col)
cur.close()
c.close()

Here utf8mb4_ja_0900_as_cs is used which has collation ID 303 or 0x12F in hex. Another way to write this down is 0x01 0x2F

This is how this looked in Wireshark:

So the collation is set to latin1_bin (47) instead of utf8mb4_ja_0900_as_cs (303). Note that 47 in binary is 0x2F. And then we can spot the other part of this as 0x01 in the Unused field, for which the docs say it should be a filler of all zeros.

So I concluded that the protocol docs on this are probably outdated and I started a few merge/pull requests for this:

First this MR for Wireshark to have it decode the 2 bytes that we have seen as a single collation number. And then this PR for mysql-server to update the docs. I also created this PR for MySQL Connector/Python to update the comments in the code.

After doing this I also added a comment to the MariaDB docs for the protocol. And here Vladislav Vaintroub quickly responded with a conclusion which was different than my conclusion. He checked the code to see what the server (both MySQL and MariaDB) is reading and there it is only reading one byte.

After that I have created another MR for Wireshark to revert some of my changes. I also added some comments to my previous PRs/bugreports.

I also did some more testing with Connector/Python and found out that depending on whether or not the C Extension is used and also depending on the version that it might set the collation during the handshake but also after the handshake with SET NAMES. And not just once, it sometimes does it twice. And the second time it only specifies the character set and not the collation, causing the collation to reset to the default one for that character set. I’ve filed this bug for that.

The current state is that the collation can only be set during the handshake if it is <= 255. For other cases you need to send a SET NAMES statement. This isn’t great as this adds at least one roundtrip to the connection setup.

As all character sets have a default collation that’s <=255 this is only an issue with some of the non-default collations. It might also become an issue collations that might be added in the future.

mysql> SELECT MAX(id) FROM information_schema.collations WHERE IS_DEFAULT='YES';
+---------+
| MAX(id) |
+---------+
|     255 |
+---------+
1 row in set (0.01 sec)

Another indication on why this only affects few users is the MySQL C API.

This is how one can set the character set:

mysql_options(&mysql, MYSQL_SET_CHARSET_NAME, "latin1");
mysql_real_connect(&mysql, ...);

Note that MySQL doesn’t send this latin1 string to the server, it looks up the default collation and sends the ID of that to the server instead. So anything that uses the C API isn’t affected by this as it has to use SET NAMES to set non-standard collations.

So I assume that the field in the handshake packet either wasn’t considered once collations with ID’s of more than 255 were added or it was a way to send the character set in a more efficient way.

Note that character sets’s have only names and no numeric ID.

As MySQL Connector/Python takes a collation name, but sends a collation ID it needs to keep a list so it can lookup the ID before connecting to the server. So to connect with a new collation you might have to update the MySQL Connector/Python version.

Monday, November 6, 2023

Notes on Compression in the MySQL Protocol

The MySQL Protocol is the network protocol that is used between a MySQL server and a client. This is called the “classic” protocol as there is now a newer protobuf based protocol called X Protocol. However the “classic” protocol is used by many database drivers, applications, etc. and also by MySQL Replication.

The MySQL Protocol has the option to compress network traffic. Most client libraries don’t enable this by default as this increases CPU usage.

Some history

Initially this was added in MySQL 3.22 (based on historical release notes) and was based on zlib.

Then in MySQL 8.0.18 a second compression algorithm, Zstandard, was added based on a contribution by Facebook.

So zlib has been there basically forever and Zstandard support is new-ish (8.0.18 was released in October 2019).

Why you might want to use compression.

There are multiple usecases that are often mentioned when talking about the compressed protocol. The first one is slow connections. This could, for example, be a DBA doing remote work.

The second usecase is metered connections. This could be a DBA using a mobile connection or to reduce network transfer cost in a Cloud environment.

Basic usage

The usage is as simple as mysql --compression-algorithms=zstd .... Instead of zstd you can also use zlib or even provide both (zstd,zlib) and see what’s available on the server. Most other clients like mysqldump, MySQL Shell, etc also support this setting. Older versions use mysql --compress ... which enables zlib based compression. What is used in the end depends on what the server supports.

For zstd there is a --zstd-compression-level option that allows you to set a level between 1 and 22, where the default is 3. This allows you to better balance between faster and better compression.

Once connected you could check the Compression, Compression_algorithm and Compression_level status variables to verify what is being used.

sql> SHOW STATUS LIKE 'Compression%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Compression           | ON    |
| Compression_algorithm | zstd  |
| Compression_level     | 3     |
+-----------------------+-------+
3 rows in set (0.0043 sec)

In MySQL Shell the \status command also provides you with this information:

sql> \status
MySQL Shell version 8.2.0

Connection Id:                30
Current schema:               test
Current user:                 root@localhost
SSL:                          Cipher in use: TLS_AES_128_GCM_SHA256 TLSv1.3
Using delimiter:              ;
Server version:               8.2.0 MySQL Community Server - GPL
Protocol version:             Compressed 10                               <------------------
Client library:               8.2.0
Connection:                   127.0.0.1 via TCP/IP
TCP port:                     3306
Server characterset:          utf8mb4
Schema characterset:          utf8mb4
Client characterset:          utf8mb4
Conn. characterset:           utf8mb4
Result characterset:          utf8mb4
Compression:                  Enabled (zstd)                              <------------------
Uptime:                       1 day 3 hours 20 min 4.0000 sec

Threads: 2  Questions: 200  Slow queries: 0  Opens: 348  Flush tables: 3  Open tables: 266  Queries per second avg: 0.002

The connectors for your favorite programming language probably have settings for this as well. Some might only allow you to enable compression with zlib and not yet offer you the option for zstd.

Language	Driver	zlib	zstd	Setting
C	MySQL C API (libmysqlclient)	✅	✅	`MYSQL_OPT_COMPRESSION_ALGORITHMS` and `MYSQL_OPT_COMPRESS`
Python	Connector/Python	✅	❌	`compress=True`
Java	Connector/J	✅	❌, Bug #112732	`useCompression=true`
Go	go-sql-driver/mysql	❌ #24	❌	-
PHP	mysqli	✅	❌	`MYSQLI_CLIENT_COMPRESS`
Perl	DBD::mysql	✅	✅	`mysql_compression=zlib`

Note that many drivers are based on libmysqlclient and only require minimal changes to support this.

A look at the internals

Handshake

When establishing a connection both the client and server send a set of capability flags. There are two flags that are used for compression: First there is CLIENT_COMPRESS that is used to advertise support for zlib and then there is CLIENT_ZSTD_COMPRESSION_ALGORITHM that is used to advertise support for Zstandard. If both the client and the server have a flag set then that feature is used.

A bit of a special case is possible when both the client and server set both of the compression flags, in that case zlib is used. You can try this out by running --compression-algorithms=zstd,zlib and looking at the status variables.

If the client has the CLIENT_ZSTD_COMPRESSION_ALGORITHM flag set then the Protocol::HandshakeResponse41 also contains a compression level for Zstandard.

Packet compression

The MySQL Protocol consists of “MySQL Packets” that are send between the client and server. When compression is used this is added as a separate layer in the network stack.

So the stack basically looks something like this:

|          MySQL Protocol           |
|      MySQL Compressed Protocol    |
|          TLS (Optional)           |
| TCP/IP | Unix Socket |   .....    |

And yes, if you want you can use the compressed protocol over a Unix Socket. But I can’t think of a good reason to do so.

The ..... above is because of other connection options on Windows, like Shared memory and Named Pipes.

As this is a separate layer there isn’t a 1-to-1 mapping between compressed packets and regular packets.

So what do these MySQL Compressed Protocol packets look like?

Let’s start by looking at the header.

Description:          
| <compressed_length[3]> | <compressed_sequence[1]> | <uncompressed_length[3] |

Example values (hex):
| 15 00 00               | 00                       | 000000                  |

So here we have: - Compressed Length: 21. This is 15 00 00 in hex. This is a 24 bit (3 byte) integer. - Compressed Sequence: 0 - Uncompressed Length: 0

If you know the regular MySQL Protocol this probably looks very similar, including the use of the somewhat excentric 24 bit integer.

That the Uncompressed Length is set to zero means that for this packet the payload isn’t actually compressed. This happens when the side that does compression doesn’t think it makes sense to compress the payload. MySQL doesn’t compress the payload if it is less than MIN_COMPRESS_LENGTH, which is 50 bytes.

After the header we get the payload which depending on the capability flags might be compressed with zlib or Zstandard or might be uncompressed if the Uncompressed Length is set to 0.

This is how this looks in Wireshark:

A compressed protocol packet can also contain multiple (regular) MySQL packets:

Here there are eight regular MySQL Packets inside the compressed packet.

Wireshark can uncompress the compressed payload (both zlib and zstd). This is why it is able to show you what’s in the compressed packet. On the bottom you can see “Fram (202 bytes)” and “compressed data (147 bytes)”.

For this packet we have: - Compressed Length: 129 bytes - Uncompressed Length: 147 bytes - TCP Payload Length: 136 bytes (this matches our expectation: 129 bytes payload + 7 byte header)

So instead of sending 147 bytes we sent 136 bytes. We saved 11 bytes (~7.5%).

The actual compression ratio depends on the size (bigger is better), randomness of the content, the compression algorithm and compression level. For zlib the compression level is hardcoded, so you can only use different compression levels for Zstandard.

We talked about two situations: The first situation is one compressed packet with one MySQL packet and the second situation is one compressed packet with multiple MySQL packets. But there is a third situation: Multiple compressed packets for one MySQL packet.

A single MySQL Protocol Packet can be split over multi compressed packets.

Efficiency

To get good results with compression in general it really helps if you’re compressing larger sets of data and if the compressed data isn’t very random.

Let’s see what results we can get in a best case senario by using REPEAT() to send a string of 100 million times x.

To do this we need to make sure we set a large max_allowed_packet for both the server and client. We also disable SSL/TLS as that makes capturing things easier. In case you really need this, Wireshark is able to decrypt MySQL sessions with TLS.

$ mysql --compression-algorithms=zlib -h 127.0.0.1 -u root -pRoot123@ --ssl-mode=DISABLED --max-allowed-packet=1073741824
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 237
Server version: 8.2.0 MySQL Community Server - GPL

Copyright (c) 2000, 2023, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> pager wc
PAGER set to 'wc'
mysql> select repeat('x',100*1024*1024);
      5       9 419431449
1 row in set (1.06 sec)

mysql> ^DBye

The traffic that we captured in Wireshark looks like this:

I’ve added the compressed and uncompressed length as columns as well as the source and destination port numbers and hidden some of the other columns to make this fit on the screenshot. I’ve also set the display filter to mysql to hide some other packets.

The packets 1 untill 14 are from the connection setup. Packet 16 has our SELECT-query. And packets 18 untill 46 have our resultset. The final packet (48) is the COM_QUIT triggered by our Ctrl-D to close the session.

Let’s focus on the resultset (packets 18-46).

These packets have compressed sequence numbers 1 until 14. The query sent by the client had compressed sequence number 0.

Packet Number	Compressed Sequence Number	Compressed Length	Uncompressed Length
18	1	106	16384
20	2	16302	16760891
22	3	41	16384
24	4	16302	16760835
26	5	41	16384
28	6	16302	16760835
30	7	41	16384
32	8	16302	16760835
34	9	41	16384
36	10	16302	16760835
38	11	41	16384
40	12	16302	16760835
42	13	44	16384
44	14	4071	4177939
46	15	11	0

As you can see there are multiple packets of 16384, which is because our net_buffer_length is set to the default (16384).

For the MySQL Packets inside of the compressed packets the length is set to FF FF FF to indicate that this isn’t the last packet. See the section about Sending more than 16Mb in the docs for more details on how this works.

So there are multiple MySQL packets of 16777216 (including 4 byte header) sent to send the total payload. This is the maximum size of a MySQL packet as the length is a 24 bit integer and 2²⁴ = 16777216.

But as a Compressed Packet also uses a 24 bit integer to store the compressed and uncompressed size it can only store a MySQL Packet of 16777216 - 7 (header) = 16777209 bytes. So a 16K MySQL packet needs multiple compressed packets. As MySQL started with a 16K (net_buffer_length) packet the remainder of the packet fits in the next compressed packet.

Combining the uncompressed lengths of all packets this gives us 104857693 and if we add the 11 bytes of the last packet that didn’t have it’s payload compressed we get 104857704 which is about 100 MiB.

If we combine the compressed payloads we get 102249 which is slightly less than 100 KiB. This probably doesn’t include the header for the 15 packets, but with 15 * 7 = 105 bytes that isn’t going to change the result.

So in this case compression really reduces the amount of bytes we need to transfer.

However this is a best case senario. In the real world you will probably have a mix of large resultsets and smaller resultsets and also have data that doesn’t have this many repeating values.

Performance

Percona did some testing in 2019 that shows that the compressed protocol helps with performance if your workload is limited by your network bandwidth. It also shows that it reduces performance in case you’re not limited by network bandwith.

Replication

As MySQL replication uses the “classic” protocol as well there is the option to enable compression. For this the settings are SOURCE_COMPRESSION_ALGORITHMS and SOURCE_ZSTD_COMPRESSION_LEVEL that are part of the CHANGE REPLICATION SOURCE statement. And there is also the replica_compressed_protocol setting that if enabled takes precedence over the SOURCE_COMPRESSION_ALGORITHMS setting.

However with replication you might want to use Binary Log Transaction Compression instead as that also reduces the size on disk and avoids compressing and uncompressing the same data multiple times.

Observability

As said before there are three session status variables: - Compression - Compression_algorithm - Compression_level

In performance_schema I only found a single instrument related to the compressed protocol:

sql> SELECT * FROM setup_instruments WHERE name like '%compress_packet%'\G
*************************** 1. row ***************************
         NAME: memory/sql/NET::compress_packet
      ENABLED: YES
        TIMED: NULL
   PROPERTIES: 
        FLAGS: 
   VOLATILITY: 0
DOCUMENTATION: Buffer used when compressing a packet.
1 row in set (0.0029 sec)

Note that the session status can be used to get server wide numbers like this:

sql> SELECT VARIABLE_VALUE,COUNT(*)
   -> FROM performance_schema.status_by_thread
   -> WHERE VARIABLE_NAME='Compression_algorithm'
   -> GROUP BY VARIABLE_VALUE;
+----------------+----------+
| VARIABLE_VALUE | COUNT(*) |
+----------------+----------+
| zlib           |        1 |
| zstd           |        2 |
|                |        1 |
+----------------+----------+
3 rows in set (0.0006 sec)

One of the obvious things that seems to lack instrumentation is the effectiveness of compression like the compression ratio or the amount of bytes saved by enabling compression. In addition to this it would also be good to get more info on how much CPU time is spend doing compression and uncompression.

There is this:

sql> show session status like 'Bytes%';
+----------------+-------+
| Variable_name  | Value |
+----------------+-------+
| Bytes_received | 2381  |
| Bytes_sent     | 13455 |
+----------------+-------+
2 rows in set (0.0015 sec)

However the docs aren’t clear on if this is measuring the MySQL Protocol (before compresssion) or the network link (after compression).

For the X Protocol there is more information available, but that doesn’t help us with the “classic” protocol.

sql> SHOW SESSION STATUS LIKE 'Mysqlx_bytes_%';
+------------------------------------------+-------+
| Variable_name                            | Value |
+------------------------------------------+-------+
| Mysqlx_bytes_received                    | 0     |
| Mysqlx_bytes_received_compressed_payload | 0     |
| Mysqlx_bytes_received_uncompressed_frame | 0     |
| Mysqlx_bytes_sent                        | 0     |
| Mysqlx_bytes_sent_compressed_payload     | 0     |
| Mysqlx_bytes_sent_uncompressed_frame     | 0     |
+------------------------------------------+-------+
6 rows in set (0.0014 sec)

Having information on compressed vs uncompressed like this for the classic protocol would be nice. However this doesn’t seem to have per algorighm info.

And for the Binary Log Transaction Compression features there is this:

sql> DESCRIBE performance_schema.binary_log_transaction_compression_stats;
+--------------------------------------+------------------------+------+-----+---------+-------+
| Field                                | Type                   | Null | Key | Default | Extra |
+--------------------------------------+------------------------+------+-----+---------+-------+
| LOG_TYPE                             | enum('BINARY','RELAY') | NO   |     | NULL    |       |
| COMPRESSION_TYPE                     | varchar(64)            | NO   |     | NULL    |       |
| TRANSACTION_COUNTER                  | bigint unsigned        | NO   |     | NULL    |       |
| COMPRESSED_BYTES_COUNTER             | bigint unsigned        | NO   |     | NULL    |       |
| UNCOMPRESSED_BYTES_COUNTER           | bigint unsigned        | NO   |     | NULL    |       |
| COMPRESSION_PERCENTAGE               | smallint               | NO   |     | NULL    |       |
| FIRST_TRANSACTION_ID                 | text                   | YES  |     | NULL    |       |
| FIRST_TRANSACTION_COMPRESSED_BYTES   | bigint unsigned        | NO   |     | NULL    |       |
| FIRST_TRANSACTION_UNCOMPRESSED_BYTES | bigint unsigned        | NO   |     | NULL    |       |
| FIRST_TRANSACTION_TIMESTAMP          | timestamp(6)           | YES  |     | NULL    |       |
| LAST_TRANSACTION_ID                  | text                   | YES  |     | NULL    |       |
| LAST_TRANSACTION_COMPRESSED_BYTES    | bigint unsigned        | NO   |     | NULL    |       |
| LAST_TRANSACTION_UNCOMPRESSED_BYTES  | bigint unsigned        | NO   |     | NULL    |       |
| LAST_TRANSACTION_TIMESTAMP           | timestamp(6)           | YES  |     | NULL    |       |
+--------------------------------------+------------------------+------+-----+---------+-------+
14 rows in set (0.0017 sec)

This has per algorithm stats as COMPRESSION_TYPE has ZSTD for example. There are individual rows in this table per LOG_TYPE and COMPRESSION_TYPE combination. This table even has a very user friendly COMPRESSION_PERCENTAGE field that can be used to see how well the data is compressing.

With a SytemTap script like the one below you can get some of the same issue as what Wireshark can show you. Maybe this can be extended to get more useful details. For now the only benefit of this is that it works with TLS connections without any addtional configuration.

mysql_compress.stp:

probe process("/usr/bin/mysql").function("my_uncompress*").call {
    cl = @cast($complen, "size_t")
    printf("\nReceiving compressed packet with len=%u, complen=%u\n", $len, cl)
}

Example usage:

$ sudo stap mysql_compress.stp  -c 'mysql --compression-algorithms=zstd -h 127.0.0.1 -u root -p'
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 301
Server version: 8.2.0 MySQL Community Server - GPL

Copyright (c) 2000, 2023, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.


Receiving compressed packet with len=92, complen=0
mysql> 
Receiving compressed packet with len=143, complen=161

mysql> pager wc
PAGER set to 'wc'
mysql> select repeat('x',10);

Receiving compressed packet with len=71, complen=0
      5       9      95
1 row in set (0.00 sec)

mysql> select repeat('x',100);

Receiving compressed packet with len=80, complen=162
      5       9     525
1 row in set (0.00 sec)

mysql> select repeat('x',1000);

Receiving compressed packet with len=81, complen=1065
      5       9    5025
1 row in set (0.00 sec)

mysql> select repeat('x',10000);

Receiving compressed packet with len=86, complen=10066
      5       9   41049
1 row in set (0.00 sec)

mysql> ^DBye
[dvaneeden@dve-carbon ~]$ cat mysql_compress.stp
probe process("/usr/bin/mysql").function("my_uncompress*").call {
    cl = @cast($complen, "size_t")
    printf("\nReceiving compressed packet with len=%u, complen=%u\n", $len, cl)
}

If you create trace files you get some info:

...
T@18: | | | | | | | | note: Packet got longer on compression; Not compressed
...
T@18: | | | | | | | | note: Packet too short: Not compressed
...
T@18: | | | | | | | | note: Packet got longer on zstd compression; Not compressed

This will tell you when the payload was less than 50 bytes or when either zlib or zstd compressed data was longer than the original data.

Possible improvements

The MIN_COMPRESS_LENGTH is now hardcoded. It might make sense to make this configurable and set it to say 1 KiB or 1 MiB and only compress the larger payloads. This might give a better balance between performance and compression.

Docs

Work done by me

Here are some of the compressed protocol related things that I have worked on. As is usualy the case with open source projects this involved a lot of help from maintainers and other contributors on these projects. Many thanks to everyone that was involved.

Wireshark

Decoding of compressed packets in packet_myql.c