Monday, December 2, 2024

MySQL GTID tags and binlog events

MySQL 8.4 and newer have extended the Global Transaction ID (GTID) functionality with a new “tag” option.

Refresher on GTID

A GTID is a unique ID that is assigned to a transaction. This is used if gtid_mode is set to ON. The benefit of this is that a transaction can be uniquely identified in a MySQL replication setup with multiple levels. Among others this makes it easier to refactor a replication tree as a MySQL replica knows which transactions it has seen and can use this to find the right position to start replicating from a new source.

The format of GTIDs is documented here.

Before GTID was used replication worked based on a file and offset (e.g. file=binlog.000001,offset=4), which is unique to every server.

A GTID without tag looks like this:

896e7882-18fe-11ef-ab88-22222d34d411:1

This is in the format of <server_uuid>:<txid>. The UUID of the server is in the server_uuid global variable and the txid is the transaction id, which is an increasing number.

And for a range of transactions it looks like this:

896e7882-18fe-11ef-ab88-22222d34d411:1-12

Here there is a range of transactions, starting at 1 and ending at 12.

There can also be multiple ranges:

896e7882-18fe-11ef-ab88-22222d34d411:1-12,14-20

Here transaction 13 is missing in the range from 1 to 20.

And then there can be transactions from multiple servers:

896e7882-18fe-11ef-ab88-22222d34d411:1-12,14-20,fb0b950b-acb3-11ef-8f70-3674a2d2348e:1-10

GTID tags

If you run SET gtid_next='AUTOMATIC:abc then the next transaction will have a GTID that looks like this:

896e7882-18fe-11ef-ab88-22222d34d411:abc:1

There can be multiple tags in a GTID set and there can be multiple tags, for example:

896e7882-18fe-11ef-ab88-22222d34d411:1-3:abc:1:foo:1-3,5-7

This value is a compact way of representing this:

896e7882-18fe-11ef-ab88-22222d34d411:1
896e7882-18fe-11ef-ab88-22222d34d411:2
896e7882-18fe-11ef-ab88-22222d34d411:3
896e7882-18fe-11ef-ab88-22222d34d411:abc:1
896e7882-18fe-11ef-ab88-22222d34d411:foo:1
896e7882-18fe-11ef-ab88-22222d34d411:foo:2
896e7882-18fe-11ef-ab88-22222d34d411:foo:3
896e7882-18fe-11ef-ab88-22222d34d411:foo:5
896e7882-18fe-11ef-ab88-22222d34d411:foo:6
896e7882-18fe-11ef-ab88-22222d34d411:foo:7

So what are reason why you want to use GTID tags? One senario I can think of is heartbeats. So if you create a heartbeat table with one row and update that every second then you can use this to test replication delay and write availability.

CREATE TABLE heartbeat (
    id INT PRIMARY KEY,
    t TIMESTAMP(6)
);

-- Monitoring service updates heartbeat, if writes fail: alert, use for write availability metric
SET gtid_next='AUTOMATIC:heartbeat';
REPLACE INTO heartbeat VALUES (1, CURRENT_TIMESTAMP(6));

-- On replicas, check the difference between the heartbeat and the current timestamp to measure replicaton delay
SELECT TIMESTAMPDIFF(MICROSECOND, t, CURRENT_TIMESTAMP(6)) FROM heartbeat;

There are tools like pt-heartbeat from Percona Toolkit that can help you with this.

Why having a heartbeat tag can help here is that losing the data of a heartbeat transaction is fine, while losing a transaction from an application isn’t.

Other senarios are data imports and data archival.

Binlog events

Binlogs (short for binary logs) are logs of changes made to the database. These can be used for replication between a primary database server and replicas and be used together with backups to do point-in-time recovery. Another usecase is sending events to other systems via a Change Data Capture (CDC) tool.

Adding tags impacts a few of the types of events that are sent in the binlog stream.

PREVIOUS_GTIDS_LOG_EVENT

This event is used to send which GTIDs were in previous binlog files after switching to a new file.

With mysqlbinlog ... --read-from-remote-server --hexdump -vv $binlogfile you can see what these events look like. Here is an example:

# at 127
#241126 17:24:31 server id 1  end_log_pos 158 CRC32 0x1b54f2b2
# Position  Timestamp   Type   Source ID        Size      Source Pos    Flags
# 0000007f 3f f6 45 67   23   01 00 00 00   1f 00 00 00   9e 00 00 00   80 00
# 00000092 00 00 00 00 00 00 00 00  b2 f2 54 1b             |..........T.|
#   Previous-GTIDs
# [empty]

The first line is a header with a timestamp, type, server_id, size, position and flags. The second line ends with the CRC32 checksum. The actual body for this event is 00 00 00 00 00 00 00 00, which means that there were no previous GTIDs. If there are GTIDs it would change into 01 00 00 00 00 00 00 00, etc. and then for each GTID set it has the UUID and then one or more ranges of transactions.

When MySQL sees the first tagged GTID it changes this event and keeps using a new format until the server is restarted.

Then the events look like this:

# at 127
#241126 17:57:35 server id 1  end_log_pos 158 CRC32 0x3b12f980
# Position  Timestamp   Type   Source ID        Size      Source Pos    Flags
# 0000007f ff fd 45 67   23   01 00 00 00   1f 00 00 00   9e 00 00 00   80 00
# 00000092 01 00 00 00 00 00 00 01  80 f9 12 3b             |............|
#   Previous-GTIDs
# [empty]

Here the body is 01 00 00 00 00 00 00 01. here the last 01 means the GTID format where 00 is classic and 01 means tagged. For tagged GTIDs the number of GTIDs is encoded slightly differently. First a mask of 0xFF000000000000FF is used and then the resulting value is shifted by 1 byte. And then the GTIDs are encoded with a field for the tag. If there are multiple tags, then these are send as separate GTID sets in this event.

Here is an example:

# at 127
#241129  9:39:26 server id 1  end_log_pos 291 CRC32 0x248c7bd2
# Position  Timestamp   Type   Source ID        Size      Source Pos    Flags
# 0000007f ae 6f 49 67   23   01 00 00 00   a4 00 00 00   23 01 00 00   80 00
# 00000092 01 03 00 00 00 00 00 01  89 6e 78 82 18 fe 11 ef |.........nx.....|
# 000000a2 ab 88 22 22 2d 34 d4 11  00 01 00 00 00 00 00 00 |.....4..........|
# 000000b2 00 01 00 00 00 00 00 00  00 04 00 00 00 00 00 00 |................|
# 000000c2 00 89 6e 78 82 18 fe 11  ef ab 88 22 22 2d 34 d4 |..nx..........4.|
# 000000d2 11 08 64 65 6d 6f 01 00  00 00 00 00 00 00 01 00 |..demo..........|
# 000000e2 00 00 00 00 00 00 04 00  00 00 00 00 00 00 89 6e |...............n|
# 000000f2 78 82 18 fe 11 ef ab 88  22 22 2d 34 d4 11 0c 66 |x..........4...f|
# 00000102 6f 6f 62 61 72 01 00 00  00 00 00 00 00 01 00 00 |oobar...........|
# 00000112 00 00 00 00 00 03 00 00  00 00 00 00 00 d2 7b 8c |................|
# 00000122 24                                               |.|
#   Previous-GTIDs
# 896e7882-18fe-11ef-ab88-22222d34d411:1-3:demo:1-3:foobar:1-2

So 01 03 00 00 00 00 00 01 means it is a tagged GTID format (ends with 01). And that there are 3 GTIDs.

The first one:

# 00000092                          89 6e 78 82 18 fe 11 ef |.........nx.....|
# 000000a2 ab 88 22 22 2d 34 d4 11  00 01 00 00 00 00 00 00 |.....4..........|
# 000000b2 00 01 00 00 00 00 00 00  00 04 00 00 00 00 00 00 |................|
# 000000c2 00                                               |..nx..........4.|
Description Value Decoded
UUID 89 6e 78 82 18 fe 11 ef ab 88 22 22 2d 34 d4 11 896e7882-18fe-11ef-ab88-22222d34d411
Tag 00 01 00 00 00 00 00 00 empty
Start 01 00 00 00 00 00 00 1
End 00 04 00 00 00 00 00 00 4

The second one:

# 000000c2    89 6e 78 82 18 fe 11  ef ab 88 22 22 2d 34 d4 |..nx..........4.|
# 000000d2 11 08 64 65 6d 6f 01 00  00 00 00 00 00 00 01 00 |..demo..........|
# 000000e2 00 00 00 00 00 00 04 00  00 00 00 00 00 00       |...............n|

GTID_TAGGED_LOG_EVENT

There is a new event called GTID_TAGGED_LOG_EVENT (type 2a). I expected this to be basically the same as the GTID_LOG_EVENT (type 21), but with tags.

This is what a GTID_LOG_EVENT looks like:

# at 158
#241202 20:02:25 server id 1  end_log_pos 235 CRC32 0xd3cf038f
# Position  Timestamp   Type   Source ID        Size      Source Pos    Flags
# 0000009e 41 04 4e 67   21   01 00 00 00   4d 00 00 00   eb 00 00 00   00 00
# 000000b1 01 89 6e 78 82 18 fe 11  ef ab 88 22 22 2d 34 d4 |..nx..........4.|
# 000000c1 11 01 00 00 00 00 00 00  00 02 00 00 00 00 00 00 |................|
# 000000d1 00 00 01 00 00 00 00 00  00 00 fb 20 6c 30 4e 28 |............l0N.|
# 000000e1 06 ca f4 5f 01 00 8f 03  cf d3                   |..........|
#   GTID    last_committed=0    sequence_number=1   rbr_only=no original_committed_timestamp=1733166145216763   immediate_commit_timestamp=1733166145216763 transaction_length=202
# original_commit_timestamp=1733166145216763 (2024-12-02 20:02:25.216763 CET)
# immediate_commit_timestamp=1733166145216763 (2024-12-02 20:02:25.216763 CET)
/*!80001 SET @@session.original_commit_timestamp=1733166145216763*//*!*/;
/*!80014 SET @@session.original_server_version=90100*//*!*/;
/*!80014 SET @@session.immediate_server_version=90100*//*!*/;
SET @@SESSION.GTID_NEXT= '896e7882-18fe-11ef-ab88-22222d34d411:1'/*!*/;

This is what a GTID_TAGGED_LOG_EVENT looks like:

# at 158
#241202 19:53:13 server id 1  end_log_pos 240 CRC32 0x71f11620
# Position  Timestamp   Type   Source ID        Size      Source Pos    Flags
# 0000009e 19 02 4e 67   2a   01 00 00 00   52 00 00 00   f0 00 00 00   00 00
# 000000b1 02 76 00 00 02 02 25 02  dc f0 09 02 30 f9 03 22 |.v..........0...|
# 000000c1 bd 03 ad 02 21 02 44 44  5a 68 51 03 22 04 04 06 |......DDZhQ.....|
# 000000d1 0c 66 6f 6f 62 61 72 08  00 0a 04 0c 7f e2 74 90 |.foobar.......t.|
# 000000e1 0f 4e 28 06 10 3d 03 12  a3 ff 0a 20 16 f1 71    |.N............q|
#   GTID    last_committed=0    sequence_number=1   rbr_only=no original_committed_timestamp=1733165593949410   immediate_commit_timestamp=1733165593949410 transaction_length=207
# original_commit_timestamp=1733165593949410 (2024-12-02 19:53:13.949410 CET)
# immediate_commit_timestamp=1733165593949410 (2024-12-02 19:53:13.949410 CET)
/*!80001 SET @@session.original_commit_timestamp=1733165593949410*//*!*/;
/*!80014 SET @@session.original_server_version=90100*//*!*/;
/*!80014 SET @@session.immediate_server_version=90100*//*!*/;
SET @@SESSION.GTID_NEXT= '896e7882-18fe-11ef-ab88-22222d34d411:foobar:1'/*!*/;

The output looks almost identical, but if you look at the hexdump you can see that there are quite some differences, for example you don’t see the 89 6e 78 82 that is the start of the UUID of the server. It looks like a new interal library (mysql::serialization) is used for serialization. Someone wrote took the time to write some documentation for this new library, which is great. However with this documentation I haven’t been able to decode this event yet.

Note that replication protocol is documented here.

Impact

I have been doing some work on the MySQL protocol dissector in Wireshark. As wireshark only decodes the binlog into events and doesn’t decode the events themselve it was enough to just add the event name and code for the GTID_TAGGED_LOG_EVENT.

For go-mysql, which is a Go library that amongst other things parses the binlog stream, the work is still ongoing. The code needs to handle the new format of the PREVIOUS_GTIDS_LOG_EVENT, the new GTID_TAGGED_LOG_EVENT event and also be able to parse the text format of the tagged GTIDs.

There are many other tools that might use the binlog stream and face the same issues.

Also, using tagged GTIDs is a sure way to break replication between 8.4 (or 8.3?) and earlier versions of MySQL.

These are the bugs that are related to tagged GTID that I have filed:

Monday, November 11, 2024

Release and version management is hard

In this post I will talk about release management with MySQL as example. Release management has to do with what to put in a release, how often to do a release, how to long to support a release and how to number or name the releases.

The early years

Today software is almost exclusively delivered over the internet, but this didn’t use to be the case. Software used to be delivered in a box that you had to buy at a physical store. These were often shrink wrapped and forced you to agree with the EULA by opening the box. The box then contained floppies, CD’s, DVD’s or any other suitable medium.

There were multiple reasons why physical media were used, even when the internet arrived. One of the main limitations was bandwidth, this is also why FreeBSD, Linux, etc were sold on physical media. And one of the other reasons was that online payment wasn’t as ubiquitous as it is today. Then there were also ways of copy protection that relied on physical media and/or dongles. For operating systems and such there was also a chicken-and-the-egg situation that you needed an OS to be able to download anything. And a nice box with a printed manual probably felt more valuable than a downloaded file.

This all put some constraints on release management. The time between releases was quite large as it would need new physical media to be produced, with a newly designed box, manual, etc. And as many software versions were sold per release it needed to have enough new features to convince people to pay for an upgrade. There were sometimes discounts for upgrades, but then the software needed some way to only work as upgrade or have a way to validate that the user was in possession of the previous version. Bugfixes and patches might be delivered online or not at all.

As people were either using dial-up and only connected to the internet for a short amount of time or not at all, this made security a lot less important than it is today. It also meant that people used manuals a lot more as web search might not have been available.

Release management today

Today with new releases being made available over the internet this has changed the constraints it puts on release management a lot.

We now have package managers that might download and upgrade with minimal or no user intervention. And we now have Puppet, Kubernetes, etc to manage software on a set of machines. As package managers take care of dependencies this makes it easier for software packages to have more and more complex dependencies. And with most systems being connected to the internet all the time, security and security updates become more important.

How this applies to MySQL

MySQL has had a interesting history when it comes to releases and versioning.

mysql-history-graph 
Source:  https://github.com/dveeden/mysql-history-graph

Early versions

MySQL 1.0 was never released to a wide audience. Then version 2 was skipped, probably because calling something version 3 was better for marketing. The first widely used version was probably MySQL 3.23.x. Note that MySQL isn’t following semantic versioning where the version has a major, a minor and a patch version. For MySQL the first two numbers (e.g. 5.6) make the major version. After 3.23 followed 4.0 and 4.1

The 5.x versions.

Version 5.0 of MySQL was filled with a large number of both features and bugs. I think this was driven by the partnership with SAP. This is a good example of what you get when you prioritize features over stability.

Version 5.1 brought partitioning.

And then MySQL 5,2, 5.3 and 5.4 never made it to production. Only 5.4 was released as development preview release.

With MySQL 5.5 the build system switched to a unified cmake based build system. Besides a big change for developers this also impacted Linux distros as they had to change how they build MySQL packages. This is the first version that had InnoDB as the default storage engine. Eerlier versions also supported InnoDB, but not as default, probably in part because at that time the InnoDB storage engine was in the hands of InnoBase and/or Oracle.

The 5.6 version brought GTID, which was a major step forward for replication.

Then 5.7 brought native JSON support and generated columns, among many other features.

What happend to 6.x and 7.x

MySQL 6.0 work started shortly after 5.1 was released, but never made it into a production release. It had many interesting features, most of which have been backported to later 5.x versions of MySQL. For what I understand this was due to the 6.0 codebase being unreliable due to a mix of badly tested and/or incomplete features.

Then the 7.x versions had been used for MySQL Cluster, which is basically a modified MySQL version that included the NDB storage engine. In 8.0 the changes for the server were merged with the regular MySQL code allowing a unified version.

The 8.x timeframe

MySQL 8.0 brought many new features but also had a big change in the foundation of MySQL. MySQL 8.0 used InnoDB to store the data dictionary instead of FRM files. This allows for better crash safety and atomic DDL and opens up future possibilities like transactional DDL.

As MySQL made its way to Oracle via Sun it now has to adhere to the release policies of Oracle. This means that each release has to be supported for a long time. So sticking with 8.0, and adding new features in updates of 8.0 was what they did. This worked well, but it had one big drawback: MySQL could deprecate features, API’s etc, but wasn’t able to remove them until a new major version.

Then MySQL 8.1, 8.2 and 8.3 were released as Innovation releases which ten lead to a 8.4 long term support release. In these releases MySQL cleaned up many of the previously deprecated features. I think cleaning up these things is good as it makes it easier to develop new features that don’t have to take these deprecated things into account and it makes for cleaner code. But it also means that third party software might have to test and update their code to avoid these now removed features. And a new major version also means a new training and certification program and all the other things related to a new major version.

The 9.x timeframe

MySQL has released 9.0 and 9.1 as innovation releases. However MySQL is more than just the server and for some components like MySQL Connector/J (The database driver for Java) the 9.1 release isn’t an innovation release but a regular GA release. And the MySQL Workbench GUI tool has been stuck at 8.0.x. So where 8.0 was a coordinated release for all components with more or less the same version, this is no longer true.

Learnings

Don’t sacrifice stability for features

I think MySQL 5.0 and MySQL 6.0 have been examples of this. Adding features first and then later making the codebase stable again doesn’t seem to be a good strategy. I think taking features from 6.0 and then adding them in 5.x releases was a smart decision.

Don’t stay on the same major version for too long

It seems that many people only upgraded to 8.0 recently and found it quite challenging to do. I think this is mostly because the large set of changes between two major versions. More often releases with slightly less big and/or breaking changes makes it easier to upgrade.

Cleanup is needed

I think it is critical for a software project to be able to deprecate and cleanup features and APIs. I do like how Go keeps compatibility between 1.x version but I don’t think this would work for MySQL.

Innovation and LTS releases

I think the innovation releases that are released often, but don’t get the long term support are helpful and make it easier for people to test and try features before they land in a long term support release.

Then a small number of LTS releases make it easier for developers to find a stable base for their application. As many bugfixes have to be backported this reduces the load on the MySQL developers.

Continues release

Some software projects now use a continues release model where there are no real versions, but just an ever moving set of updates. This is used for Debian unstable/sid. CentOS is using something that is somewhat similar with CentOS Stream, which still has versions, but not the minor versions that CentOS Linux used to have.

I don’t think this would make sense for MySQL, however a nightly build like what TiDB offers whould be nice, but this doesn’t seem to align with the procedures from Oracle.

Disclaimers

I don’t work for Oracle/MySQL so this is purely an outsiders view on things.

My main point is that managing release version numbers and contents is difficult. I used MySQL as an example as it has a rich history, but I could have used something else instead.

I work for PingCAP, but the content of this blog is only mine and not that of my employer.

Monday, May 6, 2024

MySQL Protocol: Collations

This story starts with a pull request for go-mysql to allow setting the collation in auth handshake that I was reviewing. The reason why the author wanted to do this is to speedup the connection setup as he has a latency sensitive application and a lot of connection setups and tear downs.

While looking at this I noticed that the collation would be stored in a single byte. However the list of supported collations shows collations with an ID that’s more than 255.

mysql> SELECT MIN(ID),MAX(ID) FROM information_schema.collations;
+---------+---------+
| MIN(ID) | MAX(ID) |
+---------+---------+
|       1 |     323 |
+---------+---------+
1 row in set (0.00 sec)

The protocol documentation for Protocol::HandshakeResponse41 says that the value sent here is only the lower 8-bits.

So I was wondering how do other connectors send this to the server? Are the other 8-bits sent elsewhere in the protocol?

So I used MySQL Connector/Python to try this out.

import mysql.connector
c = mysql.connector.connect(
    host='127.0.0.1',
    port=3306,
    user="test",
    password="test",
    collation="utf8mb4_ja_0900_as_cs",
    ssl_disabled=True
)
cur = c.cursor()
cur.execute("SHOW SESSION VARIABLES LIKE '%collation%'");
for col in cur:
    print(col)
cur.close()
c.close()

Here utf8mb4_ja_0900_as_cs is used which has collation ID 303 or 0x12F in hex. Another way to write this down is 0x01 0x2F

This is how this looked in Wireshark:

So the collation is set to latin1_bin (47) instead of utf8mb4_ja_0900_as_cs (303). Note that 47 in binary is 0x2F. And then we can spot the other part of this as 0x01 in the Unused field, for which the docs say it should be a filler of all zeros.

So I concluded that the protocol docs on this are probably outdated and I started a few merge/pull requests for this:

First this MR for Wireshark to have it decode the 2 bytes that we have seen as a single collation number. And then this PR for mysql-server to update the docs. I also created this PR for MySQL Connector/Python to update the comments in the code.

After doing this I also added a comment to the MariaDB docs for the protocol. And here Vladislav Vaintroub quickly responded with a conclusion which was different than my conclusion. He checked the code to see what the server (both MySQL and MariaDB) is reading and there it is only reading one byte.

After that I have created another MR for Wireshark to revert some of my changes. I also added some comments to my previous PRs/bugreports.

I also did some more testing with Connector/Python and found out that depending on whether or not the C Extension is used and also depending on the version that it might set the collation during the handshake but also after the handshake with SET NAMES. And not just once, it sometimes does it twice. And the second time it only specifies the character set and not the collation, causing the collation to reset to the default one for that character set. I’ve filed this bug for that.

The current state is that the collation can only be set during the handshake if it is <= 255. For other cases you need to send a SET NAMES statement. This isn’t great as this adds at least one roundtrip to the connection setup.

As all character sets have a default collation that’s <=255 this is only an issue with some of the non-default collations. It might also become an issue collations that might be added in the future.

mysql> SELECT MAX(id) FROM information_schema.collations WHERE IS_DEFAULT='YES';
+---------+
| MAX(id) |
+---------+
|     255 |
+---------+
1 row in set (0.01 sec)

Another indication on why this only affects few users is the MySQL C API.

This is how one can set the character set:

mysql_options(&mysql, MYSQL_SET_CHARSET_NAME, "latin1");
mysql_real_connect(&mysql, ...);

Note that MySQL doesn’t send this latin1 string to the server, it looks up the default collation and sends the ID of that to the server instead. So anything that uses the C API isn’t affected by this as it has to use SET NAMES to set non-standard collations.

So I assume that the field in the handshake packet either wasn’t considered once collations with ID’s of more than 255 were added or it was a way to send the character set in a more efficient way.

Note that character sets’s have only names and no numeric ID.

As MySQL Connector/Python takes a collation name, but sends a collation ID it needs to keep a list so it can lookup the ID before connecting to the server. So to connect with a new collation you might have to update the MySQL Connector/Python version.