MySQL 8.4 and newer have extended the Global Transaction ID (GTID) functionality with a new “tag” option.
Refresher on GTID
A GTID is a unique ID that is assigned to a transaction. This is used
if gtid_mode
is set to ON
. The benefit of this
is that a transaction can be uniquely identified in a MySQL replication
setup with multiple levels. Among others this makes it easier to
refactor a replication tree as a MySQL replica knows which transactions
it has seen and can use this to find the right position to start
replicating from a new source.
The format of GTIDs is documented here.
Before GTID was used replication worked based on a file and offset
(e.g. file=binlog.000001
,offset=4
), which is
unique to every server.
A GTID without tag looks like this:
896e7882-18fe-11ef-ab88-22222d34d411:1
This is in the format of
<server_uuid>:<txid>
. The UUID of the server is
in the server_uuid
global variable and the
txid
is the transaction id, which is an increasing
number.
And for a range of transactions it looks like this:
896e7882-18fe-11ef-ab88-22222d34d411:1-12
Here there is a range of transactions, starting at 1 and ending at 12.
There can also be multiple ranges:
896e7882-18fe-11ef-ab88-22222d34d411:1-12,14-20
Here transaction 13 is missing in the range from 1 to 20.
And then there can be transactions from multiple servers:
896e7882-18fe-11ef-ab88-22222d34d411:1-12,14-20,fb0b950b-acb3-11ef-8f70-3674a2d2348e:1-10
GTID tags
If you run SET gtid_next='AUTOMATIC:abc
then the next
transaction will have a GTID that looks like this:
896e7882-18fe-11ef-ab88-22222d34d411:abc:1
There can be multiple tags in a GTID set and there can be multiple tags, for example:
896e7882-18fe-11ef-ab88-22222d34d411:1-3:abc:1:foo:1-3,5-7
This value is a compact way of representing this:
896e7882-18fe-11ef-ab88-22222d34d411:1
896e7882-18fe-11ef-ab88-22222d34d411:2
896e7882-18fe-11ef-ab88-22222d34d411:3
896e7882-18fe-11ef-ab88-22222d34d411:abc:1
896e7882-18fe-11ef-ab88-22222d34d411:foo:1
896e7882-18fe-11ef-ab88-22222d34d411:foo:2
896e7882-18fe-11ef-ab88-22222d34d411:foo:3
896e7882-18fe-11ef-ab88-22222d34d411:foo:5
896e7882-18fe-11ef-ab88-22222d34d411:foo:6
896e7882-18fe-11ef-ab88-22222d34d411:foo:7
So what are reason why you want to use GTID tags? One senario I can think of is heartbeats. So if you create a heartbeat table with one row and update that every second then you can use this to test replication delay and write availability.
CREATE TABLE heartbeat (
id INT PRIMARY KEY,
TIMESTAMP(6)
t
);
-- Monitoring service updates heartbeat, if writes fail: alert, use for write availability metric
SET gtid_next='AUTOMATIC:heartbeat';
REPLACE INTO heartbeat VALUES (1, CURRENT_TIMESTAMP(6));
-- On replicas, check the difference between the heartbeat and the current timestamp to measure replicaton delay
SELECT TIMESTAMPDIFF(MICROSECOND, t, CURRENT_TIMESTAMP(6)) FROM heartbeat;
There are tools like pt-heartbeat
from Percona Toolkit
that can help you with this.
Why having a heartbeat tag can help here is that losing the data of a heartbeat transaction is fine, while losing a transaction from an application isn’t.
Other senarios are data imports and data archival.
Binlog events
Binlogs (short for binary logs) are logs of changes made to the database. These can be used for replication between a primary database server and replicas and be used together with backups to do point-in-time recovery. Another usecase is sending events to other systems via a Change Data Capture (CDC) tool.
Adding tags impacts a few of the types of events that are sent in the binlog stream.
PREVIOUS_GTIDS_LOG_EVENT
This event is used to send which GTIDs were in previous binlog files after switching to a new file.
With
mysqlbinlog ... --read-from-remote-server --hexdump -vv $binlogfile
you can see what these events look like. Here is an example:
# at 127
#241126 17:24:31 server id 1 end_log_pos 158 CRC32 0x1b54f2b2
# Position Timestamp Type Source ID Size Source Pos Flags
# 0000007f 3f f6 45 67 23 01 00 00 00 1f 00 00 00 9e 00 00 00 80 00
# 00000092 00 00 00 00 00 00 00 00 b2 f2 54 1b |..........T.|
# Previous-GTIDs
# [empty]
The first line is a header with a timestamp, type, server_id, size,
position and flags. The second line ends with the CRC32 checksum. The
actual body for this event is 00 00 00 00 00 00 00 00
,
which means that there were no previous GTIDs. If there are GTIDs it
would change into 01 00 00 00 00 00 00 00
, etc. and then
for each GTID set it has the UUID and then one or more ranges of
transactions.
When MySQL sees the first tagged GTID it changes this event and keeps using a new format until the server is restarted.
Then the events look like this:
# at 127
#241126 17:57:35 server id 1 end_log_pos 158 CRC32 0x3b12f980
# Position Timestamp Type Source ID Size Source Pos Flags
# 0000007f ff fd 45 67 23 01 00 00 00 1f 00 00 00 9e 00 00 00 80 00
# 00000092 01 00 00 00 00 00 00 01 80 f9 12 3b |............|
# Previous-GTIDs
# [empty]
Here the body is 01 00 00 00 00 00 00 01
. here the last
01
means the GTID format where 00
is classic
and 01
means tagged. For tagged GTIDs the number of GTIDs
is encoded slightly differently. First a mask of
0xFF000000000000FF
is used and then the resulting value is
shifted by 1 byte. And then the GTIDs are encoded with a field for the
tag. If there are multiple tags, then these are send as separate GTID
sets in this event.
Here is an example:
# at 127
#241129 9:39:26 server id 1 end_log_pos 291 CRC32 0x248c7bd2
# Position Timestamp Type Source ID Size Source Pos Flags
# 0000007f ae 6f 49 67 23 01 00 00 00 a4 00 00 00 23 01 00 00 80 00
# 00000092 01 03 00 00 00 00 00 01 89 6e 78 82 18 fe 11 ef |.........nx.....|
# 000000a2 ab 88 22 22 2d 34 d4 11 00 01 00 00 00 00 00 00 |.....4..........|
# 000000b2 00 01 00 00 00 00 00 00 00 04 00 00 00 00 00 00 |................|
# 000000c2 00 89 6e 78 82 18 fe 11 ef ab 88 22 22 2d 34 d4 |..nx..........4.|
# 000000d2 11 08 64 65 6d 6f 01 00 00 00 00 00 00 00 01 00 |..demo..........|
# 000000e2 00 00 00 00 00 00 04 00 00 00 00 00 00 00 89 6e |...............n|
# 000000f2 78 82 18 fe 11 ef ab 88 22 22 2d 34 d4 11 0c 66 |x..........4...f|
# 00000102 6f 6f 62 61 72 01 00 00 00 00 00 00 00 01 00 00 |oobar...........|
# 00000112 00 00 00 00 00 03 00 00 00 00 00 00 00 d2 7b 8c |................|
# 00000122 24 |.|
# Previous-GTIDs
# 896e7882-18fe-11ef-ab88-22222d34d411:1-3:demo:1-3:foobar:1-2
So 01 03 00 00 00 00 00 01
means it is a tagged GTID
format (ends with 01
). And that there are 3 GTIDs.
The first one:
# 00000092 89 6e 78 82 18 fe 11 ef |.........nx.....|
# 000000a2 ab 88 22 22 2d 34 d4 11 00 01 00 00 00 00 00 00 |.....4..........|
# 000000b2 00 01 00 00 00 00 00 00 00 04 00 00 00 00 00 00 |................|
# 000000c2 00 |..nx..........4.|
Description | Value | Decoded |
---|---|---|
UUID | 89 6e 78 82 18 fe 11 ef ab 88 22 22 2d 34 d4 11 |
896e7882-18fe-11ef-ab88-22222d34d411 |
Tag | 00 01 00 00 00 00 00 00 |
empty |
Start | 01 00 00 00 00 00 00 |
1 |
End | 00 04 00 00 00 00 00 00 |
4 |
The second one:
# 000000c2 89 6e 78 82 18 fe 11 ef ab 88 22 22 2d 34 d4 |..nx..........4.|
# 000000d2 11 08 64 65 6d 6f 01 00 00 00 00 00 00 00 01 00 |..demo..........|
# 000000e2 00 00 00 00 00 00 04 00 00 00 00 00 00 00 |...............n|
GTID_TAGGED_LOG_EVENT
There is a new event called GTID_TAGGED_LOG_EVENT
(type
2a
). I expected this to be basically the same as the
GTID_LOG_EVENT
(type 21
), but with tags.
This is what a GTID_LOG_EVENT
looks like:
# at 158
#241202 20:02:25 server id 1 end_log_pos 235 CRC32 0xd3cf038f
# Position Timestamp Type Source ID Size Source Pos Flags
# 0000009e 41 04 4e 67 21 01 00 00 00 4d 00 00 00 eb 00 00 00 00 00
# 000000b1 01 89 6e 78 82 18 fe 11 ef ab 88 22 22 2d 34 d4 |..nx..........4.|
# 000000c1 11 01 00 00 00 00 00 00 00 02 00 00 00 00 00 00 |................|
# 000000d1 00 00 01 00 00 00 00 00 00 00 fb 20 6c 30 4e 28 |............l0N.|
# 000000e1 06 ca f4 5f 01 00 8f 03 cf d3 |..........|
# GTID last_committed=0 sequence_number=1 rbr_only=no original_committed_timestamp=1733166145216763 immediate_commit_timestamp=1733166145216763 transaction_length=202
# original_commit_timestamp=1733166145216763 (2024-12-02 20:02:25.216763 CET)
# immediate_commit_timestamp=1733166145216763 (2024-12-02 20:02:25.216763 CET)
/*!80001 SET @@session.original_commit_timestamp=1733166145216763*//*!*/;
/*!80014 SET @@session.original_server_version=90100*//*!*/;
/*!80014 SET @@session.immediate_server_version=90100*//*!*/;
SET @@SESSION.GTID_NEXT= '896e7882-18fe-11ef-ab88-22222d34d411:1'/*!*/;
This is what a GTID_TAGGED_LOG_EVENT
looks like:
# at 158
#241202 19:53:13 server id 1 end_log_pos 240 CRC32 0x71f11620
# Position Timestamp Type Source ID Size Source Pos Flags
# 0000009e 19 02 4e 67 2a 01 00 00 00 52 00 00 00 f0 00 00 00 00 00
# 000000b1 02 76 00 00 02 02 25 02 dc f0 09 02 30 f9 03 22 |.v..........0...|
# 000000c1 bd 03 ad 02 21 02 44 44 5a 68 51 03 22 04 04 06 |......DDZhQ.....|
# 000000d1 0c 66 6f 6f 62 61 72 08 00 0a 04 0c 7f e2 74 90 |.foobar.......t.|
# 000000e1 0f 4e 28 06 10 3d 03 12 a3 ff 0a 20 16 f1 71 |.N............q|
# GTID last_committed=0 sequence_number=1 rbr_only=no original_committed_timestamp=1733165593949410 immediate_commit_timestamp=1733165593949410 transaction_length=207
# original_commit_timestamp=1733165593949410 (2024-12-02 19:53:13.949410 CET)
# immediate_commit_timestamp=1733165593949410 (2024-12-02 19:53:13.949410 CET)
/*!80001 SET @@session.original_commit_timestamp=1733165593949410*//*!*/;
/*!80014 SET @@session.original_server_version=90100*//*!*/;
/*!80014 SET @@session.immediate_server_version=90100*//*!*/;
SET @@SESSION.GTID_NEXT= '896e7882-18fe-11ef-ab88-22222d34d411:foobar:1'/*!*/;
The output looks almost identical, but if you look at the hexdump you
can see that there are quite some differences, for example you don’t see
the 89 6e 78 82
that is the start of the UUID of the
server. It looks like a new interal library
(mysql::serialization
) is used for serialization. Someone
wrote took the time to write some documentation
for this new library, which is great. However with this documentation I
haven’t been able to decode this event yet.
Note that replication protocol is documented here.
Impact
I have been doing some work on the MySQL protocol dissector in Wireshark. As
wireshark only decodes the binlog into events and doesn’t decode the
events themselve it was enough to just add the event name and code for
the GTID_TAGGED_LOG_EVENT
.
For go-mysql,
which is a Go library that amongst other things parses the binlog
stream, the work is still ongoing. The code needs to handle the new
format of the PREVIOUS_GTIDS_LOG_EVENT
, the new
GTID_TAGGED_LOG_EVENT
event and also be able to parse the
text format of the tagged GTIDs.
There are many other tools that might use the binlog stream and face the same issues.
Also, using tagged GTIDs is a sure way to break replication between 8.4 (or 8.3?) and earlier versions of MySQL.
Related bugs
These are the bugs that are related to tagged GTID that I have filed: