This is a follow-up for MySQL GTID tags and binlog events, but you don’t need to read that first.
One of the recent innovations in MySQL was the addition of Tagged
GTID’s. These tagged GTID’s take the format of
<uuid>:<tag>:<transaction_id>
.
And this change means that the GTID_LOG_EVENT
’s in the
binary logs needed to be changed. The MySQL team at Oracle decided to
not change the existing format, but introduce a new event:
GTID_TAGGED_LOG_EVENT
.
Initially I assumed that decoding the new event would me mostly identical to the original event, but with just a single field added. But this isn’t the case as Oracle MySQL deciced to use a new serialization format (Yes, more innovation) and use it for this new event. The new serialization format is documented here.
Oracle MySQL did their best to document the library and the format, but I guess the target audience is people using this library to encode and decode the format and not people working on implementations in different languages.
In this blog I’ll try to demonstrate my knowledge of the format that I gained by experimenting with it.
The first useful tool is mysqlbinlog --hexdump -vvv
,
which can read files directly or read from a remote server. This will
decode the events and also print a hexdump of the event.
# at 158
#250217 21:14:49 server id 1 end_log_pos 240 CRC32 0xce6c39ef
# Position Timestamp Type Source ID Size Source Pos Flags
# 0000009e b9 98 b3 67 2a 01 00 00 00 52 00 00 00 f0 00 00 00 00 00
# 000000b1 02 76 00 00 02 02 25 02 dc f0 09 02 30 f9 03 22 |.v..........0...|
# 000000c1 bd 03 ad 02 21 02 44 44 5a 68 51 03 22 04 04 06 |......DDZhQ.....|
# 000000d1 0c 66 6f 6f 62 61 7a 08 00 0a 04 0c 7f 15 83 22 |.foobaz.........|
# 000000e1 2d 5c 2e 06 10 49 03 12 c3 02 0b ef 39 6c ce |.....I......9l.|
# GTID last_committed=0 sequence_number=1 rbr_only=no original_committed_timestamp=1739823289369365 immediate_commit_timestamp=1739823289369365 transaction_length=210
# original_commit_timestamp=1739823289369365 (2025-02-17 21:14:49.369365 CET)
# immediate_commit_timestamp=1739823289369365 (2025-02-17 21:14:49.369365 CET)
/*!80001 SET @@session.original_commit_timestamp=1739823289369365*//*!*/;
/*!80014 SET @@session.original_server_version=90200*//*!*/;
/*!80014 SET @@session.immediate_server_version=90200*//*!*/;
SET @@SESSION.GTID_NEXT= '896e7882-18fe-11ef-ab88-22222d34d411:foobaz:1'/*!*/;
This event has the same header as all binlog events and on the end it has a CRC32 checksum.
The header:
# Position Timestamp Type Source ID Size Source Pos Flags
# 0000009e b9 98 b3 67 2a 01 00 00 00 52 00 00 00 f0 00 00 00 00 00
The checksum:
CRC32 0xce6c39ef
...
# 000000e1 ef 39 6c ce |.....I......9l.|
If we strip the header and checkum away we end up with the event body:
# 000000b1 02 76 00 00 02 02 25 02 dc f0 09 02 30 f9 03 22 |.v..........0...|
# 000000c1 bd 03 ad 02 21 02 44 44 5a 68 51 03 22 04 04 06 |......DDZhQ.....|
# 000000d1 0c 66 6f 6f 62 61 7a 08 00 0a 04 0c 7f 15 83 22 |.foobaz.........|
# 000000e1 2d 5c 2e 06 10 49 03 12 c3 02 0b |.....I......9l.|
The mysql::serialization
format is only used for the
body.
The message in the body is encoded like this:
Value | Decoded | Reference | Meaning / Notes |
---|---|---|---|
02 | 1 | serialization_version_number | |
76 | 59 | serializable_field_size | This is the length of the body (the event body has 3 lines of 16 bytes and one line of 11 bytes) |
00 | 0 | last_non_ignorable_field_id | id of the last required field |
00 | 0 | field id | |
02 | 1 | gtid_tags | |
02 | 1 | field id | |
25 02 dc f0 09 02 30 f9 03 22 bd 03 ad 02 21 02 44 44 5a 68 51 03 22 | 896e7882-18fe-11ef-ab88-22222d34d411 | uuid | 16 bytes of a UUID |
04 | 2 | field id | |
04 | 2 | rpl_gtid_gno | This is the transaction id |
06 | 3 | field id | |
0c 66 6f 6f 62 61 7a | foobaz | tag | first byte stores the length of the string |
08 | 4 | field id | |
00 | 0 | last_committed | |
0a | 5 | field id | |
04 | 2 | sequence_number | |
0c | 6 | field id | |
7f 15 83 22 2d 2d 5c 2e 06 | 1739823289369365 | immediate_commit_timestamp | 2025-02-17 21:14:49.369365 CET |
10 | 8 | field id | |
49 03 | 210 | transaction_length | |
12 | 9 | field id | |
c3 02 0b | 90200 | immediate_server_version |
The first thing to notice here is that fields are numbered. And as you can see field 7 is skipped.
For both encoding and decoding you need to know the field definition, which for this event can be found here. The data types and names of the fields are not encoded in the message.
Most numbers can be decoded with a shift operation:
0x04>>1 = 2
.
Note that you have to take care of byte order, so to decode
c3 02 0b
, you need to reverse them first. And then
0x0b02c3>>3 = 90200
.
But how do you know by how many bytes you have to shift? You need to
look at the first byte, so for c3 02 0b
, that’s
c3
. And then you look at the bit representation:
11000011
. If you count the number of consecutive
1
’s from right to left this gives you 2. And then you know
that there are two more bits following (02 0b
) and that you
have to shift by 2+1=3
.
And for the decoding of the UUID:
Encoded | Decoded |
---|---|
25 02 | 89 |
dc | 6e |
f0 | 78 |
09 02 | 82 |
30 | 18 |
f9 03 | fe |
22 | 11 |
bd 03 | ef |
ad 02 | ab |
21 02 | 88 |
44 | 22 |
44 | 22 |
5a | 2d |
68 | 34 |
51 03 | d4 |
22 | 11 |
This gives you 896e788218fe11efab8822222d34d411
, which
then formatted as a UUID looks like this:
896e7882-18fe-11ef-ab88-22222d34d411
.
The second byte is 02
if the value is in the range of
0x80 - 0xc
and 03
if it is in the range of
0xc - 0xff
.
So for the first value: 25 02
:
(0x25>>2)+0x80 = 89
.
And the second value: dc
:
0xdc>>1 = 6e
.
I hope this helps others that want to decode or encode these formats and can’t use the library that MySQL provides.
No comments:
Post a Comment