Tuesday, February 18, 2025

Decoding MySQL's GTID_TAGGED_LOG_EVENT

This is a follow-up for MySQL GTID tags and binlog events, but you don’t need to read that first.

One of the recent innovations in MySQL was the addition of Tagged GTID’s. These tagged GTID’s take the format of <uuid>:<tag>:<transaction_id>.

And this change means that the GTID_LOG_EVENT’s in the binary logs needed to be changed. The MySQL team at Oracle decided to not change the existing format, but introduce a new event: GTID_TAGGED_LOG_EVENT.

Initially I assumed that decoding the new event would me mostly identical to the original event, but with just a single field added. But this isn’t the case as Oracle MySQL deciced to use a new serialization format (Yes, more innovation) and use it for this new event. The new serialization format is documented here.

Oracle MySQL did their best to document the library and the format, but I guess the target audience is people using this library to encode and decode the format and not people working on implementations in different languages.

In this blog I’ll try to demonstrate my knowledge of the format that I gained by experimenting with it.

The first useful tool is mysqlbinlog --hexdump -vvv, which can read files directly or read from a remote server. This will decode the events and also print a hexdump of the event.

# at 158
#250217 21:14:49 server id 1  end_log_pos 240 CRC32 0xce6c39ef
# Position  Timestamp   Type   Source ID        Size      Source Pos    Flags
# 0000009e b9 98 b3 67   2a   01 00 00 00   52 00 00 00   f0 00 00 00   00 00
# 000000b1 02 76 00 00 02 02 25 02  dc f0 09 02 30 f9 03 22 |.v..........0...|
# 000000c1 bd 03 ad 02 21 02 44 44  5a 68 51 03 22 04 04 06 |......DDZhQ.....|
# 000000d1 0c 66 6f 6f 62 61 7a 08  00 0a 04 0c 7f 15 83 22 |.foobaz.........|
# 000000e1 2d 5c 2e 06 10 49 03 12  c3 02 0b ef 39 6c ce    |.....I......9l.|
#   GTID    last_committed=0    sequence_number=1   rbr_only=no original_committed_timestamp=1739823289369365   immediate_commit_timestamp=1739823289369365 transaction_length=210
# original_commit_timestamp=1739823289369365 (2025-02-17 21:14:49.369365 CET)
# immediate_commit_timestamp=1739823289369365 (2025-02-17 21:14:49.369365 CET)
/*!80001 SET @@session.original_commit_timestamp=1739823289369365*//*!*/;
/*!80014 SET @@session.original_server_version=90200*//*!*/;
/*!80014 SET @@session.immediate_server_version=90200*//*!*/;
SET @@SESSION.GTID_NEXT= '896e7882-18fe-11ef-ab88-22222d34d411:foobaz:1'/*!*/;

This event has the same header as all binlog events and on the end it has a CRC32 checksum.

The header:

# Position  Timestamp   Type   Source ID        Size      Source Pos    Flags
# 0000009e b9 98 b3 67   2a   01 00 00 00   52 00 00 00   f0 00 00 00   00 00

The checksum:

                                              CRC32 0xce6c39ef
...
# 000000e1                                   ef 39 6c ce    |.....I......9l.|

If we strip the header and checkum away we end up with the event body:

# 000000b1 02 76 00 00 02 02 25 02  dc f0 09 02 30 f9 03 22 |.v..........0...|
# 000000c1 bd 03 ad 02 21 02 44 44  5a 68 51 03 22 04 04 06 |......DDZhQ.....|
# 000000d1 0c 66 6f 6f 62 61 7a 08  00 0a 04 0c 7f 15 83 22 |.foobaz.........|
# 000000e1 2d 5c 2e 06 10 49 03 12  c3 02 0b                |.....I......9l.|

The mysql::serialization format is only used for the body.

The message in the body is encoded like this:

Value Decoded Reference Meaning / Notes
02 1 serialization_version_number
76 59 serializable_field_size This is the length of the body (the event body has 3 lines of 16 bytes and one line of 11 bytes)
00 0 last_non_ignorable_field_id id of the last required field
00 0 field id
02 1 gtid_tags
02 1 field id
25 02 dc f0 09 02 30 f9 03 22 bd 03 ad 02 21 02 44 44 5a 68 51 03 22 896e7882-18fe-11ef-ab88-22222d34d411 uuid 16 bytes of a UUID
04 2 field id
04 2 rpl_gtid_gno This is the transaction id
06 3 field id
0c 66 6f 6f 62 61 7a foobaz tag first byte stores the length of the string
08 4 field id
00 0 last_committed
0a 5 field id
04 2 sequence_number
0c 6 field id
7f 15 83 22 2d 2d 5c 2e 06 1739823289369365 immediate_commit_timestamp 2025-02-17 21:14:49.369365 CET
10 8 field id
49 03 210 transaction_length
12 9 field id
c3 02 0b 90200 immediate_server_version

The first thing to notice here is that fields are numbered. And as you can see field 7 is skipped.

For both encoding and decoding you need to know the field definition, which for this event can be found here. The data types and names of the fields are not encoded in the message.

Most numbers can be decoded with a shift operation: 0x04>>1 = 2.

Note that you have to take care of byte order, so to decode c3 02 0b, you need to reverse them first. And then 0x0b02c3>>3 = 90200.

But how do you know by how many bytes you have to shift? You need to look at the first byte, so for c3 02 0b, that’s c3. And then you look at the bit representation: 11000011. If you count the number of consecutive 1’s from right to left this gives you 2. And then you know that there are two more bits following (02 0b) and that you have to shift by 2+1=3.

And for the decoding of the UUID:

Encoded Decoded
25 02 89
dc 6e
f0 78
09 02 82
30 18
f9 03 fe
22 11
bd 03 ef
ad 02 ab
21 02 88
44 22
44 22
5a 2d
68 34
51 03 d4
22 11

This gives you 896e788218fe11efab8822222d34d411, which then formatted as a UUID looks like this: 896e7882-18fe-11ef-ab88-22222d34d411.

The second byte is 02 if the value is in the range of 0x80 - 0xc and 03 if it is in the range of 0xc - 0xff.

So for the first value: 25 02: (0x25>>2)+0x80 = 89.

And the second value: dc: 0xdc>>1 = 6e.

I hope this helps others that want to decode or encode these formats and can’t use the library that MySQL provides.

No comments:

Post a Comment