The MySQL Protocol is the network protocol that is used between a MySQL server and a client. This is called the “classic” protocol as there is now a newer protobuf based protocol called X Protocol. However the “classic” protocol is used by many database drivers, applications, etc. and also by MySQL Replication.
The MySQL Protocol has the option to compress network traffic. Most client libraries don’t enable this by default as this increases CPU usage.
Some history
Initially this was added in MySQL 3.22 (based on historical release notes) and was based on zlib.
Then in MySQL 8.0.18 a second compression algorithm, Zstandard, was added based on a contribution by Facebook.
So zlib has been there basically forever and Zstandard support is new-ish (8.0.18 was released in October 2019).
Why you might want to use compression.
There are multiple usecases that are often mentioned when talking about the compressed protocol. The first one is slow connections. This could, for example, be a DBA doing remote work.
The second usecase is metered connections. This could be a DBA using a mobile connection or to reduce network transfer cost in a Cloud environment.
Basic usage
The usage is as simple as
mysql --compression-algorithms=zstd ...
. Instead of
zstd
you can also use zlib
or even provide
both (zstd,zlib
) and see what’s available on the server.
Most other clients like mysqldump
, MySQL Shell, etc also
support this setting. Older versions use
mysql --compress ...
which enables zlib
based
compression. What is used in the end depends on what the server
supports.
For zstd
there is a --zstd-compression-level
option that allows you to set a level between 1 and 22, where the
default is 3. This allows you to better balance between faster and
better compression.
Once connected you could check the Compression
,
Compression_algorithm
and Compression_level
status variables to verify what is being used.
sql> SHOW STATUS LIKE 'Compression%';
+-----------------------+-------+
| Variable_name | Value |
+-----------------------+-------+
| Compression | ON |
| Compression_algorithm | zstd |
| Compression_level | 3 |
+-----------------------+-------+
3 rows in set (0.0043 sec)
In MySQL Shell the \status
command also provides you
with this information:
sql> \status
MySQL Shell version 8.2.0
Connection Id: 30
Current schema: test
Current user: root@localhost
SSL: Cipher in use: TLS_AES_128_GCM_SHA256 TLSv1.3
Using delimiter: ;
Server version: 8.2.0 MySQL Community Server - GPL
Protocol version: Compressed 10 <------------------
Client library: 8.2.0
Connection: 127.0.0.1 via TCP/IP
TCP port: 3306
Server characterset: utf8mb4
Schema characterset: utf8mb4
Client characterset: utf8mb4
Conn. characterset: utf8mb4
Result characterset: utf8mb4
Compression: Enabled (zstd) <------------------
Uptime: 1 day 3 hours 20 min 4.0000 sec
Threads: 2 Questions: 200 Slow queries: 0 Opens: 348 Flush tables: 3 Open tables: 266 Queries per second avg: 0.002
The connectors for your favorite programming language probably have
settings for this as well. Some might only allow you to enable
compression with zlib
and not yet offer you the option for
zstd
.
Language | Driver | zlib | zstd | Setting |
---|---|---|---|---|
C | MySQL C API (libmysqlclient) | ✅ | ✅ | MYSQL_OPT_COMPRESSION_ALGORITHMS and
MYSQL_OPT_COMPRESS |
Python | Connector/Python | ✅ | ❌ | compress=True |
Java | Connector/J | ✅ | ❌, Bug #112732 | useCompression=true |
Go | go-sql-driver/mysql | ❌ #24 | ❌ | - |
PHP | mysqli | ✅ | ❌ | MYSQLI_CLIENT_COMPRESS |
Perl | DBD::mysql | ✅ | ✅ | mysql_compression=zlib |
Note that many drivers are based on libmysqlclient
and
only require minimal changes to support this.
A look at the internals
Handshake
When establishing a connection both the client and server send a set
of capability
flags. There are two flags that are used for compression: First
there is CLIENT_COMPRESS
that is used to advertise support
for zlib and then there is
CLIENT_ZSTD_COMPRESSION_ALGORITHM
that is used to advertise
support for Zstandard. If both the client and the server have a flag set
then that feature is used.
A bit of a special case is possible when both the client and server
set both of the compression flags, in that case zlib is used. You can
try this out by running --compression-algorithms=zstd,zlib
and looking at the status variables.
If the client has the CLIENT_ZSTD_COMPRESSION_ALGORITHM
flag set then the Protocol::HandshakeResponse41
also contains a compression level for Zstandard.
Packet compression
The MySQL Protocol consists of “MySQL Packets” that are send between the client and server. When compression is used this is added as a separate layer in the network stack.
So the stack basically looks something like this:
| MySQL Protocol |
| MySQL Compressed Protocol |
| TLS (Optional) |
| TCP/IP | Unix Socket | ..... |
And yes, if you want you can use the compressed protocol over a Unix Socket. But I can’t think of a good reason to do so.
The .....
above is because of other
connection options on Windows, like Shared memory and Named
Pipes.
As this is a separate layer there isn’t a 1-to-1 mapping between compressed packets and regular packets.
So what do these MySQL Compressed Protocol packets look like?
Let’s start by looking at the header.
Description:
| <compressed_length[3]> | <compressed_sequence[1]> | <uncompressed_length[3] |
Example values (hex):
| 15 00 00 | 00 | 000000 |
So here we have: - Compressed Length: 21. This is
15 00 00
in hex. This is a 24 bit (3 byte) integer. -
Compressed Sequence: 0 - Uncompressed Length: 0
If you know the regular MySQL Protocol this probably looks very similar, including the use of the somewhat excentric 24 bit integer.
That the Uncompressed Length is set to zero means that for this
packet the payload isn’t actually compressed. This happens when the side
that does compression doesn’t think it makes sense to compress the
payload. MySQL doesn’t compress the payload if it is less than
MIN_COMPRESS_LENGTH
, which is 50 bytes.
After the header we get the payload which depending on the capability flags might be compressed with zlib or Zstandard or might be uncompressed if the Uncompressed Length is set to 0.
This is how this looks in Wireshark:
A compressed protocol packet can also contain multiple (regular) MySQL packets:
Here there are eight regular MySQL Packets inside the compressed packet.
Wireshark can uncompress the compressed payload (both zlib and zstd). This is why it is able to show you what’s in the compressed packet. On the bottom you can see “Fram (202 bytes)” and “compressed data (147 bytes)”.
For this packet we have: - Compressed Length: 129 bytes - Uncompressed Length: 147 bytes - TCP Payload Length: 136 bytes (this matches our expectation: 129 bytes payload + 7 byte header)
So instead of sending 147 bytes we sent 136 bytes. We saved 11 bytes (~7.5%).
The actual compression ratio depends on the size (bigger is better), randomness of the content, the compression algorithm and compression level. For zlib the compression level is hardcoded, so you can only use different compression levels for Zstandard.
We talked about two situations: The first situation is one compressed packet with one MySQL packet and the second situation is one compressed packet with multiple MySQL packets. But there is a third situation: Multiple compressed packets for one MySQL packet.
A single MySQL Protocol Packet can be split over multi compressed packets.
Efficiency
To get good results with compression in general it really helps if you’re compressing larger sets of data and if the compressed data isn’t very random.
Let’s see what results we can get in a best case senario by using
REPEAT()
to send a string of 100 million times
x
.
To do this we need to make sure we set a large
max_allowed_packet
for both the server and client. We also
disable SSL/TLS as that makes capturing things easier. In case you
really need this, Wireshark is able to decrypt
MySQL sessions with TLS.
$ mysql --compression-algorithms=zlib -h 127.0.0.1 -u root -pRoot123@ --ssl-mode=DISABLED --max-allowed-packet=1073741824
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 237
Server version: 8.2.0 MySQL Community Server - GPL
Copyright (c) 2000, 2023, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> pager wc
PAGER set to 'wc'
mysql> select repeat('x',100*1024*1024);
5 9 419431449
1 row in set (1.06 sec)
mysql> ^DBye
The traffic that we captured in Wireshark looks like this:
I’ve added the compressed and uncompressed length as columns as well
as the source and destination port numbers and hidden some of the other
columns to make this fit on the screenshot. I’ve also set the display
filter to mysql
to hide some other packets.
The packets 1 untill 14 are from the connection setup. Packet 16 has
our SELECT
-query. And packets 18 untill 46 have our
resultset. The final packet (48) is the COM_QUIT
triggered
by our Ctrl-D to close the session.
Let’s focus on the resultset (packets 18-46).
These packets have compressed sequence numbers 1 until 14. The query sent by the client had compressed sequence number 0.
Packet Number | Compressed Sequence Number | Compressed Length | Uncompressed Length |
---|---|---|---|
18 | 1 | 106 | 16384 |
20 | 2 | 16302 | 16760891 |
22 | 3 | 41 | 16384 |
24 | 4 | 16302 | 16760835 |
26 | 5 | 41 | 16384 |
28 | 6 | 16302 | 16760835 |
30 | 7 | 41 | 16384 |
32 | 8 | 16302 | 16760835 |
34 | 9 | 41 | 16384 |
36 | 10 | 16302 | 16760835 |
38 | 11 | 41 | 16384 |
40 | 12 | 16302 | 16760835 |
42 | 13 | 44 | 16384 |
44 | 14 | 4071 | 4177939 |
46 | 15 | 11 | 0 |
As you can see there are multiple packets of 16384, which is because
our net_buffer_length
is set to the default (16384
).
For the MySQL Packets inside of the compressed packets the length is
set to FF FF FF
to indicate that this isn’t the last
packet. See the section about Sending
more than 16Mb in the docs for more details on how this works.
So there are multiple MySQL packets of 16777216 (including 4 byte header) sent to send the total payload. This is the maximum size of a MySQL packet as the length is a 24 bit integer and 224 = 16777216.
But as a Compressed Packet also uses a 24 bit integer to store the
compressed and uncompressed size it can only store a MySQL Packet of
16777216 - 7 (header) = 16777209 bytes. So a 16K MySQL packet needs
multiple compressed packets. As MySQL started with a 16K
(net_buffer_length
) packet the remainder of the packet fits
in the next compressed packet.
Combining the uncompressed lengths of all packets this gives us 104857693 and if we add the 11 bytes of the last packet that didn’t have it’s payload compressed we get 104857704 which is about 100 MiB.
If we combine the compressed payloads we get 102249 which is slightly less than 100 KiB. This probably doesn’t include the header for the 15 packets, but with 15 * 7 = 105 bytes that isn’t going to change the result.
So in this case compression really reduces the amount of bytes we need to transfer.
However this is a best case senario. In the real world you will probably have a mix of large resultsets and smaller resultsets and also have data that doesn’t have this many repeating values.
Performance
Percona did some testing in 2019 that shows that the compressed protocol helps with performance if your workload is limited by your network bandwidth. It also shows that it reduces performance in case you’re not limited by network bandwith.
Replication
As MySQL replication uses the “classic” protocol as well there is the
option to enable compression. For this the settings are
SOURCE_COMPRESSION_ALGORITHMS
and
SOURCE_ZSTD_COMPRESSION_LEVEL
that are part of the CHANGE REPLICATION SOURCE
statement. And there is also the replica_compressed_protocol
setting that if enabled takes precedence over the
SOURCE_COMPRESSION_ALGORITHMS
setting.
However with replication you might want to use Binary Log Transaction Compression instead as that also reduces the size on disk and avoids compressing and uncompressing the same data multiple times.
Observability
As said before there are three session status variables: -
Compression
- Compression_algorithm
-
Compression_level
In performance_schema
I only found a single instrument
related to the compressed protocol:
sql> SELECT * FROM setup_instruments WHERE name like '%compress_packet%'\G
*************************** 1. row ***************************
NAME: memory/sql/NET::compress_packet
ENABLED: YES
TIMED: NULL
PROPERTIES:
FLAGS:
VOLATILITY: 0
DOCUMENTATION: Buffer used when compressing a packet.
1 row in set (0.0029 sec)
Note that the session status can be used to get server wide numbers like this:
sql> SELECT VARIABLE_VALUE,COUNT(*)
-> FROM performance_schema.status_by_thread
-> WHERE VARIABLE_NAME='Compression_algorithm'
-> GROUP BY VARIABLE_VALUE;
+----------------+----------+
| VARIABLE_VALUE | COUNT(*) |
+----------------+----------+
| zlib | 1 |
| zstd | 2 |
| | 1 |
+----------------+----------+
3 rows in set (0.0006 sec)
One of the obvious things that seems to lack instrumentation is the effectiveness of compression like the compression ratio or the amount of bytes saved by enabling compression. In addition to this it would also be good to get more info on how much CPU time is spend doing compression and uncompression.
There is this:
sql> show session status like 'Bytes%';
+----------------+-------+
| Variable_name | Value |
+----------------+-------+
| Bytes_received | 2381 |
| Bytes_sent | 13455 |
+----------------+-------+
2 rows in set (0.0015 sec)
However the docs aren’t clear on if this is measuring the MySQL Protocol (before compresssion) or the network link (after compression).
For the X Protocol there is more information available, but that doesn’t help us with the “classic” protocol.
sql> SHOW SESSION STATUS LIKE 'Mysqlx_bytes_%';
+------------------------------------------+-------+
| Variable_name | Value |
+------------------------------------------+-------+
| Mysqlx_bytes_received | 0 |
| Mysqlx_bytes_received_compressed_payload | 0 |
| Mysqlx_bytes_received_uncompressed_frame | 0 |
| Mysqlx_bytes_sent | 0 |
| Mysqlx_bytes_sent_compressed_payload | 0 |
| Mysqlx_bytes_sent_uncompressed_frame | 0 |
+------------------------------------------+-------+
6 rows in set (0.0014 sec)
Having information on compressed vs uncompressed like this for the classic protocol would be nice. However this doesn’t seem to have per algorighm info.
And for the Binary Log Transaction Compression features there is this:
sql> DESCRIBE performance_schema.binary_log_transaction_compression_stats;
+--------------------------------------+------------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------------------------------+------------------------+------+-----+---------+-------+
| LOG_TYPE | enum('BINARY','RELAY') | NO | | NULL | |
| COMPRESSION_TYPE | varchar(64) | NO | | NULL | |
| TRANSACTION_COUNTER | bigint unsigned | NO | | NULL | |
| COMPRESSED_BYTES_COUNTER | bigint unsigned | NO | | NULL | |
| UNCOMPRESSED_BYTES_COUNTER | bigint unsigned | NO | | NULL | |
| COMPRESSION_PERCENTAGE | smallint | NO | | NULL | |
| FIRST_TRANSACTION_ID | text | YES | | NULL | |
| FIRST_TRANSACTION_COMPRESSED_BYTES | bigint unsigned | NO | | NULL | |
| FIRST_TRANSACTION_UNCOMPRESSED_BYTES | bigint unsigned | NO | | NULL | |
| FIRST_TRANSACTION_TIMESTAMP | timestamp(6) | YES | | NULL | |
| LAST_TRANSACTION_ID | text | YES | | NULL | |
| LAST_TRANSACTION_COMPRESSED_BYTES | bigint unsigned | NO | | NULL | |
| LAST_TRANSACTION_UNCOMPRESSED_BYTES | bigint unsigned | NO | | NULL | |
| LAST_TRANSACTION_TIMESTAMP | timestamp(6) | YES | | NULL | |
+--------------------------------------+------------------------+------+-----+---------+-------+
14 rows in set (0.0017 sec)
This has per algorithm stats as COMPRESSION_TYPE
has
ZSTD
for example. There are individual rows in this table
per LOG_TYPE
and COMPRESSION_TYPE
combination.
This table even has a very user friendly
COMPRESSION_PERCENTAGE
field that can be used to see how
well the data is compressing.
With a SytemTap script like the one below you can get some of the same issue as what Wireshark can show you. Maybe this can be extended to get more useful details. For now the only benefit of this is that it works with TLS connections without any addtional configuration.
mysql_compress.stp
:
probe process("/usr/bin/mysql").function("my_uncompress*").call {
cl = @cast($complen, "size_t")
printf("\nReceiving compressed packet with len=%u, complen=%u\n", $len, cl)
}
Example usage:
$ sudo stap mysql_compress.stp -c 'mysql --compression-algorithms=zstd -h 127.0.0.1 -u root -p'
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 301
Server version: 8.2.0 MySQL Community Server - GPL
Copyright (c) 2000, 2023, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
Receiving compressed packet with len=92, complen=0
mysql>
Receiving compressed packet with len=143, complen=161
mysql> pager wc
PAGER set to 'wc'
mysql> select repeat('x',10);
Receiving compressed packet with len=71, complen=0
5 9 95
1 row in set (0.00 sec)
mysql> select repeat('x',100);
Receiving compressed packet with len=80, complen=162
5 9 525
1 row in set (0.00 sec)
mysql> select repeat('x',1000);
Receiving compressed packet with len=81, complen=1065
5 9 5025
1 row in set (0.00 sec)
mysql> select repeat('x',10000);
Receiving compressed packet with len=86, complen=10066
5 9 41049
1 row in set (0.00 sec)
mysql> ^DBye
[dvaneeden@dve-carbon ~]$ cat mysql_compress.stp
probe process("/usr/bin/mysql").function("my_uncompress*").call {
cl = @cast($complen, "size_t")
printf("\nReceiving compressed packet with len=%u, complen=%u\n", $len, cl)
}
If you create trace files you get some info:
...
T@18: | | | | | | | | note: Packet got longer on compression; Not compressed
...
T@18: | | | | | | | | note: Packet too short: Not compressed
...
T@18: | | | | | | | | note: Packet got longer on zstd compression; Not compressed
This will tell you when the payload was less than 50 bytes or when either zlib or zstd compressed data was longer than the original data.
Possible improvements
The MIN_COMPRESS_LENGTH
is now hardcoded. It might make
sense to make this configurable and set it to say 1 KiB or 1 MiB and
only compress the larger payloads. This might give a better balance
between performance and compression.
Docs
WL#12039: Add support for zstd compression to classic protocol in the server
WL#12475: Protocol Changes to specify compression configuration for connections
Work done by me
Here are some of the compressed protocol related things that I have worked on. As is usualy the case with open source projects this involved a lot of help from maintainers and other contributors on these projects. Many thanks to everyone that was involved.
Wireshark
- Decoding of compressed packets in
packet_myql.c
No comments:
Post a Comment