Sunday, December 9, 2012

A difficult XtraBackup restore

There was one MySQL server with an Adaptec RAID controller and 4 disks. One of the disks had media errors, which caused the whole SCSI bus to become unavailable.

This resulted in a corrupted InnoDB table.

Luckily we did have backups: a full backup and incrementals.

So to restore the backups I installed XtraBackup and MySQL 5.5 on another server.

Then the first step was to 'prepare' the backup. This worked okay for the full backup (redo only).

The second step, applying the incrementals, failed for the first incremental. This was easily resolved by specifying full paths instead of relative paths.

Then the backup was fully prepared using the redo logs and undo logs.
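The prepare sequence described above can be sketched roughly as follows. This is a minimal sketch, not the exact commands used here: it assumes the innobackupex wrapper that ships with XtraBackup, and the directories /backups/full, /backups/inc1 and /backups/inc2 are hypothetical placeholders. By default it only prints the commands so they can be reviewed first.

```shell
#!/bin/bash
# Sketch of the prepare sequence. All paths are hypothetical placeholders;
# use absolute paths, and avoid spaces in them for this simple list.
FULL=/backups/full                          # full (base) backup
INCREMENTALS="/backups/inc1 /backups/inc2"  # incrementals, oldest first

# Print each command by default (without shell quoting);
# set DRY_RUN=0 to actually execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

prepare_backup() {
    # Step 1: replay the redo log on the full backup, without rollback.
    run innobackupex --apply-log --redo-only "$FULL" || return 1

    # Step 2: merge each incremental into the full backup, oldest first.
    for inc in $INCREMENTALS; do
        run innobackupex --apply-log --redo-only "$FULL" \
            --incremental-dir="$inc" || return 1
    done

    # Step 3: final prepare, which also rolls back uncommitted transactions.
    run innobackupex --apply-log "$FULL"
}

prepare_backup
```

Running the script as is prints four commands for review; running it with DRY_RUN=0 in the environment executes them instead.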

As XtraBackup doesn't back up your my.cnf, we copied the my.cnf from another server and adjusted it for this server. The my.cnf in your backup contains only the settings needed for a restore, and some of those settings are Percona Server specific and will result in an error when used with MySQL.

So far everything went as expected.

Then we started MySQL with the newly restored backup by executing "service mysql start"; this failed.

Then I tried to start it with "mysqld_safe &", and this worked, so I suspected the init script was using the wrong parameters. I executed "bash -x /etc/init.d/mysql start" and expected the error to show, but MySQL started okay this time. Using "/etc/init.d/mysql stop" and then "/etc/init.d/mysql start" resulted in an error again.
Then I added some echo statements to the script to check which parameters it used for mysqld_safe. Starting mysqld_safe by hand with the same parameters worked.

So mysqld_safe failed to start and didn't log anything to the error log when executed from the init script, but it did work when called directly.

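Then it hit me. The SELinux context for the MySQL files was wrong (as shown by ls -lZ). That also fits the symptoms: a process started by hand from a root shell most likely runs unconfined, whereas the init script starts it in a confined domain that cannot access mislabelled files.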

This should not be fixed with chcon, as such changes won't survive a relabel. Instead, I changed the context for the datadir (not /var/lib/mysql in this case) and for the location of the socket and pid files with semanage. Then I used restorecon to apply it to the files, and MySQL started fine. This is explained in detail in this blog post.
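The relabelling step can be sketched like this. It is a minimal sketch under assumptions: /data/mysql and /data/mysql-run are hypothetical stand-ins for the real datadir and the socket/pid location, and the type names mysqld_db_t and mysqld_var_run_t are the ones I would expect from the targeted policy on RHEL/CentOS 6, so check them against your own policy. By default it only prints the commands.

```shell
#!/bin/bash
# Sketch of a persistent SELinux relabel for a non-default MySQL layout.
# Both paths are hypothetical placeholders.
DATADIR=/data/mysql       # MySQL datadir (not /var/lib/mysql)
RUNDIR=/data/mysql-run    # directory holding the socket and pid file

# Print each command by default (without shell quoting, so the semanage
# pattern needs quotes if pasted by hand); set DRY_RUN=0 to execute.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

fix_mysql_contexts() {
    # Record labelling rules in the local policy so they survive a relabel.
    # The type names are assumptions; verify them with "semanage fcontext -l".
    run semanage fcontext -a -t mysqld_db_t "${DATADIR}(/.*)?" || return 1
    run semanage fcontext -a -t mysqld_var_run_t "${RUNDIR}(/.*)?" || return 1

    # Apply the recorded rules to the files already on disk.
    run restorecon -Rv "$DATADIR" "$RUNDIR"
}

fix_mysql_contexts
```

Unlike chcon, semanage stores the rule in the policy itself, so a later restorecon or full relabel reapplies the same context instead of reverting it.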

Conclusions:
  • XtraBackup works great (and fast; disk I/O quickly becomes the bottleneck for restores)
  • Use full paths instead of relative paths for XtraBackup
  • Make sure to back up your my.cnf (or put it in a CM tool like Puppet)
  • If you use SELinux, make sure the file context is correct after a restore

2 comments:

  1. Was this an hourly billed consulting gig? How many hours for figuring out the SELinux thing? I often spend 2-3 hours backpedaling purely due to SELinux. (But Nokia firewalls take the prize; they can waste a full day in the worst case.)

    1. I probably spent less than an hour on the SELinux issue. The main problem is that there is no clear error message. Tools like setroubleshoot aim to make troubleshooting easier.

      The SELinux system can prevent or reduce the impact of many security vulnerabilities, so it's better to leave it on. With {RHEL,CentOS,OEL}6 it shouldn't be too hard to leave it on.

      Another gotcha can be IPv6, especially in IPv6-only setups. Applications will try IPv6, then IPv4, and then only report the error for the IPv4 connection.

      And in my experience a Cisco PIX can be much worse than a Nokia firewall. The PIX can convert RFC-compliant (E)SMTP traffic to non-RFC-compliant traffic in such a way that the target server won't accept it. Luckily most shops have converted to ASAs now.
