Daniël's Database Blog

Automated MySQL Master Failover

After the GitHub MySQL failover incident, many blogs and people have explained that fully automated failover might not be the best solution. Fully automated failover is indeed dangerous and should be avoided if possible. But a completely manual failover is also dangerous. A fully automated but manually triggered failover is probably a better solution. Synchronous replication is not a complete solution either. A split-brain situation is a good example of a failure that could still happen. Of course, most clusters have all kinds of safeguards to prevent that, but unfortunately safeguards can fail too.

Every failover setup or cluster should be considered broken unless:

- You've tested the failover scripts and procedures
- You've tested the failover scripts and procedures under normal load
- You've tested the failover scripts and procedures under high load
- You've tested it since the last change in the setup and/or application
- Someone else tested the failover scripts an...
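To make "fully automated but manually triggered" a bit more concrete, here is a minimal sketch in Python, assuming the failover procedure can be broken into named steps that each report success or failure. The `run_failover` function, the `confirmed_by_operator` flag and the step names are hypothetical and not taken from any real failover tool; the actual MySQL-specific steps are deliberately left out.

```python
from typing import Callable, List, Tuple

# Hypothetical step type: a human-readable name plus a function that
# returns True on success. Real steps (checking replica lag, fencing the
# old master, promoting a replica, repointing the application) are
# environment-specific and are NOT implemented here.
Step = Tuple[str, Callable[[], bool]]


def run_failover(steps: List[Step], confirmed_by_operator: bool) -> List[str]:
    """Run every failover step in order, but only after an explicit
    operator decision. Stop at the first failing step so a human can
    take over instead of the script guessing what to do next."""
    if not confirmed_by_operator:
        # The trigger is manual: the script never decides on its own
        # that a failover is needed.
        raise RuntimeError("failover must be triggered manually by an operator")
    log: List[str] = []
    for name, action in steps:
        if not action():
            log.append(f"FAILED: {name}")
            # Abort instead of pushing on; continuing after, say, a failed
            # fencing step could leave two masters accepting writes.
            return log
        log.append(f"ok: {name}")
    return log


if __name__ == "__main__":
    demo_steps = [
        ("check replica is caught up", lambda: True),
        ("fence old master", lambda: False),
        ("promote replica", lambda: True),
    ]
    print(run_failover(demo_steps, confirmed_by_operator=True))
    # → ['ok: check replica is caught up', 'FAILED: fence old master']
```

Stopping at the first failure, rather than retrying or skipping ahead, keeps the decision about what to do next with the operator, which is the point of triggering the failover manually in the first place. The automation only removes the error-prone typing of individual commands.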
