Sunday, November 24, 2013

The importance of multi source replication

One of the latest labs releases of Oracle MySQL brings multi source replication. This lifts the limitation found in earlier releases that a MySQL slave can only have one master.

To be fair, there were other ways of doing this already:
There are many good uses of multi source replication. You could use it to combine data from multiple shards or applications.

If MySQL is used with a loadbalancer the most easy to build setup is a 2-way multi master. This makes it possible to use the InnoDB storage engine. Using MySQL Cluster is another alternative, but MySQL Cluster uses the NDB storage engine, and might not be a supported option for your application. A MySQL Cluster setup also needs at least 4 machines to be fully redundant and MySQL Multi Master only needs two machines.

There is little intelligence required in the loadbalancer. It should write to one server and read from both servers. If the first server is unavailable then it should write to the second one. The requirement to only write to one server has to do with the fact that replication is not fully synchronous (MySQL Cluster is synchronous, and there it is supported to write to all nodes). While the might seem like a disadvantage, it can actually be helpfull to do online schema changes.

One of the drawbacks of multi master is that multi master with more than 2 nodes can be a nightmare to maintain and will probably not help you to get more performance or reliability.

Another drawback is that it is not easy to add a disaster recovery setup to a multi master setup. If you have 2 locations with 2 servers in each location and create a multi-master setup from each pair than it's not possible to get one pair to slave from another pair as each server already has a master. You could create a multi-master on one site and then have a slave-with-a-slave in the other location, but then you'll have to change that setup during or after the site failover to get to the same setup as you had in the primary location.

With multi source replication you can now create a multimaster setup which is slave of another multi-master setup.

I did do some basic tests with the labs release for multi source replication and it looks great.

The "FOR CHANNEL='chan1'" syntax works quite well, although I would have gone for "FOR CHANNEL 'chan1'" (without the equal sign). I would have been nice if MariaDB and MySQL would have used the same syntax, but unfortunately this isn't the case. (MariaDB uses "CHANGE MASTER 'master1' TO...")

For multi source replication to work you have to set both master_info_repository and relay_log_info_repository to TABLE. I only had one of these set, and the error message was not really clear about which setting was wrong.
2013-11-23T14:37:52.972108Z 1 [ERROR] Slave: Cannot create new master info structure when  repositories are of type FILE. Convert slave  repositories  to TABLE to replicate from Multiple sources.

I build a tree-node multi-master with multi source replication. Each server was a slave of the other two servers. The advantages of this setup to a regular 3 node circular setup is that you don't need to enable log-slave-updates, which can save quite some I/O on the binlog files. Also if one node breaks then the remaining two nodes will still receive updates from each other instead of only one node receiving all updates.

If you use sharding you can use multi source replication to combine two (or  more) shards into one. This is similar to how I like to do major version upgrades: first make the new setup a slave of the old setup. This gives you a window for testing the new setup.

So multi source replication is very usefull in many setups.

Saturday, November 9, 2013

MariaDB's RETURNING feature.

There is a new feature in the MariaDB 10 Beta which caught my eye: support for returning a result set on delete.

With a 'regular' DELETE operation you only get to know the number of affected rows. To get more info or actions you have to use a trigger or a foreign key. Anoter posibility is doing a SELECT and then a DELETE and with the correct transaction isolation a transactional support this will work.

With the support for the RETURNING keyword this has become easier to do and it will probably bennefit performance and save you a few roundtrips and a few lines of code.

There is already support for RETURNING in PostgreSQL. And PostgreSQL has an other nifty feature for which RETURNING really helps: CTE or common table expressions or the WITH keyword. I really hope to see CTE support in MySQL or MariaDB some day.

An example from RETURNING and CTE in PostgreSQL:
demo=# select * from t1;
 id | name  
  1 | test1
  2 | test2
  3 | test3
  4 | test1
  5 | test2
  6 | test3
(6 rows)

demo=# WITH del_res AS (DELETE FROM t1 RETURNING id) 
demo-# SELECT CONCAT('Removed ID ',id) info FROM del_res;
 Removed ID 1
 Removed ID 2
 Removed ID 3
 Removed ID 4
 Removed ID 5
 Removed ID 6
(6 rows)


So my conclusion: Returning a resultset for DELETE is helpfull, and is one step in the direction of CTE support.

The next step step is to get the RETURNING keyword to work for UPDATE.