Every MySQL DBA should deal with the situation, when there were an accidental write on one of the slaves. Changing replication to GTID will change the way how we should deal with that problem.

Let's check out!

So I assume you have two servers, a master and a slave.

root@master vagrant]# cat /etc/my.cnf [mysql] prompt = "master > " [mysqld] log-bin server-id = 1 root@slave vagrant]# cat /etc/my.cnf [mysql] prompt = "slave > " [mysqld] log-bin server-id = 2 [root@master vagrant]# mysql Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 1 Server version: 5.6.15-56-log Percona Server (GPL), Release rel63.0, Revision 519 Copyright (c) 2009-2013 Percona LLC and/or its affiliates Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. master > SHOW MASTER STATUS\G *************************** 1. row *************************** File: master-bin.000001 Position: 120 Binlog_Do_DB: Binlog_Ignore_DB: Executed_Gtid_Set: 1 row in set (0.00 sec) master > GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%' IDENTIFIED BY 'repl'; Query OK, 0 rows affected (0.00 sec) master > FLUSH PRIVILEGES; Query OK, 0 rows affected (0.00 sec) slave > START SLAVE USER='repl' PASSWORD='repl'; Query OK, 0 rows affected, 1 warning (0.00 sec) slave > SHOW WARNINGS; +-------+------+------------------------------------------------------------------------+ | Level | Code | Message | +-------+------+------------------------------------------------------------------------+ | Note | 1759 | Sending passwords in plain text without SSL/TLS is extremely insecure. | +-------+------+------------------------------------------------------------------------+ 1 row in set (0.00 sec) slave > SHOW SLAVE STATUS\G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 192.168.50.50 Master_User: repl Master_Port: 3306 Connect_Retry: 60 Master_Log_File: master-bin.000001 Read_Master_Log_Pos: 397 Relay_Log_File: slave-relay-bin.000002 Relay_Log_Pos: 561 Relay_Master_Log_File: master-bin.000001 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 397 Relay_Log_Space: 734 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 0 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 1 Master_UUID: 59879f62-a087-11e3-99a4-080027079e3d Master_Info_File: /var/lib/mysql/master.info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: Executed_Gtid_Set: Auto_Position: 0 1 row in set (0.00 sec)

So we have set up a basic replication cluster with one master, and one slave server. We also got a warning about we should use SSL/TLS but that is an another story.

Advertisement

Let's populate the databases with some test data. First we create a table with an auto increment field, and inserting some data there.

master > USE test Database changed master > CREATE TABLE `testdata` ( -> `id` bigint(20) NOT NULL AUTO_INCREMENT, -> `text` varchar(45) NOT NULL, -> `date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, -> PRIMARY KEY (`id`) -> ) ENGINE=InnoDB; Query OK, 0 rows affected (0.04 sec) master > INSERT INTO testdata (text) VALUES ('test'); Query OK, 1 row affected (0.00 sec) master > INSERT INTO testdata (text) VALUES ('test'); Query OK, 1 row affected (0.00 sec) master > INSERT INTO testdata (text) VALUES ('test'); Query OK, 1 row affected (0.00 sec) master > SELECT COUNT(*) FROM testdata; +----------+ | COUNT(*) | +----------+ | 3 | +----------+ 1 row in set (0.00 sec) slave > USE test; Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Database changed slave > SELECT COUNT(*) FROM testdata; +----------+ | COUNT(*) | +----------+ | 3 | +----------+ 1 row in set (0.00 sec)

OK, we have a table, we inserted some data into that, and it was replicated to the slave.

Advertisement

Let's break this. The most easy way to achieve this, is to insert some data directly on the slave, and insert data again on the master. What we expect is that the replication will break, because there will be a duplicate key on the slave servers table when it tries to replicate:

slave > INSERT INTO testdata (text) VALUES ('test'); Query OK, 1 row affected (0.01 sec) master > INSERT INTO testdata (text) VALUES ('test'); Query OK, 1 row affected (0.00 sec) master > INSERT INTO testdata (text) VALUES ('test'); Query OK, 1 row affected (0.00 sec) master > INSERT INTO testdata (text) VALUES ('test'); Query OK, 1 row affected (0.00 sec) master > SELECT COUNT(*) FROM testdata; +----------+ | COUNT(*) | +----------+ | 6 | +----------+ 1 row in set (0.00 sec) slave > SELECT COUNT(*) FROM testdata; +----------+ | COUNT(*) | +----------+ | 4 | +----------+ 1 row in set (0.00 sec) slave > SHOW SLAVE STATUS\G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 192.168.50.50 Master_User: repl Master_Port: 3306 Connect_Retry: 60 Master_Log_File: master-bin.000001 Read_Master_Log_Pos: 1751 Relay_Log_File: slave-relay-bin.000002 Relay_Log_Pos: 1656 Relay_Master_Log_File: master-bin.000001 Slave_IO_Running: Yes Slave_SQL_Running: No Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 1062 Last_Error: Error 'Duplicate entry '4' for key 'PRIMARY'' on query. Default database: 'test'. Query: 'INSERT INTO testdata (text) VALUES ('test')' Skip_Counter: 0 Exec_Master_Log_Pos: 1492 Relay_Log_Space: 2088 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: NULL Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 1062 Last_SQL_Error: Error 'Duplicate entry '4' for key 'PRIMARY'' on query. Default database: 'test'. Query: 'INSERT INTO testdata (text) VALUES ('test')' Replicate_Ignore_Server_Ids: Master_Server_Id: 1 Master_UUID: 59879f62-a087-11e3-99a4-080027079e3d Master_Info_File: /var/lib/mysql/master.info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: 140228 15:09:33 Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: Executed_Gtid_Set: Auto_Position: 0 1 row in set (0.00 sec)

Exactly! That was what we expected.

So let's fix this. There are several ways, you can rebuild the slave, you can delete the duplicate key on the slave, or you can order the slave to simply skip that binlog entry.

We will do this, with using the SLAVE_SKIP_COUNTER variable. This variable is used to skip certain logentries, for us this is 1 at the moment.

slave > STOP SLAVE; Query OK, 0 rows affected (0.00 sec) slave > SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1; Query OK, 0 rows affected (0.00 sec) slave > START SLAVE USER='repl' PASSWORD='repl'; Query OK, 0 rows affected (0.00 sec) slave > SHOW SLAVE STATUS\G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 192.168.50.50 Master_User: repl Master_Port: 3306 Connect_Retry: 60 Master_Log_File: master-bin.000001 Read_Master_Log_Pos: 2269 Relay_Log_File: slave-relay-bin.000003 Relay_Log_Pos: 284 Relay_Master_Log_File: master-bin.000001 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 2269 Relay_Log_Space: 2770 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 0 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 1 Master_UUID: 59879f62-a087-11e3-99a4-080027079e3d Master_Info_File: /var/lib/mysql/master.info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: Executed_Gtid_Set: Auto_Position: 0 1 row in set (0.00 sec) slave > SELECT COUNT(*) FROM testdata; +----------+ | COUNT(*) | +----------+ | 6 | +----------+ 1 row in set (0.00 sec)

YAY!

So far so good.

I just realised that this is not a good example because of the 'date' column,

`date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,

because the timestamps differs. But in fact I just write this to show the differences between the standard and GTID mode, so forgive me this. At this scenario I had to remove the row, and restart replication, but please pretend this isn't happened. I always just use this example table, so get over it please :).

Let's switch to GTID based replication!

[root@master vagrant]# vi /etc/my.cnf [root@master vagrant]# service mysql restart Shutting down MySQL (Percona Server).. SUCCESS! Starting MySQL (Percona Server). SUCCESS! [root@master vagrant]# cat /etc/my.cnf [mysql] prompt = "master > " [mysqld] log-bin server-id = 1 gtid-mode = ON log-slave-updates enforce-gitd-consistency [root@slave vagrant]# vi /etc/my.cnf [root@slave vagrant]# service mysql restart Shutting down MySQL (Percona Server).. SUCCESS! Starting MySQL (Percona Server). SUCCESS! [root@slave vagrant]# cat /etc/my.cnf [mysql] prompt = "slave > " [mysqld] log-bin server-id = 2 gtid-mode = ON log-slave-updates enforce-gitd-consistency slave > STOP SLAVE; Query OK, 0 rows affected (0.01 sec) slave > CHANGE MASTER TO MASTER_HOST='192.168.50.50', MASTER_AUTO_POSITION=1; Query OK, 0 rows affected (0.02 sec) slave > START SLAVE USER='repl' PASSWORD='repl'; Query OK, 0 rows affected, 1 warning (0.01 sec) slave > SHOW SLAVE STATUS\G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 192.168.50.50 Master_User: repl Master_Port: 3306 Connect_Retry: 60 Master_Log_File: master-bin.000002 Read_Master_Log_Pos: 151 Relay_Log_File: slave-relay-bin.000002 Relay_Log_Pos: 363 Relay_Master_Log_File: master-bin.000002 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 151 Relay_Log_Space: 567 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 0 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 1 Master_UUID: 59879f62-a087-11e3-99a4-080027079e3d Master_Info_File: /var/lib/mysql/master.info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: Executed_Gtid_Set: Auto_Position: 1 1 row in set (0.00 sec) master > USE test; Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Database changed master > INSERT INTO testdata (text) VALUES ('test'); Query OK, 1 row affected (0.01 sec) master > INSERT INTO testdata (text) VALUES ('test'); Query OK, 1 row affected (0.00 sec) master > INSERT INTO testdata (text) VALUES ('test'); Query OK, 1 row affected (0.00 sec) master > INSERT INTO testdata (text) VALUES ('test'); Query OK, 1 row affected (0.01 sec) master > SELECT COUNT(*) FROM testdata; +----------+ | COUNT(*) | +----------+ | 10 | +----------+ 1 row in set (0.00 sec) slave > use test; Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Database changed slave > SELECT COUNT(*) FROM testdata; +----------+ | COUNT(*) | +----------+ | 10 | +----------+ 1 row in set (0.00 sec)

We can break again the replication, as in the past, and try to solve that in the same way:

slave > INSERT INTO testdata (text) VALUES ('test'); Query OK, 1 row affected (0.01 sec) master > INSERT INTO testdata (text) VALUES ('test'); Query OK, 1 row affected (0.00 sec) master > INSERT INTO testdata (text) VALUES ('test'); Query OK, 1 row affected (0.00 sec) master > SELECT COUNT(*) FROM testdata; +----------+ | COUNT(*) | +----------+ | 12 | +----------+ 1 row in set (0.00 sec) slave > SELECT COUNT(*) FROM testdata; +----------+ | COUNT(*) | +----------+ | 11 | +----------+ 1 row in set (0.00 sec)

Yeah, it was break again. Let's try to fix this in the same way as before.

slave > STOP SLAVE; Query OK, 0 rows affected (0.00 sec) slave > SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1; ERROR 1858 (HY000): sql_slave_skip_counter can not be set when the server is running with @@GLOBAL.GTID_MODE = ON. Instead, for each transaction that you want to skip, generate an empty transaction with the same GTID as the transaction

That's it! We can't solve this problem in the same way, but MySQL is kind to tell what have we do, we have to insert an empty transaction, with the same TID as the bad transaction, and when we restart the slave. First, check the slave status:

slave > SHOW SLAVE STATUS\G *************************** 1. row *************************** Slave_IO_State: Master_Host: 192.168.50.50 Master_User: Master_Port: 3306 Connect_Retry: 60 Master_Log_File: master-bin.000002 Read_Master_Log_Pos: 1993 Relay_Log_File: slave-relay-bin.000002 Relay_Log_Pos: 1591 Relay_Master_Log_File: master-bin.000002 Slave_IO_Running: No Slave_SQL_Running: No Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 1062 Last_Error: Error 'Duplicate entry '11' for key 'PRIMARY'' on query. Default database: 'test'. Query: 'INSERT INTO testdata (text) VALUES ('test')' Skip_Counter: 0 Exec_Master_Log_Pos: 1379 Relay_Log_Space: 2409 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: NULL Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 1062 Last_SQL_Error: Error 'Duplicate entry '11' for key 'PRIMARY'' on query. Default database: 'test'. Query: 'INSERT INTO testdata (text) VALUES ('test')' Replicate_Ignore_Server_Ids: Master_Server_Id: 1 Master_UUID: 59879f62-a087-11e3-99a4-080027079e3d Master_Info_File: /var/lib/mysql/master.info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: 140228 15:52:47 Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: 59879f62-a087-11e3-99a4-080027079e3d:1-6 Executed_Gtid_Set: 59879f62-a087-11e3-99a4-080027079e3d:1-4, 6d1bb29b-a087-11e3-99a4-080027079e3d:1 Auto_Position: 1 1 row in set (0.00 sec)

You can see that the last executed GTID set is 59879f62-a087-11e3-99a4-080027079e3d:1-4. The only thing we need to do is to insert an empty transaction with the duplicate one (59879f62-a087-11e3-99a4-080027079e3d:5 <- because this was what didn't run) and after that restart the replication. See how:

slave > SET GTID_NEXT="59879f62-a087-11e3-99a4-080027079e3d:5"; Query OK, 0 rows affected (0.00 sec) slave > BEGIN; Query OK, 0 rows affected (0.00 sec) slave > COMMIT; Query OK, 0 rows affected (0.00 sec) slave > SET GTID_NEXT="AUTOMATIC" -> ; Query OK, 0 rows affected (0.00 sec) slave > START SLAVE USER='repl' PASSWORD='repl'; Query OK, 0 rows affected, 1 warning (0.00 sec) slave > pager grep Gtid PAGER set to 'grep Gtid' slave > SHOW SLAVE STATUS\G Retrieved_Gtid_Set: 59879f62-a087-11e3-99a4-080027079e3d:1-7 Executed_Gtid_Set: 59879f62-a087-11e3-99a4-080027079e3d:1-7, 1 row in set (0.00 sec) slave > \P Default pager wasn't set, using stdout. slave > SELECT COUNT(*) FROM testdata; +----------+ | COUNT(*) | +----------+ | 12 | +----------+ 1 row in set (0.00 sec)

So the replication is fixed by inserting an empty transaction.