Upgrading MySQL is easy...

Upgrading MySQL to a newer version is a very simple thing: just replace the binaries and run:

# mysql_upgrade

then watch the error log after restarting the service to check whether any configuration parameters have to be changed because of renames, deprecations and so on. All in all, it can be done in a few minutes.

...but it is not as easy as it seems at first sight.

It is not this easy if you want to be sure that the upgrade won’t break anything. On a big site like Kinja, I can assure you: you want to check out all of the possible bad things that could happen.

So, you have to check out three different aspects of the upgraded system:

* The slave data consistency

* The query result consistency

* The system’s overall performance

The upgrade process

Slave data consistency

The first check to be done is to ensure that the replicated data won’t differ on the newer-version slaves from the data on the old ones.


For this, the main idea is to set up a replication chain between an old slave and a new one. If your current replication schema looks like this:

which means all the replication slaves are attached to a single master, then you have to move one of the slaves to replicate from another slave (and promote that one to a master). The new replication setup will look something like this:

Promoting a slave to a replication master requires enabling binary logging on that slave (if it wasn’t switched on before) and issuing a ‘CHANGE MASTER TO ...’ command on the new slave. If you don’t want to rebuild the new replication slave from its new master (for example because your database is huge and you don’t want to wait a few hours until the dump/restore process finishes), you can instead bring both slaves to the same replication state before changing the replication pointer.
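A minimal sketch of the my.cnf settings needed on the promoted slave (the server-id value is just a placeholder, it only has to be unique within the topology):

[mysqld]
server-id         = 2           # any value that is unique in the replication topology
log-bin           = mysql-bin   # enable binary logging on the promoted slave
log-slave-updates               # write events replicated from the master into its own binlog

The log-slave-updates option is what makes the chained setup work: without it the promoted slave would not write the replicated events into its own binary log, so the downstream slave would get nothing.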


First, stop the replication io_thread on both machines, which stops them from fetching new replication data from the master, wait until the last relay log has been processed, check which slave got a bit further than the other, and then move the one that is behind forward to the same position.

mysql> stop slave io_thread;

[... waiting, checking slave state with 'show slave status' ...]

mysql> stop slave sql_thread;

mysql> show slave status\G

*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.1.2
                  Master_User: replication
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.004707
          Read_Master_Log_Pos: 167675726
               Relay_Log_File: db-secondary-relay-bin.011456
                Relay_Log_Pos: 95992979
        Relay_Master_Log_File: mysql-bin.004707
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 167675726
              Relay_Log_Space: 167676138
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
1 row in set (0.00 sec)

Check the ‘Exec_Master_Log_Pos’ value on both slaves; on the one where it is smaller, you can start the slave until the larger position. In our example, if the slave status above is from the slave that got further, then on the other host you should issue a

mysql> START SLAVE UNTIL MASTER_LOG_FILE='mysql-bin.004707', MASTER_LOG_POS=167675726;

command, which will forward that slave to the exact position of the other one.

Point this slave to its new master:

mysql> CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.004707', MASTER_LOG_POS=167675726, MASTER_HOST='192.168.1.2', MASTER_USER='replication', MASTER_PASSWORD='iwonttellyousorry:)';

mysql> START SLAVE;

and start the other slave as well.

At this very moment you can check if the replication is working between the two nodes.

If everything is fine, you can stop the newly promoted master’s replication thread with a ‘STOP SLAVE’, stop mysqld on the second machine, upgrade the MySQL binaries, and run the mysql_upgrade command.
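On the machine being upgraded, the steps boil down to something like this (the package name is a placeholder for whatever 5.6 packages or binaries you deploy from):

# service mysql stop
# yum upgrade <your-mysql-5.6-server-package>    # or replace the binaries by hand
# service mysql start
# mysql_upgrade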

You can now start the two slaves again.

So, we now have a new-version slave replicating from an old-version master. We can let them run for a while, even for days if you have time for that. After you have spent the desired amount of time, stop the replication thread on the newly promoted master.


And now we can check whether there are any differences between the data on the two hosts.

For this, we can use the pt-table-checksum utility from Percona, which checksums all the tables and points out if there are any differences.

To use this tool we only have to know one “hidden” thing: it splits the tables into chunks and checksums the chunks themselves to compare them. This sounds great, but it is only feasible when there are real primary keys on all the tables, so if you don’t have PKs on all your tables, feel free to run the corresponding ‘ALTER TABLE’ commands first.
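A quick way to spot the offending tables is an information_schema query like the one below, then add a surrogate key where needed (the table and column names in the ALTER are just examples):

mysql> SELECT t.table_schema, t.table_name
         FROM information_schema.tables t
         LEFT JOIN information_schema.table_constraints c
                ON c.table_schema = t.table_schema
               AND c.table_name = t.table_name
               AND c.constraint_type = 'PRIMARY KEY'
        WHERE t.table_type = 'BASE TABLE'
          AND t.table_schema NOT IN ('mysql', 'information_schema', 'performance_schema')
          AND c.constraint_name IS NULL;

mysql> ALTER TABLE sometable ADD COLUMN id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY;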


Now you can run pt-table-checksum on the new replication master to see if the data differs. (I use the --no-check-replication-filters option because we have some filters set up; this could cause problems on a live system, but because we are isolated now, it is safe.)
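The invocation looks something like this (run on the newly promoted master from our example; the credentials are placeholders):

# pt-table-checksum --no-check-replication-filters h=192.168.1.2,u=someuser,p=somepass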

This operation can take a very long time, but once it finishes, if there are no ERRORS or DIFFS in the output you know the data did not diverge.

So, step one completed: you have ensured that the replication won’t break.

Query result consistency

In the next step we have to check whether the same queries hitting both servers lead to the same results. To achieve this goal, you must have a slow query log with valid SQL statements in it. To create such a slow query log you have to set it up like this:

mysql> set global slow_query_log_use_global_control="log_slow_filter,log_slow_rate_limit,log_slow_verbosity,long_query_time";
mysql> set global log_slow_verbosity="full";
mysql> set global long_query_time=0;
mysql> set global slow_query_log=on;
mysql> flush slow logs;

I did this data collection at the same time the slave replication was being tested, but you can choose any other time as well. You only have one more thing to do: when you start collecting the data, you have to create a backup of the current data. You can use mysqldump or innobackupex (xtrabackup). I prefer to use innobackupex.
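With innobackupex, taking the backup boils down to something like this (credentials and the target directory are placeholders; innobackupex creates a timestamped subdirectory under the target):

# innobackupex --user=someuser --password=somepass /backups/
# innobackupex --apply-log /backups/<timestamp>/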


So, once your queries have been collected for a few hours on the running system, it is time to check them against the database in a more controlled way.

Stop both slaves, and restore them from the backup you created at the beginning of the log collection.
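With innobackupex the restore looks roughly like this on each test host (mysqld stopped and the data directory emptied first; paths are placeholders):

# innobackupex --copy-back /backups/<timestamp>/
# chown -R mysql:mysql /var/lib/mysql
# service mysql start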

Don’t set up replication; in this test both servers will act the same, except of course that one of them has the newer version of MySQL. So, upgrade one of them again.


In this test we’ll use the pt-upgrade tool; it will run all the queries against both of the databases and compare the query results.

To achieve this, we only have to do one more step: cut the queries themselves out of the mysql-slow.log. Of course we could use the WHOLE slow log for this step, but it is enough to run at most 100 queries of each fingerprint; if there are any differences, they will show up.

So, cut the queries from the slow.log file like this:

# pt-query-digest --output=slowlog --sample 100 --no-report mysql-slow-slave.log > mysql-slow-slave.log.sampled

Now we can run the pt-upgrade tool and compare the results:

pt-upgrade h=192.168.1.3,u=someuser,p=somepass h=192.168.1.4,u=someuser,p=somepass mysql-slow-slave.log.sampled > upgrade_report.out

It will take time, but after the test is completed you’ll have the upgrade_report.out file, which will look like this (for each of the different query fingerprints):

# Query 4: ID 0x2FB6D3F8A27E3BFC at byte 0 _______________________________
# host1: host1.xyz:3306
# host2: host2.xyz:3306
# Found 96 differences in 100 samples:
#   checksums       0
#   column counts   0
#   column types    0
#   query times     96
#   row counts      0
#   warning counts  0
#   warning levels  0
#   warnings        0
#              host1  host2
# Errors            0      0
# Warnings          0      0
# Query_time
#   sum          79ms  284ms
#   min         527us    2ms
#   max           2ms    5ms
#   avg         788us    3ms
#   pct_95      925us    3ms
#   stddev      109us  506us
#   median      761us    3ms
# row_count
#   sum             0      0
#   min             0      0
#   max             0      0
#   avg             0      0
#   pct_95          0      0
#   stddev          0      0
#   median          0      0

As you can see, there weren’t any differences in the query results (check ‘checksums’ first!) except the query times. (Strange: when we put the first 5.6 slave into production it was faster than the old 5.5 ones, but here it looked like it was 3 times slower. I think this is because pt-upgrade runs the queries on a single execution thread, while the 5.6 improvements show up with multiple cores and multiple threads.)


Anyway, we have the proof that the same query results will show up on the different server versions.

Overall performance

The last check we have to make is about the system performance. Will it consume more or less CPU? Will it use more or less RAM? So many interesting questions...


So, restore the previously made backup, ensure that one of the hosts is upgraded, and have the slow query log file at hand.

At this step, we have to use the percona-playback tool (this is not part of the Percona Toolkit, you can yum or apt-get it).

Before we start using the query logs, we have to split them up by schema.

These scripts cut up your slow logs by schema (they were created by Peter Boros, not me!).
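A minimal sketch of the idea (not the scripts mentioned above), assuming the statement blocks in the slow log are preceded by ‘use <schema>;’ lines:

awk '/^use / { schema = $2; gsub(/[`;]/, "", schema) }
     schema != "" { print > ("mysql-slow." schema ".log") }' mysql-slow-slave.log

This writes every entry into a mysql-slow.<schema>.log file named after the last schema selected in the log.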

After it’s done, you can run percona-playback with these files. (Note: if you want more realistic results, start the playbacks in parallel.)

percona-playback --mysql-schema ${schema} --mysql-username someuser --mysql-password somepass --mysql-host hostname --query-log-file ${query_log_file} --dispatcher-plugin thread-pool --thread-pool-threads-count=#(somenum) > $playback.out 2>&1

While the playback runs, you can check iostat/vmstat/mpstat/your graphs/monitoring/mysqladmin/etc. (OK, I have to confess we ran it from a wrapper that collects all this data.)
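A hypothetical wrapper could look something like this (this is not the wrapper we used; the credentials, host name and thread count are placeholders, and the per-schema log file names are the ones produced by the split sketch above):

#!/bin/bash
# start stat collectors in the background
vmstat 10 > vmstat.out 2>&1 &    VMSTAT_PID=$!
iostat -x 10 > iostat.out 2>&1 & IOSTAT_PID=$!

# replay every per-schema log in parallel
pids=()
for logfile in mysql-slow.*.log; do
    schema=${logfile#mysql-slow.}; schema=${schema%.log}
    percona-playback --mysql-schema "$schema" \
        --mysql-username someuser --mysql-password somepass --mysql-host hostname \
        --query-log-file "$logfile" \
        --dispatcher-plugin thread-pool --thread-pool-threads-count=16 \
        > "playback.${schema}.out" 2>&1 &
    pids+=($!)
done

wait "${pids[@]}"               # wait for all playback runs to finish
kill $VMSTAT_PID $IOSTAT_PID    # stop the stat collectors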


When the playback has finished (on both hosts) you can check iostat/vmstat/mpstat/your graphs/monitoring/mysqladmin/etc. again to compare the differences between the hosts.

If everything seems okay, you’ll know whether the new version will kill your machines’ overall performance or not.

So, achievement unlocked: MySQL upgrade test.

Finally

If you are working on a high-traffic, high-availability website, then you have to be damn sure, every time you change a system component, that nothing will break. For me it is worth spending a few days on testing-before-releasing rather than spending that time on call, just trying to keep the fire away from the running systems.


Try to check out all the available tools that can be used (I use the Percona Toolkit, but Oracle has nice stuff too), because the usability of an RDBMS depends not only on the software itself, but on its tools too.

(Note: this blog post was written about upgrading from 5.5 to 5.6, so the variable names reflect that. On different versions the variables may change (RTFM), but the steps of the process remain the same.)