When you have to drop a large database, you'll run into some problems, mainly replication lag. Here I'll show you how to avoid it.

What can cause replication lag when you drop a database? First, it takes some disk I/O to unlink the files, and second, MySQL will scan through the buffer pool to check whether it holds pages from that database. On a huge (or at least big) database this can take seconds or even minutes, which means your slaves will accumulate lag for seconds or (of course) even minutes.

Every database drop should start with a backup of that database. I took mine on a slave.

[root@db-secondary.bfc /home/banyek/chomp]# mysqldump --set-gtid-purged=OFF --single-transaction chomp > chomp.sql
[root@db-secondary.bfc /home/banyek/chomp]# ls
chomp.sql
[root@db-secondary.bfc /home/banyek/chomp]# ls -alh
total 823M
drwxr-xr-x 2 root   root   4.0K Mar 24 05:31 .
drwx------ 4 banyek banyek 4.0K Mar 24 05:31 ..
-rw-r--r-- 1 root   root   823M Mar 24 05:32 chomp.sql
[root@db-secondary.bfc /home/banyek/chomp]# gzip chomp.sql
[root@db-secondary.bfc /home/banyek/chomp]# ls -lah
total 202M
drwxr-xr-x 2 root   root   4.0K Mar 24 05:33 .
drwx------ 4 banyek banyek 4.0K Mar 24 05:31 ..
-rw-r--r-- 1 root   root   202M Mar 24 05:32 chomp.sql.gz

So, your database is dumped; you can now safely remove it.

The following steps should happen:

  • Clean up data from tables
  • Shrink the datafiles
  • Drop tables
  • Drop database

Clean up data from tables

First, check what we're dealing with: how many tables we have, how big they are, and so on.

[root@db-master.bfc /home/banyek]# ls -lh /var/lib/mysql/chomp/
total 2.8G
-rw-r----- 1 mysql mysql 8.5K Jan 21 00:25 apikeys.frm
-rw-r----- 1 mysql mysql 1.0M Jan 21 00:01 apikeys.ibd
-rw-r----- 1 mysql mysql   65 Jan 21 00:25 db.opt
-rw-r----- 1 mysql mysql  13K Jan 21 00:25 images.frm
-rw-r----- 1 mysql mysql 2.8G Mar  4 14:22 images.ibd
-rw-r----- 1 mysql mysql 8.4K Jan 21 00:25 keys.frm
-rw-r----- 1 mysql mysql 1.0M Jan 21 00:01 keys.ibd
-rw-r----- 1 mysql mysql 8.5K Jan 21 00:25 transform_lock.frm
-rw-r----- 1 mysql mysql 8.0M Jan 21 00:01 transform_lock.ibd
-rw-r----- 1 mysql mysql 8.5K Jan 21 00:25 videos.frm
-rw-r----- 1 mysql mysql 9.0M Mar  4 13:14 videos.ibd

Mon Mar 24 05:34:26 2014root@localhost [chomp]> select count(*) from images;
+----------+
| count(*) |
+----------+
|  8843673 |
+----------+
1 row in set (7.57 sec)

Mon Mar 24 06:31:58 2014root@localhost [chomp]> select count(*) from apikeys
    -> ;
+----------+
| count(*) |
+----------+
|        3 |
+----------+
1 row in set (0.00 sec)

Mon Mar 24 06:32:16 2014root@localhost [chomp]> select count(*) from `keys`;
+----------+
| count(*) |
+----------+
|        1 |
+----------+
1 row in set (0.00 sec)

Mon Mar 24 06:32:30 2014root@localhost [chomp]> select count(*) from transform_lock;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

Mon Mar 24 06:32:42 2014root@localhost [chomp]> select count(*) from videos;
+----------+
| count(*) |
+----------+
|     8570 |
+----------+
1 row in set (0.02 sec)

You can see we have only one really big table, and that is 'images' with about 9 million rows. The 'videos' table is the second biggest, at about 1/1000th the size of images.

So far, so good.

I use the pt-archiver tool to clean up tables, because it chunks up the data that needs to be deleted. That means it will run the delete command with a LIMIT clause, where the limit equals the chunk size. I calculated earlier that our database can delete about 2000 rows per chunk with no replication lag, so I'll use the same chunk size for cleaning up images.
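To make the chunking idea concrete, here is a small sketch of the kind of statement described above: one fixed-size DELETE per chunk instead of a single huge DELETE. The `chunk_sql` helper is hypothetical and only prints the statement; the real runs in this post go through pt-archiver.

```shell
# Print the chunked-delete statement for one pass: delete at most
# $2 rows from table $1 (nothing is executed against a server).
chunk_sql() {
  printf 'DELETE FROM %s WHERE 1=1 LIMIT %d;\n' "$1" "$2"
}

chunk_sql images 2000   # -> DELETE FROM images WHERE 1=1 LIMIT 2000;
```

Repeating this statement until it affects zero rows is exactly the loop pt-archiver automates, along with committing between chunks so the slaves can keep up.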


Before I start the delete, I check how many chunks I have - just to help myself keep track of the state of the delete.

Mon Mar 24 05:39:23 2014root@localhost [chomp]> select 8843673 / 2000 as chunks;
+-----------+
| chunks    |
+-----------+
| 4421.8365 |
+-----------+
1 row in set (0.00 sec)
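Since 8,843,673 doesn't divide evenly by 2,000, the last chunk is partial. The same arithmetic can be double-checked from the shell with ceiling division (a quick sketch, not part of the original workflow):

```shell
# Ceiling division: number of chunks needed to cover all rows,
# counting the final partial chunk as a full one.
chunks() {
  awk -v total="$1" -v size="$2" 'BEGIN { print int((total + size - 1) / size) }'
}

chunks 8843673 2000   # -> 4422 (4421 full chunks plus one partial)
```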

Here is the snippet I use to track the state of the delete:

# SELECT CONCAT(100 - ROUND(($CHUNKS - (($MAXITEMS - COUNT(*)) / $CHUNKSIZE)) / $CHUNKS * 100), '%') AS percent_done FROM $TABLE_TO_CLEAN_UP
#
# Wrapped into a while loop, with the current values, it looks like this:

while true; do clear; mysql chomp -e "select concat(100 - round((4421 - ((8843673 - count(*)) / 2000)) / 4421 * 100),'%') as percent_done from images;"; sleep 30; done

# It will produce a neat table like this, refreshed every 30 seconds:
#
# +--------------+
# | percent_done |
# +--------------+
# | 61%          |
# +--------------+
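Note that the chunk terms in that formula cancel out: since the chunk count is just total rows divided by chunk size, percent_done reduces to the fraction of rows already deleted. A self-contained sketch of the same calculation (the 3,450,053 remaining-row figure is purely illustrative):

```shell
# percent_done = 100 - round(remaining / total * 100), which is what
# the SQL snippet computes once chunk size and chunk count cancel.
percent_done() {
  awk -v remaining="$1" -v total="$2" \
    'BEGIN { printf "%d%%\n", 100 - int(remaining / total * 100 + 0.5) }'
}

percent_done 8843673 8843673   # nothing deleted yet -> 0%
percent_done 3450053 8843673   # roughly 61% of the rows gone -> 61%
```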

After the progress bar is set up, we can start the table cleanup job with pt-archiver:

[root@db-master.bfc /home/banyek]# pt-archiver --statistics --no-check-charset --limit 2000 --txn-size 2000 --source D=chomp,t=images,u=user,p=password --purge --where "1=1"
Source: D=chomp,p=...,t=images,u=user
SELECT 8420052
INSERT 0
DELETE 8420052
Action    Count    Time       Pct
deleting  8420052  1981.1935  72.48
select    4212     72.4313     2.65
commit    4211     9.3239      0.34
other     0        670.3669   24.53

# You can see the 'Count' value is a bit lower than the earlier SELECT COUNT(*),
# but that is only because I ran the archiver twice.
#
# After the cleanup is done on this table, repeat it for the others too.

Shrink the datafiles

After you have cleaned the data out of the tables themselves, you can see that the datafiles are still at their original size. (You can read more about this here.) So the next step is to shrink the datafiles. The easiest way to achieve this is to run a simple alter like 'ALTER TABLE tablename ENGINE=InnoDB', but we still have to watch out for replication lag, so run it via pt-online-schema-change.

[root@db-master.bfc /var/log/mysql]# pt-online-schema-change --execute --alter "engine=InnoDB" h=localhost,u=user,p=password,D=chomp,t=images
Found 31 slaves: [...]
Altering new table...
Altered `chomp`.`_images_new` OK.
2014-03-24T06:38:39 Creating triggers...
2014-03-24T06:38:39 Created triggers OK.
2014-03-24T06:38:39 Copying approximately 1 rows...
2014-03-24T06:38:39 Copied rows OK.
2014-03-24T06:38:39 Swapping tables...
2014-03-24T06:38:39 Swapped original and new tables OK.
2014-03-24T06:38:39 Dropping old table...
2014-03-24T06:38:40 Dropped old table `chomp`.`_images_old` OK.
2014-03-24T06:38:40 Dropping triggers...
2014-03-24T06:38:40 Dropped triggers OK.
Successfully altered `chomp`.`images`.

[root@db-master.bfc /var/lib/mysql/chomp]# ls -lah
total 11M
drwx------  2 mysql mysql 4.0K Mar 24 06:38 .
drwxr-xr-x 16 mysql mysql 4.0K Mar  4 04:17 ..
-rw-r-----  1 mysql mysql 8.5K Jan 21 00:25 apikeys.frm
-rw-r-----  1 mysql mysql 1.0M Jan 21 00:01 apikeys.ibd
-rw-r-----  1 mysql mysql   65 Jan 21 00:25 db.opt
-rw-rw----  1 mysql mysql  13K Mar 24 06:38 images.frm
-rw-rw----  1 mysql mysql 128K Mar 24 06:38 images.ibd
-rw-r-----  1 mysql mysql 8.4K Jan 21 00:25 keys.frm
-rw-r-----  1 mysql mysql 1.0M Jan 21 00:01 keys.ibd
-rw-r-----  1 mysql mysql 8.5K Jan 21 00:25 transform_lock.frm
-rw-r-----  1 mysql mysql 8.0M Jan 21 00:01 transform_lock.ibd
-rw-rw----  1 mysql mysql 8.5K Mar 24 06:37 videos.frm
-rw-rw----  1 mysql mysql  96K Mar 24 06:37 videos.ibd

Drop tables

Now we can drop the tables. YAY!

We only have to be aware of two things:

  1. Use the IF EXISTS clause, because if a table is filtered out from replication, dropping it without IF EXISTS will break replication.
  2. Be aware of reserved words. (See below: the 'keys' table can cause problems, because the word is reserved. Just surround it with backticks. And don't let the developers create tables with reserved words :D)
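Both caveats can be baked into the statements up front. Here is a small sketch that emits the DROP statements for this database's tables with IF EXISTS and backtick quoting applied uniformly; the `gen_drops` helper is hypothetical and only prints SQL, nothing is executed:

```shell
# Generate safe DROP statements: IF EXISTS guards against tables
# filtered out of replication, backticks guard reserved words
# like `keys`.
gen_drops() {
  for t in "$@"; do
    printf 'DROP TABLE IF EXISTS `%s`;\n' "$t"
  done
}

gen_drops apikeys images keys transform_lock videos
```

You can pipe the output into the mysql client, or just paste it into an interactive session as in the transcript below.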

Mon Mar 24 06:42:26 2014root@localhost [chomp]> drop table if exists apikeys;
Query OK, 0 rows affected (0.01 sec)

Mon Mar 24 06:43:10 2014root@localhost [chomp]> drop table if exists images;
Query OK, 0 rows affected (0.00 sec)

Mon Mar 24 06:43:49 2014root@localhost [chomp]> drop table if exists `keys`;
Query OK, 0 rows affected (0.00 sec)

Mon Mar 24 06:43:55 2014root@localhost [chomp]> drop table if exists transform_lock;
Query OK, 0 rows affected (0.00 sec)

Mon Mar 24 06:44:06 2014root@localhost [chomp]> drop table if exists videos;
Query OK, 0 rows affected (0.00 sec)

Mon Mar 24 06:44:14 2014root@localhost [chomp]> show tables;
Empty set (0.00 sec)

Drop database

Finally we have reached this point. We can now drop the database, and we're done (please don't forget the IF EXISTS clause!)

Mon Mar 24 06:44:17 2014root@localhost [chomp]> use mysql
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
Mon Mar 24 06:44:20 2014root@localhost [mysql]> drop database if exists chomp;
Query OK, 0 rows affected (0.04 sec)

Conclusion

Dropping a database is really an easy thing: just one command and everything is done. But if your site has a lot of traffic and your users don't tolerate weird things, you have to be cautious. In this scenario a simple 'DROP DATABASE' would have taken only a few seconds, with a few seconds of replication lag, but building good habits is mandatory, so I recommend always cleaning up a database before dropping it.