Kinja OPS

Gawker ops team

Speeding up compression with parallel computing

Creating backups and compressing files is always a time-consuming task. For example, creating the daily backup of the Kinja-related databases took about 6.5 hours every day. The first part, creating the backup itself, takes about 40 minutes: that is the run time of innobackupex plus applying the changed logfiles to the db. (I'll write about this later!) The second part is compressing the files before they are copied to the storage server... and this step took about 5.5 hours. I had written this part of the backup script with the old-fashioned compression utility, gzip (and tar, of course!).

# Normally, this is how you compress a tarfile on the fly:
tar -czf backup.tar.gz /path/to/backupdir

This works great most of the time, but you have to know one thing: gzip uses only one processor core during the operation, so if you have beefy hardware, you won't even scratch the surface of the total throughput your machine is capable of.

So the solution is to parallelize the whole operation.

Here is a good comparison of parallel compression software.

I decided to use pigz, so I modified the relevant part of the backup script like this:

tar -c --use-compress-program=pigz -f backup.tar.gz /path/to/backupdir 
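The same idea can also be written as an explicit pipe, which makes it easy to cap the number of compression threads with pigz's -p option (for example, to leave a few cores free for the database). What follows is only a minimal sketch, not the actual backup script: the function name compress_dir, its arguments, and the fallback to plain gzip when pigz is not installed are all assumptions made for illustration.

```shell
# Hypothetical helper (not part of the original backup script).
# Usage: compress_dir SRC_DIR OUT_FILE [THREADS]
compress_dir() {
    src_dir=$1
    out_file=$2
    threads=${3:-}   # optional; empty means "let pigz decide"

    if command -v pigz >/dev/null 2>&1; then
        if [ -n "$threads" ]; then
            # -p limits how many compression threads pigz starts
            tar -cf - "$src_dir" | pigz -p "$threads" > "$out_file"
        else
            # without -p, pigz uses all online cores by default
            tar -cf - "$src_dir" | pigz > "$out_file"
        fi
    else
        # assumed fallback: single-threaded gzip when pigz is missing
        tar -cf - "$src_dir" | gzip > "$out_file"
    fi
}
```

It could be called as `compress_dir /path/to/backupdir backup.tar.gz 8`. Since pigz writes standard gzip format, the resulting archive can still be checked with `gzip -t` or unpacked with `tar -xzf` as before.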

The result can be seen in this article's header image: the compression itself now completes in half an hour (that is 11 times faster than before!).

So the part that rebuilds the developers' database can complete before the devs start using it. (Hm... I think this will be a new post too.)
