
Making large tables smaller: Compression in MySQL

So JF has been busy abusing MySQL again:

  • An Adventure in InnoDB Table Compression (for read-only tables) tries out different KEY_BLOCK_SIZE values in InnoDB to find out which setting yields the best compression for a specific table.

    His sample script copies a table and compresses it with one setting, then repeats this with the next setting; if the new copy is smaller, it is kept, otherwise the previous copy is retained. This is repeated until a minimally sized compressed InnoDB table has been found (a sketch of such a loop follows this list). JF achieved a compression ratio of 4.53, bringing a 20 TB instance down to about 4.5 TB.

  • In Why we still need MyISAM (for read-only tables) he does the same thing with his database in MyISAM format and then compresses it with myisampack, which is fine because his data is read-only archive data.

    MyISAM uncompressed is 22% smaller than InnoDB uncompressed. Compressed with myisampack, his data is 10x smaller than the uncompressed InnoDB data, so the 20 TB of raw data shrink to below 2 TB (a myisampack sketch also follows this list).
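JF's posts describe his script rather than showing it, so here is a minimal bash sketch of the trial loop for the InnoDB case. It assumes innodb_file_per_table is on, a hypothetical archive.events table, and credentials in ~/.my.cnf; it measures each copy via information_schema, which is only a rough proxy for the size of the .ibd file on disk.

#!/bin/bash
# Sketch of the idea only, not JF's actual script.
# Assumptions: innodb_file_per_table=ON, hypothetical table archive.events,
# credentials in ~/.my.cnf, enough free space for one extra copy of the table.
DB=archive
TBL=events
BEST_KBS=""
BEST_SIZE=""

for KBS in 16 8 4 2 1; do
  COPY="${TBL}_kbs${KBS}"
  mysql "$DB" -e "
    DROP TABLE IF EXISTS ${COPY};
    CREATE TABLE ${COPY} LIKE ${TBL};
    ALTER TABLE ${COPY} ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=${KBS};
    INSERT INTO ${COPY} SELECT * FROM ${TBL};"

  # information_schema sizes are a rough proxy; the .ibd file size on disk
  # is the more accurate number.
  SIZE=$(mysql -N "$DB" -e "
    SELECT data_length + index_length
      FROM information_schema.tables
     WHERE table_schema='${DB}' AND table_name='${COPY}';")
  echo "KEY_BLOCK_SIZE=${KBS}: ${SIZE} bytes"

  if [ -z "$BEST_SIZE" ] || [ "$SIZE" -lt "$BEST_SIZE" ]; then
    BEST_KBS=$KBS
    BEST_SIZE=$SIZE
  fi
  mysql "$DB" -e "DROP TABLE ${COPY};"   # drop the trial copy again
done

echo "Smallest copy: KEY_BLOCK_SIZE=${BEST_KBS} (${BEST_SIZE} bytes)"

The sketch drops every trial copy and only remembers the winning KEY_BLOCK_SIZE, so the table is rebuilt once at the end with the best setting; JF's actual script keeps whichever copy is currently smallest instead.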
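For the MyISAM case, a minimal sketch of the packing step, assuming a hypothetical MyISAM table events in /var/lib/mysql/archive and no concurrent writes (the index rebuild that should follow packing is shown after the next paragraph):

# Sketch only; database, table name and path are placeholders.
# myisampack works on the table files directly, so make sure the server
# has the table closed and nothing writes to it (read-only archive data).
mysql archive -e "FLUSH TABLES events;"

# Compress the .MYD data file; the packed table becomes read-only.
myisampack /var/lib/mysql/archive/events.MYI

# Tell the server to re-open the table files.
mysql archive -e "FLUSH TABLES events;"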

Using MyISAM for read-only data is much less critical than it would be for data that is being written to: data corruption due to the lack of checksums is much less likely, and while the missing clustered index cannot really be compensated for, “myisamchk --sort-index” at least keeps the regenerated indexes linear in the MYI files.
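Continuing the sketch above (same hypothetical table and path): after packing, the indexes are rebuilt with myisamchk, and --sort-index is the step that lays the index blocks out in order inside the .MYI file.

# Rebuild the indexes of the packed table (-r repair, -q quick),
# then sort the index blocks so the .MYI file is read linearly.
myisamchk -rq /var/lib/mysql/archive/events.MYI
myisamchk --sort-index /var/lib/mysql/archive/events.MYI

# Let the server pick up the new index file.
mysql archive -e "FLUSH TABLES events;"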

Published in MySQL

One Comment

  1. Patrick Schaaf

    As usual compression ratios will strongly depend on the nature of the data.

    I’ve been doing that in production for 3 years now, with myisampack, for a set of daily “snapshotted / updated” tables, around 280 GB original size at the moment, with a compression ratio of 1:2.

    The snapshots are done on one replica, compressed locally, and then distributed to a dozen production replicas with a precomputed “future activation date”, using a homegrown ssh-command setup. Final activation on the target replicas then happens almost simultaneously everywhere, using that future date and RENAME TABLE statements at the right minute. The files on the target replicas reside in a tmpfs, and on boot they fetch from each other and the (persistent) original source.

    Works like a charm 🙂
