Per-Column Compression for Efficient Storage Management
Battling storage constraints due to ever-growing text data in your MySQL tables? Percona Server for MySQL's column compression feature comes to the rescue!
Recently, one of our clients reached out to our Remote DBA team with a requirement to reduce the size of a table that had many text columns and a huge number of records. As a preliminary check, I validated the table size and its row format; the table was already using the compressed row format.
I then looked for other ways to compress the text columns further and came across the per-column compression feature in Percona Server for MySQL (available from 5.7.17-11), which compresses individual columns. We were using Percona XtraDB Cluster servers in that environment, so we explored this feature and got a major reduction in size.
In this blog post, we will look at this feature and how to use it effectively. To test it, I installed Percona Server for MySQL 5.7 on a machine with 4 CPU cores and 8 GB of RAM, and I used the mysql_random_data_load tool to load random data. Now let us walk through this feature.
Per-column compression is a feature that stores the data of selected columns in compressed format. The data is compressed while being written to disk and decompressed while being read. It uses the zlib library for compression.
It provides a better compression ratio for text data containing a large number of predefined words by using dictionaries. We need to select the right columns for compression, ones that can provide a good compression ratio; otherwise, compression becomes an additional overhead.
Here is a simple example.
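Since per-column compression also uses zlib, MySQL's built-in COMPRESS() function gives a rough feel for how compression behaves on small values. This is an illustrative stand-in, not the column compression feature itself:

```sql
-- COMPRESS() uses zlib, so it illustrates the fixed header overhead
-- that makes very small values grow instead of shrink.
SELECT LENGTH('percona') AS original_length,
       LENGTH(COMPRESS('percona')) AS compressed_length;
```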
In the example above, the compressed size is larger than the original because of the compression header and the small amount of data with few repetitive patterns. So we should be careful when selecting columns for compression.
I have created a test database named “compression” and tables with the below structure.
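Here is a minimal sketch of the setup; the payload column name and the exact layout are assumptions, while the table names come from the tests below:

```sql
CREATE DATABASE compression;
USE compression;

-- Illustrative structure: a large JSON column that will be the
-- compression target in the tests below.
CREATE TABLE compress_test1 (
  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
  payload JSON
) ENGINE=InnoDB;

-- test_records2 and test_records3 share the same structure.
CREATE TABLE test_records2 LIKE compress_test1;
CREATE TABLE test_records3 LIKE compress_test1;
```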
I have created three tables with 2M records each, and the size of each table is around 9 GB.
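The load could look like the following; the mysql_random_data_load invocation is a sketch (options vary by version):

```bash
# Load 2M random rows into each table (illustrative invocation)
mysql_random_data_load compression compress_test1 2000000
mysql_random_data_load compression test_records2 2000000
mysql_random_data_load compression test_records3 2000000
```

The table sizes can then be verified from information_schema:

```sql
-- On-disk size (data + indexes) per table, in GB
SELECT table_name,
       ROUND((data_length + index_length) / 1024 / 1024 / 1024, 2) AS size_gb
FROM information_schema.tables
WHERE table_schema = 'compression';
```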
OK, now we will start compressing the tables using different methods and analyze which one works best.
- ROW_FORMAT=COMPRESSED
We will change the row format to COMPRESSED on the table compress_test1.
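A sketch of the change (the KEY_BLOCK_SIZE value is an assumption; if omitted, it defaults to half the InnoDB page size):

```sql
-- Requires the Barracuda file format (the default in 5.7)
-- and innodb_file_per_table=ON.
ALTER TABLE compression.compress_test1
  ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;
```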
Now the size of the table is reduced by 60%.
- Now we can implement column compression on the test_records2 table, as shown below. Column compression supports the BLOB (TINYBLOB, MEDIUMBLOB, LONGBLOB), TEXT (TINYTEXT, MEDIUMTEXT, LONGTEXT), VARCHAR, VARBINARY, and JSON data types.
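Assuming the illustrative payload column from the structure above:

```sql
-- Percona's COLUMN_FORMAT COMPRESSED attribute compresses this column only;
-- the rest of the table stays in its original row format.
ALTER TABLE test_records2
  MODIFY payload JSON COLUMN_FORMAT COMPRESSED;
```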
Great! Now the size of the table is about 1.4 GB, reduced by around 85% from its original size.
Is it possible to reduce it further?
Yes, it is. We can achieve a further reduction in size by using a column dictionary.
A column dictionary is a predefined set of expected patterns that helps us achieve a better compression ratio.
- I have stored JSON values in that column. Since JSON data consists of key/value pairs where the keys stay constant while the values vary, the keys are highly repetitive. So I am going to create a dictionary based on the keys, as shown below. If you are storing some other kind of data in a column, build the dictionary from its most repetitive patterns; otherwise, the dictionary will not improve the compression ratio.
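Here is a sketch of the dictionary creation; the key names are hypothetical placeholders for the most frequent keys in your JSON documents:

```sql
-- Adjacent string literals are concatenated into one dictionary blob.
SET @dictionary_data = '"customer_id":' '"order_id":' '"status":' '"created_at":';

CREATE COMPRESSION_DICTIONARY test_dictionary (@dictionary_data);
```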
Now we implement column compression on test_records3 with the column dictionary test_dictionary we just created.
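Again assuming the illustrative payload column:

```sql
-- Compress the column and associate it with the dictionary created above
ALTER TABLE test_records3
  MODIFY payload JSON COLUMN_FORMAT COMPRESSED
  WITH COMPRESSION_DICTIONARY test_dictionary;
```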
Better! The size is reduced by a further 200 MB. We can achieve a better compression ratio with a column dictionary when the data has a lot of repeated patterns; otherwise, we can skip it.
We can control the compression behavior using the system variables innodb_compressed_columns_threshold and innodb_compressed_columns_zip_level.
innodb_compressed_columns_zip_level (0-9):
This variable specifies the compression level used for compressed columns. A value of 0 means no compression, 1 is the fastest, and 9 gives the highest compression. The default value is 6. A higher value gives better compression, but the tradeoff is lower speed and increased CPU utilization; the default of 6 is a good balance between compression and speed.
innodb_compressed_columns_threshold (bytes):
An inserted value is compressed only if its length exceeds innodb_compressed_columns_threshold; the default value is 96 bytes. Values shorter than the threshold are stored uncompressed. This variable helps avoid unwanted compression of very small data (as explained earlier, compression is an overhead when the data is small).
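Both variables can be adjusted at runtime; a quick sketch:

```sql
-- Trade speed for a better ratio (affects values written afterwards)
SET GLOBAL innodb_compressed_columns_zip_level = 9;

-- Compress only values longer than 256 bytes
SET GLOBAL innodb_compressed_columns_threshold = 256;
```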
Limitations:
1. We cannot create indexes on compressed columns.
2. DISCARD/IMPORT TABLESPACE is not supported for tables with compressed columns. To use it, we need to decompress the columns first.
3. Also, we need to specify --enable-compressed-columns when taking a backup using mysqldump; otherwise, it will skip the COLUMN_FORMAT COMPRESSED clause. For dictionaries, --enable-compressed-columns-with-dictionaries should be used.
Example:
1. Dump without compression options
2. Dump with the compressed-columns option
3. Dump with the compressed-columns option along with the respective dictionaries
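A sketch of the three variants (connection options omitted for brevity):

```bash
# 1. Without the option, COLUMN_FORMAT COMPRESSED is stripped from the dump
mysqldump compression > dump_plain.sql

# 2. Preserve the COLUMN_FORMAT COMPRESSED clauses
mysqldump --enable-compressed-columns compression > dump_compressed.sql

# 3. Preserve compressed columns along with their compression dictionaries
mysqldump --enable-compressed-columns-with-dictionaries compression > dump_with_dictionaries.sql
```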
Rollback:
A column can be decompressed by changing its column format to something other than COMPRESSED (FIXED, DYNAMIC, or DEFAULT).
Also, whenever we compress or decompress a column, we are forced to use the COPY algorithm in the ALTER statement. INPLACE will not work, as compression requires a complete table rebuild.
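A sketch of the rollback, once more assuming the illustrative payload column:

```sql
-- Revert the column to the default (uncompressed) format;
-- the rebuild requires the COPY algorithm.
ALTER TABLE test_records3
  MODIFY payload JSON COLUMN_FORMAT DEFAULT,
  ALGORITHM=COPY;
```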
Hope this blog helps you get to know the column compression feature in Percona Server for MySQL. Happy compressing!
Unlocking the full potential of your MySQL databases requires expert guidance. Mydbops, a leading provider of open-source database management services, offers a comprehensive suite of solutions to empower you. Get in touch with our MySQL specialists today to streamline your database management and optimize storage utilization!
{{cta}}
Featured image by Belinda Fewings on Unsplash