published by (Greg Sabino Mullane) on 2017-03-28 16:23:00 in the "postgres" category

Turtle turtle by WO1 Larry Olson from US Army

The PostgreSQL database system uses the write-ahead logging method to ensure that a log of changes is saved before being applied to the actual data. The log files that are created are known as WAL (Write Ahead Log) files, and by default are 16 MB in size each. Although this is a small size, a busy system can generate hundreds or thousands of these files per hour, at which point disk space becomes an issue. Luckily, WAL files are extremely compressible. I examined different programs to find one that offered the best compression (as indicated by a smaller size) at the smallest cost (as indicated by wall clock time). All of the methods tested worked better than the venerable gzip program, which is suggested in the Postgres documentation for the archive_command option. The best overall solution was using the pxz program inside the archive_command setting, followed closely by use of the 7za program. Use of the built-in wal_compression option was an excellent solution as well, although not as space-saving as using external programs via archive_command.

A database system is a complex beast, involving many trade-offs. An important issue is speed: waiting for changes to get written to disk before letting the client proceed can be a very expensive solution. Postgres gets around this with the use of the Write Ahead Log, which generates WAL files indicating what changes were made. Creating these files is much faster than performing the actual updates on the underlying files. Thus, Postgres is able to tell the client that the work is "done" when the WAL file has been generated. Should the system crash before the actual changes are made, the WAL files are used to replay the changes. As these WAL files represent a continuous unbroken chain of all changes to the database, they can also be used for Point in Time Recovery - in other words, the WAL files can be used to rewind the database to any single point in time, capturing the state of the database at a specific moment.

Postgres WAL files are exactly 16 MB in size (although this size may be changed at compilation time, it is extremely unheard of to do this). These files primarily sit around taking up disk space and are only accessed when a problem occurs, so being able to compress them is a good one-time exchange of CPU effort for a lower file size. In theory, the time to decompress the files should also be considered, but testing revealed that all the programs decompressed so quickly that it should not be a factor.

WAL files can be compressed in one of two ways. As of Postgres 9.5, the wal_compression feature can be enabled, which instructs Postgres to compress parts of the WAL file in-place when possible, leading to the ability to store much more information per 16 MB WAL file, and thus reducing the total number generated. The second way is to compress with an external program via the free-form archive_command parameter. Here is the canonical example from the Postgres docs, showing use of the gzip program for archive_command:

archive_command = 'gzip < %p > /var/lib/pgsql/archive/%f'

It is widely known that gzip is no longer the best compression option for most tasks, so I endeavored to determine which program was the best at WAL file compression - in terms of final file size versus the overhead to create the file. I also wanted to examine how these fared versus the new wal_compression feature.

To compare the various compression methods, I examined all of the compression programs that are commonly available on a Linux system, are known to be stable, and which perform at least as good as the common utility gzip. The contenders were:

  • gzip - the canonical, default compression utility for many years
  • pigz - parallel version of gzip
  • bzip2 - second only to gzip in popularity, it offers better compression
  • lbzip2 - parallel version of bzip
  • xz - an excellent all-around compression alternative to gzip and bzip
  • pxz - parallel version of xz
  • 7za - excellent compression, but suffers from complex arguments
  • lrzip - compression program targeted at "large files"

For the tests, 100 random WAL files were copied from a busy production Postgres system. Each of those 100 files were compressed nine times by each of the programs above: from the "least compressed" option (e.g. -1) to the "best compressed" option (e.g. -9). The tests were performed on a 16-core system, with plenty of free RAM and nothing else running on the server. Results were gathered by wrapping each command with /usr/bin/time -verbose, which produces a nice breakdown of results. To gather the data, the "Elapsed (wall clock) time" was used, along with size of the compressed file. Here is some sample output of the time command:

  Command being timed: "bzip2 -4 ./0000000100008B91000000A5"
  User time (seconds): 1.65
  System time (seconds): 0.01
  Percent of CPU this job got: 99%
  Elapsed (wall clock) time (h:mm:ss or m:ss): 0:01.66
  Average shared text size (kbytes): 0
  Average unshared data size (kbytes): 0
  Average stack size (kbytes): 0
  Average total size (kbytes): 0
  Maximum resident set size (kbytes): 3612
  Average resident set size (kbytes): 0
  Major (requiring I/O) page faults: 0
  Minor (reclaiming a frame) page faults: 938
  Voluntary context switches: 1
  Involuntary context switches: 13
  Swaps: 0
  File system inputs: 0
  File system outputs: 6896
  Socket messages sent: 0
  Socket messages received: 0
  Signals delivered: 0
  Page size (bytes): 4096
  Exit status: 0

The wal_compression feature was tested by creating a new Postgres 9.6 cluster, then running the pgbench program twice to generate WAL files - once with wal_compression enabled, and once with it disabled. Then each of the resulting WAL files was compressed using each of the programs above.

Table 1.
Results of compressing 16 MB WAL files - average for 100 files
CommandWall clock time (s)File size (MB)
gzip -10.2714.927
gzip -20.2924.777
gzip -30.3664.667
gzip -40.3814.486
gzip -50.4924.318
gzip -60.7344.250
gzip -70.9914.235
gzip -82.0424.228
gzip -93.6264.227
CommandWall clock time (s)File size (MB)
bzip2 -11.5403.817
bzip2 -21.5313.688
bzip2 -31.5703.638
bzip2 -41.6253.592
bzip2 -51.6673.587
bzip2 -61.7073.566
bzip2 -71.7313.559
bzip2 -81.7523.557
bzip2 -91.7843.541
CommandWall clock time (s)File size (MB)
xz -10.9623.174
xz -21.1863.066
xz -35.9112.711
xz -46.2922.682
xz -56.6942.666
xz -68.9882.608
xz -79.1942.592
xz -89.1172.596
xz -99.1642.597
Table 2.
Results of compressing 16 MB WAL file - average for 100 files
CommandWall clock time (s)File size (MB)
-l -L1
-l -L2
-l -L3
-l -L4
-l -L5
-l -L6
-l -L7
-l -L8
-l -L9
CommandWall clock time (s)File size (MB)
-z -L1
-z -L2
-z -L3
-z -L4
-z -L5
-z -L6
-z -L7
-z -L8
-z -L9
CommandWall clock time (s)File size (MB)
7za -bd -mx=1
a test.7za
7za -bd -mx=2
a test.7za
7za -bd -mx=3
a test.7za
7za -bd -mx=4
a test.7za
7za -bd -mx=5
a test.7za
7za -bd -mx=6
a test.7za
7za -bd -mx=7
a test.7za
7za -bd -mx=8
a test.7za
7za -bd -mx=9
a test.7za
Table 3.
Results of compressing 16 MB WAL file - average for 100 files
CommandWall clock time (s)File size (MB)
pigz -10.0514.904
pigz -20.0514.755
pigz -30.0514.645
pigz -40.0514.472
pigz -50.0514.304
pigz -60.0604.255
pigz -70.0814.225
pigz -80.1404.212
pigz -90.2514.214
CommandWall clock time (s)File size (MB)
lbzip2 -10.1353.801
lbzip2 -20.1513.664
lbzip2 -30.1513.615
lbzip2 -40.1513.586
lbzip2 -50.1513.562
lbzip2 -60.1513.545
lbzip2 -70.1503.538
lbzip2 -80.1513.524
lbzip2 -90.1503.528
CommandWall clock time (s)File size (MB)
pxz -10.1353.266
pxz -20.1753.095
pxz -31.2442.746
pxz -42.5282.704
pxz -55.1152.679
pxz -69.1162.604
pxz -79.2552.599
pxz -89.2672.595
pxz -99.3552.591
Table 4.
Results of Postgres wal_compression option
ModificationsTotal size of WAL files (MB)
No modifications208.1
wal_compression enabled81.0
xz -28.6
wal_compression enabled PLUS xz -29.4

Table 1 shows some baseline compression values for the three popular programs gzip, bzip2, and xz. Both gzip and bzip2 show little change in the file sizes as the compression strength is raised. However, xz has a relatively large jump when going from -2 to -3, although the time cost increases to an unacceptable 5.9 seconds. As a starting point, something well under one second is desired.

Among those three, xz is the clear winner, shrinking the file to 3.07 MB with a compression argument of -2, and taking 1.2 seconds to do so. Both gzip and bzip2 never even reach this file size, even when using a -9 argument. For that matter, the best compression gzip can ever achieve is 4.23 MB, which the other programs can beat without breakng a sweat.

Table 2 demonstrates two ways of invoking the lrzip program: the -l option (lzo compression - described in the lrzip documentation as "ultra fast") and the -z option (zpaq compression - "extreme compression, extreme slow"). All of those superlatives are supported by the data. The -l option runs extremely fast: even at -L5 the total clock time is still only .39 seconds. Unfortunately, the file size hovers around an undesirable 5.5 MB, no matter what compression level is used. The -z option produces the smallest file of all the programs here (2.48 MB) at a whopping cost of over 30 seconds! Even the smallest compression level (-L1) takes 3.5 seconds to produce a 3.4 MB file. Thus, lrzip is clearly out of the competition.

Compression in action (photo by Eric Armstrong)

The most interesting program is without a doubt 7za. Unlike the others, it is organized around creating archives, and thus doesn't do in-place compression as the others do. Nevertheless, the results are quite good. At the lowest level, it takes a mere 0.13 seconds to produce a 3.18 MB file. As it takes xz 1.19 seconds to produce a nearly equivalent 3.10 MB file, 7za is the clear winner ... if we had only a single core available. :)

It is rare to find a modern server with a single processor, and a crop of compression programs have appeared to support this new paradigm. First up is the amazing pigz, which is a parallel version of gzip. As Table 3 shows, pigz is extraordinarily fast on our 16 core box, taking a mere 0.05 seconds to run at compression level -1, and only 0.25 seconds to run at compression level -9. Sadly, it still suffers from the fact that gzip is simply not good at compressing WAL files, as the smallest size was 4.23 MB. This rules out pigz from consideration.

The bzip2 program has been nipping at the heels of gzip for many years, so naturally it has its own parallel version, known as lbzip2. As Table 3 shows, it is also amazingly fast. Not as fast as pigz, but with a speed of under 0.2 seconds - even at the highest compression level! There is very little variation among the compression levels used, so it is fair to simply state that lbzip2 takes 0.15 seconds to shrink the WAL file to 3.5 MB. A decent entry.

Of course, the xz program has a parallel version as well, known as pxz. Table 3 shows that the times still vary quite a bit, and reach into the 9 second range at higher compression levels, but does very well at -2, taking a mere 0.17 seconds to produce a 3.09 MB file. This is comparable to the previous winner, 7za, which took 0.14 seconds to create a 3.12 MB file.

So the clear winners are 7za and pxz. I gave the edge to pxz, as (1) the file size was slightly smaller at comparable time costs, and (2) the odd syntax for 7za for both compressing and decompressing was annoying compared with the simplicity of "xz -2" and "xz -d".

Now, what about the built-in compression offered by the wal_compression option? As Table 4 shows, the compression for the WAL files I tested went from 208 MB to 81 MB. This is a significant gain, but only equates to compressing a single WAL file to 6.23 MB, which is a poor showing when compared to the compression programs above. It should be pointed out that the wal_compression option is sensitive to your workload, so you might see reports of greater and lesser compressions.

Of interest is that the WAL files generated by turning on wal_compression are capable of being further compressed by the archive_command option, and doing a pretty good job of it as well - going from 81 MB of WAL files to 9.4 MB of WAL files. However, using just xz in the archive_command without wal_compression on still yielded a smaller overall size, and means less CPU because the data is only compressed once.

It should be pointed out that wal_compression has other advantages, and that comparing it to archive_command is not a fair comparison, but this article was primarily about the best compression option for storing WAL files long-term.

Thus, the overall winner is "pxz -2", followed closely by 7za and its bulky arguments, with honorable mention given to wal_compression. Your particular requirements might guide you to another conclusion, but hopefully nobody shall simply default to using gzip anymore.

Thanks to my colleague Lele for encouraging me to try pxz, after I was already happy with xz. Thanks to the authors of xz, for providing an amazing program that has an incredible number of knobs for tweaking. And a final thanks to the authors of the wal_compression feature, which is a pretty nifty trick!