Dealing with auto-VACUUM in PostgreSQL

Dealing with auto-VACUUM in PostgreSQL

There’s no doubt that we need VACUUM in any proper PostgreSQL database environment. Without it, your database will simply grind to a halt and stop working at some point. That is the reason why auto-VACUUM is enabled by default. There is some protection mechanism (autovacuum_freeze_max_age) that forces a auto-VACUUM run to prevent your favourite database from shutting down before it has less than a million transaction IDs available.

However, if you have a very high INSERT/UPDATE OLTP workload database running, you need to configure auto-VACUUM correctly. The default settings of auto-VACUUM are too low. CitusDB has a very good article on how to tune it correctly.

Screen Shot 2018-05-08 at 13.00.38

From CitusDB article

If all your INSERT/UPDATE transactions are hitting a single table, there’s not a lot of things that you can do. You can tune auto-VACUUM to be running all the time. However, if it doesn’t keep up with the workload, your database WILL shutdown at some point.

Simply because a single VACUUM process can only work on a single table at one time. VACUUM isn’t parallelised against a single table. So what else could you do? You can use pg_repack or pgcompacttable to work around this issue. Data Egret has an excellent article on this.

In my opinion, all these are ugly workarounds. You will need to have a cronjob running somewhere that runs this job on a periodic basis.

The best way to do this, at least in PostgreSQL 10, is to partition the table. Essentially, PostgreSQL sees a partitioned table as multiple tables. This means that multiple VACUUM process can work on the “same” table. However, you need to choose your partitioning strategy carefully. If you partition the table based on TIMESTAMP, you are back to square one, since you are only targeting a single partition at one time. The idea would be to partition the table based on a key that will spread against multiple partitions.

Obviously, you will lose benefits like partition pruning and archival. But hey, life is about making choices. So you got to test both approach and see if the benefit is worth it. Hopefully, I can test and get some benchmark numbers on the speed of VACUUM on a normal table and a partitioned table.

Regards,
Wei Shan

Reference:

Leave a comment