PostgreSQL and Disaster Recovery Strategy – part 2

In the second part of this series, we will consider the different ways you can take backup and how they can fit into a disaster recovery strategy. There are three ways you can take backups:

  1. Logical backups (pg_dump and pg_dumpall)
  2. Physical backups (pg_basebackup, wal-e, pgbackrest)
  3. Archiving (copying wal files before they are deleted)

Let’s consider them one by one. In this blog post, we will consider Logical Backups.

Logical backups

Logical backups are SQL files with instructions to rebuild database objects and data.

Logical backups are taken either by the pg_dump or pg_dumpall command. These commands will produce files with SQL commands to rebuild the database objects and repopulate the data. With pg_dump, you can backup specific databases, and pg_dumpall can backup everything in the server, including roles and tablespaces. If you go to the PostgreSQL documentation, you can see how to use these commands in your version of PostgreSQL. There are tricks to compress and accelerate the backup, but it will never be as fast as taking physical backups. These backups can be handy to backup and restore between major versions of PostgreSQL. Also, you can quickly restore specific objects within the database instead of restoring either the database or everything in the cluster. Be careful, though, in big and busy databases, using logical backups can degraded performance significantly. The performance degradation could be because of the increased I/O load and the long-running transactions it creates. Long-running transactions will keep autovacuum from cleaning up dead tuples. In these cases, it is better to only backup specific objects within the database using logical backups and set physical backups for the cluster. In the next section, we will discuss examples of how to use pg_dump and pg_dumpall.

pg_dump, backup only what you need, pg_dumpall backup cluster level objects

The command pg_dump will allow you to backup a single database within the cluster. It is essential to understand that pg_dump backs up either a single database or objects within a database. Cluster level objects, like roles, are not included in the backup from pg_dump. To backup cluster level objects, you should use pg_dumpall. Scripts created by pg_dump are SQL commands to recreate data and schema in plain text. These commands can be output to an SQL file and restored using the command psql. Here is an example of backing up:

pg_dump -h -U postgres -d pgvirtuoso -f pgvirtuoso.sql

Here is an example of restoring:

psql  -h -U postgres -d pgvirtuoso < pgvirtuoso.sql

You can also choose to backup selected tables or database objects. Also, you can only backup schema and restore it in another cluster or another database within the same cluster. The command pg_dump allows you to backup in a custom format. Backing up in custom format can be useful for some cases, like when you are backing up a database that uses TimescaleDB plugin. If you take a look at TimescaleDB documentation, you need to use a custom format to backup databases with hypertables. For more information take a look at PostgreSQL Documentation about pg_dump.

The command pg_dumpall will allow you to backup all databases in a PostgreSQL cluster. You don’t get to hand pick which database you want to backup with this command. But you can choose to backup everything, only the data, global objects only among other options. Here is an example to backup all:

pg_dumpall -h -U postgres -f pgvirtuosocluster.sql

Here is an example of backing up only the schema:

 pg_dumpall -s -h -U postgres -f pgvirtuosocluster.sql 

Here is another example to backup global objects only:

 pg_dumpall -g -h -U postgres -f pgvirtuosocluster.sql

Logical backups are useful to have as they can be restored between major versions of PostgreSQL. Meaning, you can backup a database in version 13 and restore the database in another cluster with version 14. Also, you can backup and restore global objects like roles and tablespaces. The disadvantages are that they are slower to backup and restore and that can increase I/O load in the cluster. Have to be very careful when you do a pg_dump on a busy server.

For disaster recovery scenarios, it is more appropriate to take physical backups combined with wal archiving for faster restores and point-in-time recovery. In the next part of this series, we will talk about physical backups.

Leave a Reply

%d bloggers like this: