PostgreSQL and Disaster Recovery Strategy – part 1

Copyright: olivier26

Introduction

We all wish to have a magical button in our system that recovers everything in case of disaster. Sadly, it is not that easy. Many companies get stuck in doing backup strategies, but do they have a disaster recovery strategy? There is a vast difference between both. In a backup strategy you only plan on how to backup data, but in a disaster recovery strategy you design a plan to backup data and recover the application, not only the data.

Copyright: dizanna

The truth about backup strategies

For starters, you do backup, but do they work? I have a friend that worked with a company in the middle of a merger. He had to audit a department on the acquired company where they had a backup system that produced a tape every night. When my friend decided to validate their backups, he realized that the tapes were blank because of broken hardware for the tape recorder. Nobody tested the backups in a long time. How good are backups that don’t work?

Another thing to consider is Recovery Time Objective (RTO) and Recovery Point Objective (RPO). RTO is the maximum tolerable length of time that a system can be down after a disaster occurs. RPO is the maximum amount of data that can be lost in a business. It is important to understand that restoring a backup can take some time.

How does PostgreSQL fits in all this?

PostgreSQL is the most advanced open source relational database in the world and is at the center of data systems in many companies. Restoring a database to a point in time, is easy in PostgreSQL. But need to understand the inner workings in order to know what to expect. For example, restoring a 10GB database cluster can be done in minutes while a 2TB database cluster can take more than 12 hours. And if you take a weekly full backup for a 7TB cluster and have to replay 5 or 6 days of wal files, that database can take days to recover. If in a worst case scenario, your application will be down for days because of recovery, you have to rethink both your architecture and your disaster recovery. For example, instead of using wal-e, might want to use pgbackrest, which allow users to take incremental and differential backups, reducing the need of replaying wal files.

This article is the first in a series of articles that discuss how to plan for recovery for your PostgreSQL cluster in a way that suits your business needs. In the next article we will discuss different ways we can backup our database cluster.

Leave a Reply

%d bloggers like this: