The introduction of support for incremental backup in pg_basebackup marks a significant advancement in PostgreSQL 17. This game-changing feature enhances data management and recovery capabilities by allowing users to efficiently back up only the changes made since the last backup, reducing both time and storage requirements.
As a heads up, pg_basebackup is a command line utility that allows taking base backup of a PostgreSQL server. A basebackup is a consistent snapshot of PostgreSQL server’s data directory.
Until PostgreSQL 16, pg_basebackup only supports full backup and to achieve incremental backups, we need to rely only on pgBackrest, Barman.
Commit had done by Robert Hass and below is the commit message.
Add support for incremental backup.
To take an incremental backup, you use the new replication command
UPLOAD_MANIFEST to upload the manifest for the prior backup. This
prior backup could either be a full backup or another incremental
backup. You then use BASE_BACKUP with the INCREMENTAL option to take
the backup. pg_basebackup now has an --incremental=PATH_TO_MANIFEST
option to trigger this behavior.
An incremental backup is like a regular full backup except that
some relation files are replaced with files with names like
INCREMENTAL.${ORIGINAL_NAME}, and the backup_label file contains
additional lines identifying it as an incremental backup. The new
pg_combinebackup tool can be used to reconstruct a data directory
from a full backup and a series of incremental backups.
Patch by me. Reviewed by Matthias van de Meent, Dilip Kumar, Jakub
Wartak, Peter Eisentraut, and Álvaro Herrera. Thanks especially to
Jakub for incredibly helpful and extensive testing.
Discussion: http://postgr.es/m/CA+TgmoYOYZfMCyOXFyC-P+-mdrZqm5pP2N7S-r0z3_402h9rsA@mail.gmail.com
Now, let's explore how to effectively take full and incremental backups, as well as the steps to restore them properly.
Prerequisites
To take incremental backups, ensure following variables are enabled.
Note: summarize_wal cannot be enabled when wal_level set to minimal.
postgres=# show summarize_wal ;
summarize_wal
---------------
off
(1 row)
postgres=# alter system set summarize_wal to on;
ALTER SYSTEM
postgres=#
postgres=# select pg_reload_conf();
pg_reload_conf
----------------
t
(1 row)
postgres=# show summarize_wal ;
summarize_wal
---------------
on
(1 row)
Ensure to set this variable before initiating the full backup itself.
Taking Full Backup
Now that we've covered the prerequisites, let's move on to our first full backup. It's a fundamental step to remember that incremental backups cannot be taken without first completing a full backup.
Performing a few write operations on my sample database to provide detailed explanations.
postgres=# create table test(id int, name varchar, updated_at timestamptz);
CREATE TABLE
postgres=#
postgres=# insert into test values(1,'full backup',now());
INSERT 0 1
postgres=# select * from test;
id | name | updated_at
----+-------------+------------------------------
1 | full backup | 2024-02-07 03:13:41.35396+00
(1 row)
Now initiating the full backup.
postgres@mydbopslabs:~$ mkdir -p backups/full_backup/
postgres@mydbopslabs:~$ /usr/local/pgsql/bin/pg_basebackup -D backups/full_backup/
postgres@mydbopslabs:~$ du -sh backups/full_backup/
46M backups/full_backup/
postgres@mydbopslabs:~$
The full backup is completed, and the list of files in the backup directory is as follows:
postgres@mydbopslabs:~$ ls -lrth backups/full_backup/
total 304K
drwx------ 4 postgres postgres 4.0K Feb 7 03:15 pg_wal
-rw------- 1 postgres postgres 225 Feb 7 03:15 backup_label
-rw------- 1 postgres postgres 30K Feb 7 03:15 postgresql.conf
-rw------- 1 postgres postgres 109 Feb 7 03:15 postgresql.auto.conf
-rw------- 1 postgres postgres 3 Feb 7 03:15 PG_VERSION
drwx------ 2 postgres postgres 4.0K Feb 7 03:15 pg_twophase
drwx------ 2 postgres postgres 4.0K Feb 7 03:15 pg_subtrans
drwx------ 2 postgres postgres 4.0K Feb 7 03:15 pg_stat_tmp
drwx------ 2 postgres postgres 4.0K Feb 7 03:15 pg_stat
drwx------ 2 postgres postgres 4.0K Feb 7 03:15 pg_snapshots
drwx------ 2 postgres postgres 4.0K Feb 7 03:15 pg_serial
drwx------ 4 postgres postgres 4.0K Feb 7 03:15 pg_multixact
drwx------ 4 postgres postgres 4.0K Feb 7 03:15 pg_logical
-rw------- 1 postgres postgres 2.6K Feb 7 03:15 pg_ident.conf
-rw------- 1 postgres postgres 5.6K Feb 7 03:15 pg_hba.conf
drwx------ 2 postgres postgres 4.0K Feb 7 03:15 pg_dynshmem
drwx------ 2 postgres postgres 4.0K Feb 7 03:15 pg_commit_ts
drwx------ 6 postgres postgres 4.0K Feb 7 03:15 base
drwx------ 2 postgres postgres 4.0K Feb 7 03:15 pg_xact
drwx------ 2 postgres postgres 4.0K Feb 7 03:15 pg_tblspc
drwx------ 2 postgres postgres 4.0K Feb 7 03:15 pg_replslot
drwx------ 2 postgres postgres 4.0K Feb 7 03:15 pg_notify
drwx------ 2 postgres postgres 4.0K Feb 7 03:15 global
-rw------- 1 postgres postgres 177K Feb 7 03:15 backup_manifest
The backup_manifest file contains the details of the backup and serves as input for the next incremental backup.
postgres@mydbopslabs:~$ tail -f backups/full_backup/backup_manifest
{ "Path": "base/16384/2600_vm", "Size": 8192, "Last-Modified": "2024-02-07 03:10:29 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "959acf54" },
{ "Path": "base/16384/4157", "Size": 0, "Last-Modified": "2024-02-07 03:05:14 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "00000000" },
{ "Path": "base/16384/4168", "Size": 8192, "Last-Modified": "2024-02-07 03:11:39 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "0be341bb" },
{ "Path": "pg_xact/0000", "Size": 8192, "Last-Modified": "2024-02-07 03:15:06 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "e7af7018" },
{ "Path": "global/pg_control", "Size": 8192, "Last-Modified": "2024-02-07 03:15:46 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "43872087" }
],
"WAL-Ranges": [
{ "Timeline": 1, "Start-LSN": "0/2000028", "End-LSN": "0/2000120" }
],
"Manifest-Checksum": "3d8d5db8bd14ee6c28e4254654d7775c99893fd8f92fbdd530e39363b7576117"}
Taking Incremental Backup
Now that we have successfully completed the full backup, we can proceed with the incremental backup.
postgres=# insert into test values(2,'incr backup1',now());
INSERT 0 1
postgres=# select * from test;
id | name | updated_at
----+--------------+-------------------------------
1 | full backup | 2024-02-07 03:13:41.35396+00
2 | incr backup1 | 2024-02-07 03:24:06.950052+00
(2 rows)
postgres@mydbopslabs:~$ mkdir -p backups/incr_backup1/
Option -i / --incremental needs to be used to take an incremental backup. Input to this option is the path of the backup_manifest file of the previous full/incremental backup.
postgres@mydbopslabs:~$ /usr/local/pgsql/bin/pg_basebackup --incremental=backups/full_backup/backup_manifest -D backups/incr_backup1/
postgres@mydbopslabs:~$ du -sh backups/incr_backup1/
24M backups/incr_backup1/
postgres@mydbopslabs:~$
Taking another incremental backup.
postgres=# insert into test values(3,'incr backup2',now());
INSERT 0 1
postgres=# select * from test;
id | name | updated_at
----+--------------+-------------------------------
1 | full backup | 2024-02-07 03:13:41.35396+00
2 | incr backup1 | 2024-02-07 03:24:06.950052+00
3 | incr backup2 | 2024-02-07 03:36:49.497018+00
(3 rows)
postgres@mydbopslabs:~$ mkdir -p backups/incr_backup2/
-- Referring to the previous incremental backup manifest file
postgres@mydbopslabs:~$ /usr/local/pgsql/bin/pg_basebackup --incremental=backups/incr_backup1/backup_manifest -D backups/incr_backup2/
postgres@mydbopslabs:~$ du -sh backups/incr_backup2/
24M backups/incr_backup2/
postgres@mydbopslabs:~$
Restoring Backups
It's very important to restore the backup taken without any difficulties. To make the restoration easier, a new CLI tool pg_combinebackup is introduced in PostgreSQL 17. It helps to reconstruct the full backup from incremental/dependent backups.
postgres@mydbopslabs:~$ mkdir -p restore/new_datadir/
postgres@mydbopslabs:~$ /usr/local/pgsql/bin/pg_combinebackup backups/full_backup/ backups/incr_backup1/ backups/incr_backup2/ -o restore/new_datadir/
postgres@mydbopslabs:~$ du -sh restore/new_datadir/
46M restore/new_datadir/
postgres@mydbopslabs:~$
Backups should be provided in the series of order, starting with the full backup, followed by incremental backup 1, incremental backup 2, and so on. The -o/--output option specifies the output directory where the backup will be restored.
Now, let's start the PostgreSQL server using the newly restored data directory.
postgres@mydbopslabs:~$ /usr/local/pgsql/bin/pg_ctl -D /usr/local/pgsql/data -l logfile stop
waiting for server to shut down.... done
server stopped
postgres@mydbopslabs:~$ chmod 700 /home/postgres/restore/new_datadir
postgres@mydbopslabs:~$ /usr/local/pgsql/bin/pg_ctl -D restore/new_datadir/ -l logfile start
waiting for server to start.... done
server started
postgres@mydbopslabs:~$
postgres@mydbopslabs:~$ /usr/local/pgsql/bin/psql
psql (17devel)
Type "help" for help.
postgres=# select * from test;
id | name | updated_at
----+--------------+-------------------------------
1 | full backup | 2024-02-07 03:13:41.35396+00
2 | incr backup1 | 2024-02-07 03:24:06.950052+00
3 | incr backup2 | 2024-02-07 03:36:49.497018+00
(3 rows)
We have successfully restored all backups, ensuring that the data is available up to the moment of the last incremental backup.
Points to remember during restoration
- Order of the backup is important. Providing them in the wrong order will result in an error.
postgres@mydbopslabs:~$ /usr/local/pgsql/bin/pg_combinebackup backups/full_backup/ backups/incr_backup2/ backups/incr_backup1/ -o restore/new_datadir1/
pg_combinebackup: error: backup at "backups/incr_backup2/" starts at LSN 0/6000028, but expected 0/2000028
postgres@mydbopslabs:~$
- Missing an incremental backup or failing to start with a full backup will also result in an error.
postgres@mydbopslabs:~$ /usr/local/pgsql/bin/pg_combinebackup backups/full_backup/ backups/incr_backup2/ -o restore/new_datadir1/
pg_combinebackup: error: backup at "backups/full_backup/" starts at LSN 0/2000028, but expected 0/4000028
postgres@mydbopslabs:~$ /usr/local/pgsql/bin/pg_combinebackup backups/incr_backup2/ -o restore/new_datadir1/
pg_combinebackup: error: backup at "backups/incr_backup2/" is an incremental backup, but the first backup should be a full backup
postgres@mydbopslabs:~$
In conclusion, PostgreSQL continues to evolve with each new release, introducing features like incremental backup support in pg_basebackup that enhance its functionality and usability. In addition to the features discussed in this article, PostgreSQL 17 offers many other exciting capabilities waiting to be explored. Stay tuned for future blog posts where we'll delve deeper into these features, providing practical examples and tips for leveraging them effectively.
Looking to streamline your PostgreSQL backups and ensure data integrity? Explore our comprehensive database management solutions tailored to meet your organization's needs. Contact us today to learn more about our services and how we can help you harness the power of PostgreSQL for your business.
{{cta}}