Simplify Your PostgreSQL Table Partitioning with pg_partman Extension

Aug 30, 2023

Mins to Read

All

Simplify PostgreSQL Table Partitioning with the pg_partman Extension

‍
In the realm of data management, PostgreSQL Partitioning emerges as a powerful solution. Imagine streamlining your data into organized segments, each known as a partition. Whether dealing with time-series data or custom intervals, PostgreSQL Partitioning optimizes organization.

The real magic? The partition key. It guides data to the right partition with finesse, like a GPS for your information. These partitions aren't isolated islands; they're connected, forming a coherent data landscape.

In the quest for data access, PostgreSQL plays a pivotal role. It retrieves only what's essential, enabling swift and precise queries. However, while PostgreSQL inherently offers partitioning features, the process of creation and management can prove intricate and time-intensive. Establishing new partitions and managing outdated ones often demands manual labor. But there's a solution to explore: pg_partman, an advanced PostgreSQL extension that elevates partition management to new heights.

Introducing pg_partman: Effortless PostgreSQL Partitioning

pg_partman is an extension designed to create and oversee sets of partitioned tables, whether based on time or serial numbers. The creation of child tables is fully managed by the extension itself. Moreover, there's an optional retention policy that can automatically discard partitions that are no longer necessary, To streamline partition maintenance, the extension includes a background worker (BGW) process, eliminating the need for an external scheduler like cron in many instances.

Installation of pg_partman

Note: Compatible with PostgreSQL version 10 or higher.

For this demonstration, we'll utilize PostgreSQL version 15.

Installation on EC2/VMs

To get started with pg_partman, follow these steps:

Clone the git repository:

 
mydbops@mydbops:~$ git clone https://github.com/pgpartman/pg_partman.git

 
Cloning into 'pg_partman'...
remote: Enumerating objects: 3454, done.
remote: Counting objects: 100% (501/501), done.
remote: Compressing objects: 100% (176/176), done.
remote: Total 3454 (delta 378), reused 403 (delta 316), pack-reused 2953
Receiving objects: 100% (3454/3454), 2.81 MiB | 1.79 MiB/s, done.
Resolving deltas: 100% (2446/2446), done.


mydbops@mydbops:~$ ls -lrth
total 223
drwxrwxr-x 9 mydbops mydbops  223 Aug 19 07:26 pg_partman

Navigate to the pg_partman directory and install the make file:

 
mydbops@mydbops:~/pg_partman$ ls -lrth
total 156K
-rw-rw-r--  1 mydbops mydbops 1.4K Aug 19 07:26 META.json
-rw-rw-r--  1 mydbops mydbops 1.2K Aug 19 07:26 LICENSE.txt
-rw-rw-r--  1 mydbops mydbops 116K Aug 19 07:26 CHANGELOG.txt
-rw-rw-r--  1 mydbops mydbops 8.1K Aug 19 07:26 README.md
-rw-rw-r--  1 mydbops mydbops 1.6K Aug 19 07:26 Makefile
drwxrwxr-x  4 mydbops mydbops   32 Aug 19 07:26 bin
-rw-rw-r--  1 mydbops mydbops  111 Aug 19 07:26 pg_partman.control
drwxrwxr-x  2 mydbops mydbops  193 Aug 19 07:26 doc
drwxrwxr-x  2 mydbops mydbops   30 Aug 19 07:26 src
drwxrwxr-x 10 mydbops mydbops 4.0K Aug 19 07:26 test
drwxrwxr-x  2 mydbops mydbops 4.0K Aug 19 07:26 updates
drwxrwxr-x  6 mydbops mydbops   97 Aug 19 07:29 sql

make install

mydbops@mydbops:~/pg_partman$ make install

Note: If you encounter issues related to the missing postgresql-server-dev package, you can resolve it by running: sudo apt install postgresql-server-dev-XX, where XX is the PostgreSQL version

If you prefer plain PL/PGSQL functions without the background worker, use this command instead:

 
mydbops@mydbops:~/pg_partman$ make NO_BGW=1 install

Add pg_partman_bgw to the shared_preload_libraries in the postgresql.conf file and restart the database instance:

 
# - Shared Library Preloading -

#local_preload_libraries = ''
#session_preload_libraries = ''
shared_preload_libraries = 'pg_partman_bgw'     # (change requires restart)

This change necessitates a database restart.

Verify the installation by checking the shared preload libraries:

 
postgres=# show shared_preload_libraries;
shared_preload_libraries 
--------------------------
pg_partman_bgw
(1 row)

For Cloud-hosted Machines

If you are using cloud-hosted machines, the required packages for installation will be available on the cloud server. Add the parameter pg_partman_bgw to the shared_preload_libraries and then restart the database instance.

Working with the demo_db Database:

In this demo, we'll work with a database named demo_db:

 
Name    |  Owner   | Encoding |   Collate   |    Ctype    | ICU Locale | Locale Provider |   Access privileges   |  Size   | Tablespace | Description   
-----------+----------+----------+-------------+-------------+------------+-----------------+-----------------------+---------+------------+---------------
 demo_db   | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |            | libc            |                       | 702 MB  | pg_default | 
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |            | libc            |                       | 27 MB   | pg_default | default administrative connection database
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |            | libc            | =c/postgres          +| 7281 kB | pg_default | unmodifiable empty database
           |          |          |             |             |            |                 | postgres=CTc/postgres |         |            | 
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |            | libc            | =c/postgres          +| 7509 kB | pg_default | default template for new databases
           |          |          |             |             |            |                 | postgres=CTc/postgres |         |            | 
(5 rows)

Connect to the demo_db database and create the pg_partman extension within a new schema named partman

 
postgres=# \c demo_db
You are now connected to database "demo_db" as user "postgres".
demo_db=# CREATE SCHEMA partman;
CREATE SCHEMA
demo_db=# CREATE EXTENSION pg_partman SCHEMA partman;
CREATE EXTENSION

To confirm the installation, list the installed extensions:

 
demo_db=# \dx
                               List of installed extensions
    Name    | Version |   Schema   |                     Description                      
------------+---------+------------+------------------------------------------------------
 pg_partman | 4.7.3   | partman    | Extension to manage partitioned tables by time or ID

We have successfully completed the installation process.

Use Cases: Exploring Table Partitioning with pg_partman

Creating Table Partitions using pg_partman:

Let's delve into creating table partitions using pg_partman with a real-world example. Suppose we want to create a partitioned table based on the range partitioning technique to organize data by time intervals. Here's how to achieve this:

First, we create a new partitioned table named table01 using SQL. This table will be partitioned based on the report_date column.

 
 
# Rotate log files daily

log_filename = 'postgresql-%d.log'

# Keeping log files for 30 days ( 1 month)

log_rotation_age = 30days 

#Truncating the old logs after 30 days : 

Log_truncate_on_rotation = ‘true’

Note: The column used as the partition key must be part of the primary key, and for range partitioning, it must be of timestamp type.

Checking the structure of the partitioned table01 , you can see that it has not yet been populated with child partitions:

 
demo_db=# \d+ table01 

                                     Partitioned table "public.table01"
   Column    |  Type   | Collation | Nullable | Default | Storage  | Compression | Stats target | Description 
-------------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
 id          | integer |           | not null |         | plain    |             |              | 
 name        | text    |           | not null |         | extended |             |              | 
 report_date | date    |           | not null |         | plain    |             |              | 
Partition key: RANGE (report_date)
Indexes:
    "table01_pkey" PRIMARY KEY, btree (id, report_date)
Number of partitions: 0

To proceed, we will use the partman_create_parent function to create child partitions for the parent table:

 
SELECT partman.create_parent( p_parent_table => 'public.table01', \
p_control => 'report_date', \
p_type => 'native', \
p_interval=> 'daily', \
p_premake => 2);

Here's the breakdown of the parameters used:

p_parent_table = parent table name
p_control = the column to control the partition creation, partitioning will be based on the this column
p_type = type of partition ( can be partman or native )
Here partman option allows us to create trigger-based partitions using pg_partman’s method of partitioning , although we get better read and write performance by using the old school native partitioning that comes with PostgreSQL.
p_interval = the time range for new partition creation (can be hourly , daily , monthly etc)
p_premake = number of partitions to be created, before and after the current time
(by default , pg_partman creates 4 partitions as premake ) , we’ve used 2

After executing the above command, check the structure of the table01 again to see the newly created child partitions:

 
demo_db=# \d+ table01
                                      Partitioned table "public.table01"

   Column    |  Type   | Collation | Nullable | Default | Storage  | Compression | Stats target | Description 
-------------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
 id          | integer |           | not null |         | plain    |             |              | 
 name        | text    |           | not null |         | extended |             |              | 
 report_date | date    |           | not null |         | plain    |             |              | 
Partition key: RANGE (report_date)
Indexes:
    "table01_pkey" PRIMARY KEY, btree (id, report_date)
Partitions: table01_p2023_08_17 FOR VALUES FROM ('2023-08-17') TO ('2023-08-18'),
            table01_p2023_08_18 FOR VALUES FROM ('2023-08-18') TO ('2023-08-19'),
            table01_p2023_08_19 FOR VALUES FROM ('2023-08-19') TO ('2023-08-20'),
            table01_p2023_08_20 FOR VALUES FROM ('2023-08-20') TO ('2023-08-21'),
            table01_p2023_08_21 FOR VALUES FROM ('2023-08-21') TO ('2023-08-22'),
            table01_default DEFAULT

As you can observe, the table01 s now a partitioned table with child partitions that span daily intervals. The partitioned table, table01 is now having child tables that are segregated based on the time range, The new child tables will be created based on the p_interval specified, which, in our case is daily. Therefore each child table will be having the data for a day.

Notice that there are 4 child tables in number , that is because pg_partman creates number of premake before and after the current child table, in our case 2 before and 2 after.

This is helpful in case we need to partition an existing table already having data ,creating premakes will add the older data to the child tables prior and the new data will be added to the child tables following.

Configuring Partition Table for Data Archival:

Another powerful feature of the pg_partman extension is the ability to automatically archive child tables (child partitions) from the parent table. This archival process can be configured to detach child tables automatically based on specific criteria, such as time, and store them in a separate schema for backup or archival purposes. This helps to maintain the parent table's optimal performance by keeping only actively used partitions while efficiently managing historical data.

Here's how you can configure table partition archival using pg_partman:

We would need to have a look at the configuration for the tables managed by pg_partman. We can check the partman config by looking at the partman.part_config table :

 
demo_db=# select * from partman.part_config ;

-[ RECORD 1 ]--------------+--------------------------------
parent_table               | public.table01
control                    | report_date
partition_type             | native
partition_interval         | 1 day
constraint_cols            | 
premake                    | 2
optimize_trigger           | 4
optimize_constraint        | 30
epoch                      | none
inherit_fk                 | t
retention                  | 
retention_schema           | 
retention_keep_table       | t
retention_keep_index       | t
infinite_time_partitions   | f
datetime_string            | YYYY_MM_DD
automatic_maintenance      | on
jobmon                     | t
sub_partition_set_full     | f
undo_in_progress           | f
trigger_exception_handling | f
upsert                     | 
trigger_return_null        | t
template_table             | partman.template_public_table01
publications               | 
inherit_privileges         | f
constraint_valid           | t
subscription_refresh       | 
drop_cascade_fk            | f
ignore_default_data        | f

As observed, the partman.part_config currently contains the information for table01 along with its associated settings. The intention is to retain data within each child table for a period of one day within the parent table. Simultaneously, the child tables will be systematically archived into a distinct schema referred to as backup.

 
demo_db=# create schema backup ;
CREATE SCHEMA
demo_db=# set search_path to backup ;
SET

Modify the configuration parameters in the partman.part_config able to enable auto-archival:

Set infinite_time_partitions to true: This creates infinite child tables for the parent table.
Set retention to 1 day: This specifies that each child table will be retained for one day before being detached.
Set retention_schema to backup: This designates the schema where detached child tables will be stored.
Set retention_keep_table to true: This ensures that the detached child tables are stored in the retention schema indefinitely.
Set retention_keep_index to true: This ensures that the indexes of detached child tables are also stored in the retention schema.

Here's an example of how to update the configuration for the table01

 
demo_db=# UPDATE partman.part_config 
SET infinite_time_partitions=true,
retention='1 day',
retention_schema='backup',
retention_keep_table=true, 
retention_keep_index=true 
WHERE parent_table='public.table01' ;

UPDATE 1

We can verify the settings :

 
demo_db=# select * from partman.part_config ;
-[ RECORD 1 ]--------------+--------------------------------
parent_table               | public.table01
control                    | report_date
partition_type             | native
partition_interval         | 1 day
constraint_cols            | 
premake                    | 2
optimize_trigger           | 4
optimize_constraint        | 30
epoch                      | none
inherit_fk                 | t
retention                  | 1 day
retention_schema           | backup
retention_keep_table       | t
retention_keep_index       | t
infinite_time_partitions   | t
datetime_string            | YYYY_MM_DD
automatic_maintenance      | on
jobmon                     | t
sub_partition_set_full     | f
undo_in_progress           | f
trigger_exception_handling | f
upsert                     | 
trigger_return_null        | t
template_table             | partman.template_public_table01
publications               | 
inherit_privileges         | f
constraint_valid           | t
subscription_refresh       | 
drop_cascade_fk            | f
ignore_default_data        | f

Call the maintenance function to apply the changes:

 
demo_db=# CALL partman.run_maintenance_proc();
CALL

After the configuration is updated and the maintenance function is called, the archival process will start working. Child tables older than the specified retention period will be detached from the parent table and stored in the backup schema.

Verify the archival process:

Now, we can see after 1 day, the child table gets detached from the parent table and gets stored in the backup schema:

 
demo_db=# \d+ table01
                                      Partitioned table "public.table01"
   Column    |  Type   | Collation | Nullable | Default | Storage  | Compression | Stats target | Description 
-------------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
 id          | integer |           | not null |         | plain    |             |              | 
 name        | text    |           | not null |         | extended |             |              | 
 report_date | date    |           | not null |         | plain    |             |              | 
Partition key: RANGE (report_date)
Indexes:
    "table01_pkey" PRIMARY KEY, btree (id, report_date)
Partitions: table01_p2023_08_18 FOR VALUES FROM ('2023-08-18') TO ('2023-08-19'),
            table01_p2023_08_19 FOR VALUES FROM ('2023-08-19') TO ('2023-08-20'),
            table01_p2023_08_20 FOR VALUES FROM ('2023-08-20') TO ('2023-08-21'),
            table01_p2023_08_21 FOR VALUES FROM ('2023-08-21') TO ('2023-08-22'),
            table01_default DEFAULT

Backup schema:

 
demo_db=# \dt
                      List of relations
 Schema |        Name         |       Type        |  Owner   
--------+---------------------+-------------------+----------
 backup | table01_p2023_08_17 | table             | postgres

Configuring partition table for data deletion:

Using the pg_partman extension we can also drop the child tables (child partitions) from the parent table. The child tables can get dropped automatically based on the configuration.

By this technique, the parent table can only have the required and actively used partitions, and the older child tables can be dropped.

For this, we would need to have a look at the configuration for the tables managed by pg_partman.

The configuration details are accessible within the partman.part_config table:

 
demo_db=# select * from partman.part_config ;

-[ RECORD 1 ]--------------+--------------------------------
parent_table               | public.table01
control                    | report_date
partition_type             | native
partition_interval         | 1 day
constraint_cols            | 
premake                    | 2
optimize_trigger           | 4
optimize_constraint        | 30
epoch                      | none
inherit_fk                 | t
retention                  | 1 day
retention_schema           | backup
retention_keep_table       | t
retention_keep_index       | t
infinite_time_partitions   | t
datetime_string            | YYYY_MM_DD
automatic_maintenance      | on
jobmon                     | t
sub_partition_set_full     | f
undo_in_progress           | f
trigger_exception_handling | f
upsert                     | 
trigger_return_null        | t
template_table             | partman.template_public_table01
publications               | 
inherit_privileges         | f
constraint_valid           | t
subscription_refresh       | 
drop_cascade_fk            | f
ignore_default_data        | f

We will be keeping the data retention for every child table to 1 day, on the parent table and will be dropping the child tables that are older

In order to set auto-purging for this table we would need to change the following parameters in the partman.part_config table:

Set infinite_time_partitions to true: This setting generates an unlimited series of child tables for the parent table.
Configure retention to 1 day: Each child table will be preserved for a single day before being detached from the parent table.
Adjust retention_keep_table to false: This setting ensures that child tables won't be stored indefinitely and will be dropped upon reaching the retention period.
Modify retention_keep_index to false: This adjustment guarantees that child table indexes will be similarly dropped once the retention period is exceeded.

Updating the configuration for table01

 
demo_db=# UPDATE partman.part_config 
SET infinite_time_partitions=true,
retention='1 day',
retention_keep_table=false, 
retention_keep_index=false 
WHERE parent_table='public.table01' ;

Confirming the updated settings:

 
demo_db=# select * from partman.part_config ;
-[ RECORD 1 ]--------------+--------------------------------
parent_table               | public.table01
control                    | report_date
partition_type             | native
partition_interval         | 1 day
constraint_cols            | 
premake                    | 2
optimize_trigger           | 4
optimize_constraint        | 30
epoch                      | none
inherit_fk                 | t
retention                  | 1 day
retention_schema           | backup
retention_keep_table       | f
retention_keep_index       | f
infinite_time_partitions   | t
datetime_string            | YYYY_MM_DD
automatic_maintenance      | on
jobmon                     | t
sub_partition_set_full     | f
undo_in_progress           | f
trigger_exception_handling | f
upsert                     | 
trigger_return_null        | t
template_table             | partman.template_public_table01
publications               | 
inherit_privileges         | f
constraint_valid           | t
subscription_refresh       | 
drop_cascade_fk            | f
ignore_default_data        | f

This setting will automatically drop the child tables once the data retention is crossed.

For the changes to take place we would need to call the maintenance function for the partman that applies the maintenance changes for the tables:

 
demo_db=# CALL partman.run_maintenance_proc();
CALL

After 1 day we can see the parent table structure:

 
demo_db=# \d+ table01
                                      Partitioned table "public.table01"
   Column    |  Type   | Collation | Nullable | Default | Storage  | Compression | Stats target | Description 
-------------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
 id          | integer |           | not null |         | plain    |             |              | 
 name        | text    |           | not null |         | extended |             |              | 
 report_date | date    |           | not null |         | plain    |             |              | 
Partition key: RANGE (report_date)
Indexes:
    "table01_pkey" PRIMARY KEY, btree (id, report_date)
Partitions: table01_p2023_08_19 FOR VALUES FROM ('2023-08-19') TO ('2023-08-20'),
            table01_p2023_08_20 FOR VALUES FROM ('2023-08-20') TO ('2023-08-21'),
            table01_p2023_08_21 FOR VALUES FROM ('2023-08-21') TO ('2023-08-22'),
            table01_default DEFAULT

Scheduling maintenance

The creation and purging of partition tables, in pg_partman is part of the maintenance job, which is taken care by the partman.run_maintenance_proc function.

For seamless execution, it's imperative to schedule the invocation of the partman.run_maintenance_proc function at regular intervals. This scheduling is typically aligned with the specific partition creation or deletion requirements, whether on a daily or hourly basis.

We can set a cron job for the same using crontab in linux or using pg_cron at the database level to run the query (CALL partman.run_maintenance_proc();) calling the partman.run_maintenance_proc function.

In wrapping up, it's clear that pg_partman stands as a game-changing extension for PostgreSQL's table partitioning. With its automation prowess in creating, managing, and upkeeping partitions, it's a time-saving marvel. The built-in functions that handle partition creation and removal based on time intervals not only boost efficiency but also mitigate the chances of human slip-ups. In essence, pg_partman streamlines table partitioning, transforming what could be complex into something remarkably user-friendly.

So, whether you're a seasoned pro or a casual user, this extension is your ticket to smoother, hassle-free data management.

Check out our recent blogs here

No items found.