In the realm of data management, PostgreSQL Partitioning emerges as a powerful solution. Imagine streamlining your data into organized segments, each known as a partition. Whether dealing with time-series data or custom intervals, PostgreSQL Partitioning optimizes organization.
The real magic? The partition key. It guides data to the right partition with finesse, like a GPS for your information. These partitions aren't isolated islands; they're connected, forming a coherent data landscape.
In the quest for data access, PostgreSQL plays a pivotal role. It retrieves only what's essential, enabling swift and precise queries. However, while PostgreSQL inherently offers partitioning features, the process of creation and management can prove intricate and time-intensive. Establishing new partitions and managing outdated ones often demands manual labor. But there's a solution to explore: pg_partman, an advanced PostgreSQL extension that elevates partition management to new heights.
Introducing pg_partman: Effortless PostgreSQL Partitioning
pg_partman is an extension designed to create and oversee sets of partitioned tables, whether based on time or serial numbers. The creation of child tables is fully managed by the extension itself. Moreover, there's an optional retention policy that can automatically discard partitions that are no longer necessary, To streamline partition maintenance, the extension includes a background worker (BGW) process, eliminating the need for an external scheduler like cron in many instances.
Installation of pg_partman
Note: Compatible with PostgreSQL version 10 or higher.
For this demonstration, we'll utilize PostgreSQL version 15.
Installation on EC2/VMs
To get started with pg_partman, follow these steps:
- Clone the git repository:
Navigate to the pg_partman directory and install the make file:
Note: If you encounter issues related to the missing postgresql-server-dev package, you can resolve it by running: sudo apt install postgresql-server-dev-XX, where XX is the PostgreSQL version
If you prefer plain PL/PGSQL functions without the background worker, use this command instead:
Add pg_partman_bgw to the shared_preload_libraries in the postgresql.conf file and restart the database instance:
This change necessitates a database restart.
- Verify the installation by checking the shared preload libraries:
For Cloud-hosted Machines
If you are using cloud-hosted machines, the required packages for installation will be available on the cloud server. Add the parameter pg_partman_bgw to the shared_preload_libraries and then restart the database instance.
Working with the demo_db Database:
In this demo, we'll work with a database named demo_db:
Connect to the demo_db database and create the pg_partman extension within a new schema named partman
To confirm the installation, list the installed extensions:
We have successfully completed the installation process.
Use Cases: Exploring Table Partitioning with pg_partman
Creating Table Partitions using pg_partman:
Let's delve into creating table partitions using pg_partman with a real-world example. Suppose we want to create a partitioned table based on the range partitioning technique to organize data by time intervals. Here's how to achieve this:
First, we create a new partitioned table named table01 using SQL. This table will be partitioned based on the report_date column.
Note: The column used as the partition key must be part of the primary key, and for range partitioning, it must be of timestamp type.
Checking the structure of the partitioned table01 , you can see that it has not yet been populated with child partitions:
To proceed, we will use the partman_create_parent function to create child partitions for the parent table:
Here's the breakdown of the parameters used:
- p_parent_table = parent table name
- p_control = the column to control the partition creation, partitioning will be based on the this column
- p_type = type of partition ( can be partman or native )
- Here partman option allows us to create trigger-based partitions using pg_partman’s method of partitioning , although we get better read and write performance by using the old school native partitioning that comes with PostgreSQL.
- p_interval = the time range for new partition creation (can be hourly , daily , monthly etc)
- p_premake = number of partitions to be created, before and after the current time
- (by default , pg_partman creates 4 partitions as premake ) , we’ve used 2
After executing the above command, check the structure of the table01 again to see the newly created child partitions:
As you can observe, the table01 s now a partitioned table with child partitions that span daily intervals. The partitioned table, table01 is now having child tables that are segregated based on the time range, The new child tables will be created based on the p_interval specified, which, in our case is daily. Therefore each child table will be having the data for a day.
Notice that there are 4 child tables in number , that is because pg_partman creates number of premake before and after the current child table, in our case 2 before and 2 after.
This is helpful in case we need to partition an existing table already having data ,creating premakes will add the older data to the child tables prior and the new data will be added to the child tables following.
Configuring Partition Table for Data Archival:
Another powerful feature of the pg_partman extension is the ability to automatically archive child tables (child partitions) from the parent table. This archival process can be configured to detach child tables automatically based on specific criteria, such as time, and store them in a separate schema for backup or archival purposes. This helps to maintain the parent table's optimal performance by keeping only actively used partitions while efficiently managing historical data.
Here's how you can configure table partition archival using pg_partman:
- We would need to have a look at the configuration for the tables managed by pg_partman. We can check the partman config by looking at the partman.part_config table :
As observed, the partman.part_config currently contains the information for table01 along with its associated settings. The intention is to retain data within each child table for a period of one day within the parent table. Simultaneously, the child tables will be systematically archived into a distinct schema referred to as backup.
Modify the configuration parameters in the partman.part_config able to enable auto-archival:
- Set infinite_time_partitions to true: This creates infinite child tables for the parent table.
- Set retention to 1 day: This specifies that each child table will be retained for one day before being detached.
- Set retention_schema to backup: This designates the schema where detached child tables will be stored.
- Set retention_keep_table to true: This ensures that the detached child tables are stored in the retention schema indefinitely.
- Set retention_keep_index to true: This ensures that the indexes of detached child tables are also stored in the retention schema.
Here's an example of how to update the configuration for the table01
We can verify the settings :
Call the maintenance function to apply the changes:
After the configuration is updated and the maintenance function is called, the archival process will start working. Child tables older than the specified retention period will be detached from the parent table and stored in the backup schema.
Verify the archival process:
Now, we can see after 1 day, the child table gets detached from the parent table and gets stored in the backup schema:
Backup schema:
Configuring partition table for data deletion:
Using the pg_partman extension we can also drop the child tables (child partitions) from the parent table. The child tables can get dropped automatically based on the configuration.
By this technique, the parent table can only have the required and actively used partitions, and the older child tables can be dropped.
For this, we would need to have a look at the configuration for the tables managed by pg_partman.
The configuration details are accessible within the partman.part_config table:
We will be keeping the data retention for every child table to 1 day, on the parent table and will be dropping the child tables that are older
In order to set auto-purging for this table we would need to change the following parameters in the partman.part_config table:
- Set infinite_time_partitions to true: This setting generates an unlimited series of child tables for the parent table.
- Configure retention to 1 day: Each child table will be preserved for a single day before being detached from the parent table.
- Adjust retention_keep_table to false: This setting ensures that child tables won't be stored indefinitely and will be dropped upon reaching the retention period.
- Modify retention_keep_index to false: This adjustment guarantees that child table indexes will be similarly dropped once the retention period is exceeded.
Updating the configuration for table01
Confirming the updated settings:
This setting will automatically drop the child tables once the data retention is crossed.
For the changes to take place we would need to call the maintenance function for the partman that applies the maintenance changes for the tables:
After 1 day we can see the parent table structure:
Scheduling maintenance
The creation and purging of partition tables, in pg_partman is part of the maintenance job, which is taken care by the partman.run_maintenance_proc function.
For seamless execution, it's imperative to schedule the invocation of the partman.run_maintenance_proc function at regular intervals. This scheduling is typically aligned with the specific partition creation or deletion requirements, whether on a daily or hourly basis.
We can set a cron job for the same using crontab in linux or using pg_cron at the database level to run the query (CALL partman.run_maintenance_proc();) calling the partman.run_maintenance_proc function.
In wrapping up, it's clear that pg_partman stands as a game-changing extension for PostgreSQL's table partitioning. With its automation prowess in creating, managing, and upkeeping partitions, it's a time-saving marvel. The built-in functions that handle partition creation and removal based on time intervals not only boost efficiency but also mitigate the chances of human slip-ups. In essence, pg_partman streamlines table partitioning, transforming what could be complex into something remarkably user-friendly.
So, whether you're a seasoned pro or a casual user, this extension is your ticket to smoother, hassle-free data management.
Check out our recent blogs here