PGPool-II is an open-source proxy software that can be used for the PostgreSQL servers(ProxySQL for MySQL) environment. It lies between the application server/PostgreSQL client and the PostgreSQL database servers and can provide much-needed features like Load Balancing, Query Cache, Connection Pooling, and Watchdog. Each feature requires a detailed explanation with supporting examples. So, in this part let’s discuss the basic setup of PGPool-II with PostgreSQL streaming replication and in the upcoming parts, we will explore more about its features.
History of PGPool-II:
PGPool-II started its life as PGPool in 2003 just as a connection pooling software. Later in 2006, it was released as PGPool-II with many features and with the elimination of many limitations in PGPool. Still, PGPool-II is one of the most popular proxy available for the PostgreSQL servers.
The necessity of PGPool-II:
Apart from the advanced features, PGPool needs for few cases like automatic failover of connections to the backend healthy DB server and maximum connection errors(Too many connections) in the applications. Also, connection pooling reduces the overhead of creating new connections on the database.
Now, let’s start with the setup of PGPool-II for the two PostgreSQL servers(Primary and Secondary) configured with streaming replication. Considering as having the primary and the secondary already which setup has not been explained here.
Below is the architecture of this setup which is planned to achieve.
Installation of PGPool-II :
The following steps have been performed on server A.
- Installing the required repo for CentOS 7
2. Installing the PGPool-II
The PostgreSQL version used in this setup is 12, so the pg12* of PGPool-II is used for libraries and extension directories of PostgreSQL 12. Replace step 2 pg12* with the appropriate version of PostgreSQL.
In case of any difficulties, while installing using the package manager or unable to install the version required, it is good to go with source installation.
- Downloading the source code of the version required
2. Extract the downloaded file
It will create the directory pgpool-II-4.2.1
3. Install the below packages that need to compile the code
4. Post successful installation, configure, compile and install it using the below commands
Configuration of PGPool-II:
The configuration file of PGPool-II is /etc/pgpool-II/pgpool.conf
First, creating two users on the primary server
- Monitor user – To perform a health check for the backend databases (only login is required)
- App user – For the application use case (Provide the required privileges based on the application purpose, read or write)
And creating a database for the monitor user
Make sure these users are replicated to the secondary server and also add an entry on the primary and secondary servers pg_hba.conf file to access the database from pgpool server and reload it.
And reload it to take effect.
Preparing the configuration file for the pgpool-II on server A
Here explaining the purpose of each variable on the config, it can be modified based on the requirements.
Listen_addresses – From which IP addresses pgpool accepts incoming requests. * means it accepts all requests.
Port – On which port, pgpool wants to listen for client requests. The port here using needs to be used on the application or the client to connect to the database.
Socket_dir – Directory for socket accepting connections
backend_hostname*, backend_port*, backend_weight* – hostname specifies the PostgreSQL backend server, port specifies the port number of the backend server, weight defines the load balance ratio of the backend server
Here we are having two servers, so the backend variable has id 0 and 1.
backend_hostname0, backend_hostname1.
In case, if we are adding one more server as the secondary in the future, it needs to be added as backend_hostname2, backend_port2, backend_weight2.
load_balance_mode
To enable load balancing on incoming select queries to both the backend servers.
master_slave_mode, master_slave_sub_mode
To determine the replication system used for data replication. The value used is stream as it is a streaming replication.
Streaming checks variables are used to monitor the status of streaming replication using the created monitor user. The variable delay_threshold defines the maximum lag (in bytes) in the secondary server. If the lag exceeds, pgpool won’t serve any select queries to the secondary to avoid reading stale data on the application.
allow_clear_text_frontend_auth – It allows the pgpool to get the password in plain text from the frontend and use it for backend authentication.
Once all the above settings are done, restarting the pgpool takes all the settings into effect.
Checking the connection using the pgpool from pgpool server
Now doing some tests and checking the status of the node and the queries distribution on the pgpool.
From the above output, the node status is online and queries are distributed.
That’s it on the basic pgpool setup with streaming replication. It is easy to add more replicas under it by adding backend_* variables.
Current architecture will be a single point of failure(SPOF) in case server1 (pgpool) goes down. So, in the next blog, we will look on how to achieve high availability on pgpool servers (PGPool Cluster) using watchdog and Virtual IP.