High Availability/Scalability Using Haproxy And Pgbouncer In Postgresql

I have multiple PostgreSQL servers for a web application. Typically one master and multiple slaves in hot standby mode (asynchronous streaming replication).
I use PGBouncer for connection pooling: one instance installed on each PG server (port 6432) connecting to database on localhost. I use transaction pool mode.
In order to load-balance my read-only connections on slaves, I use HAProxy (v1.5) with a conf more or less like this:
      listen pgsql_pool 0.0.0.0:10001
      mode tcp
      option pgsql-check user ha
      balance roundrobin
      server master 10.0.0.1:6432 check backup
      server slave1 10.0.0.2:6432 check
      server slave2 10.0.0.3:6432 check
      server slave3 10.0.0.4:6432 check
So, my web application connects to haproxy (port 10001), that load-balance connections on multiple pgbouncer configured on each PG slave.
Here is a representation graph of my current architecture:
pgbouncer > postgresql” height=”567″ src=”https://lh4.googleusercontent.com/uFI6eBjkUnQiRyaen5e55VxV_KAwp_RG24X5ZBUGIERu8CkV6-e7bJInijZBRGqrE_RkKXvT6ekFUaysIcLX4_kuAlM6QZ2niPz_rfSICv9wHiJGcDnYO7k0RNq1iaX5nPw20kw” style=”-webkit-transform: rotate(0.00rad); border: none; transform: rotate(0.00rad);” width=”602″ />
This works quite well like this, but I realize that some implements this quite differently: web application connects to a single PGBouncer instance that connects to HAproxy which load-balance over multiple PG servers:
Here is the reason: HAProxy redirects connection to different servers. this results in MAC address change in the database connection. So if PGBouncer is above HAProxy, each time the connections in the pool gets invalidated because of MAC address change.
but
haproxy > postgresql” height=”567″ src=”https://lh6.googleusercontent.com/UXp7Ti_yzlKwrAsW9yqaFfVlZ3J-Cq5e4DD9EnOESf2vVxDPBKw1NW-MaFwkM34s832FQbBGhQ4jKDJNCmqO443ya-QP8Pv3GqLMhklAcivElzUS08tKmCpB4vW8s1uSj5pOsyQ” style=”-webkit-transform: rotate(0.00rad); border: none; transform: rotate(0.00rad);” width=”602″ />
Clients making a large number of DB requests will have to setup a connection with a remote PGBouncer for each request. This is more expensive, than running PgBouncer locally (so the application connects to pgbouncer locally) and pgBouncer maintains a pool of connections with the remote PG server.
So, IMO, PGBouncer -> HAProxy -> PGServer seems to be better than, HAProxy -> PGBouncer -> PGServer, especially when the PGBouncer is local to the client application.

9.4 Replication with RepMgr and PgBouncer in Postgresql

repmgr 3.0:-
                   repmgr is an open-source tool suite to  monitor replication, and perform administrative tasks such as failover or manual switchover operations.
PgBouncer:-
                  PgBouncer is connection pooler for PostgreSQL databases and acts as gateway for clients to connect to the databases.pgBouncer allows for backend database configuration changes by just editing its configuration file and reloading the process.
Related image
Setup:-
dbhost1: first database host running on 192.168.0.100
dbhost2: second database host running on 192.168.0.101
pgbouncer-host: host running pgbouncer on 192.168.0.102
Instructions:-
1.Install dependencies.
2.Configure ssh access between db nodes.
3.Configure databases.
4.Configure PostgreSQL replication.
5.Configure replication manager.
6.Clone slave.
7.Configure pgBouncer.
8.Failover.
1) Install Dependences (dbhost1, dbhost2, pgbouncer-host)
The default ubuntu 14.04 installation does not include PostgreSQL 9.4 so we will need to add the repository:
Create the file /etc/apt/sources.list.d/pgdg.list, and add a line for the repository
Import the repository signing key, and update the package lists
wget –quiet -O – https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add –  
sudo apt-get update  
Install the dependencies for database hosts (dbhost1,dbhost2)
sudo apt-get install postgresql-9.4  
sudo apt-get install postgresql-9.4-repmgr  
sudo apt-get install postgresql-client-9.4  
Install the dependencies for pgBouncer host (pgbouncer-host)
sudo apt-get install postgresql-client-9.4  
sudo apt-get install pgbouncer  
2) Configure password-less ssh between dbhosts (dbhost1,dbhost2)
by default postgreSQL creates a postgres local linux user which is used to do all database management operations. To create slaves or recreate a master after failure repmgr requires that the user running the commands have password-less ssh access to the source host. Since the postgres user is the user which has permissions on the postrgres directories, we will configure that user for ssh.
switch to the user
$ sudo su postgres
create ssh key pair
$ ssh-keygen
copy the publich key id_rsa.pub contents and add it to the authorized_keys file in the other host’s postgres user ssh directory. For more details see https://github.com/2ndQuadrant/repmgr/blob/master/SSH-RSYNC.md
test that you can ssh between servers before going to the next step.
3) Configure Databases and users (dbhost1)
We will need two databases one will be used by repmgr and the other will be our application database. All commands are to be executed using the postgres user.
for repmgr database
$ createuser -s repmgr
$ createdb repmgr -O repmgr
for our application database
$ createuser test_user
$ createdb test_db -O test_user
since this user will be used by the applications (clients) we will set a password for it. Run the following command from inside a psql shell with the default postgres user.
postgres=# ALTER USER test_user WITH PASSWORD ”;
4) Configure PostgreSQL replication (dbhost1)
We will now configure the master db host (dbhost1) for replication. This is acheived by editing the postgresql.conf file and updating the follwing items
listen_addresses=’*’  
wal_level = ‘hot_standby’  
archive_mode = on  
archive_command = ‘cd .’  
max_wal_senders = 10  
max_replication_slots = 1  
hot_standby = on  
shared_preload_libraries = ‘repmgr_funcs’  
We also need to configure the authentication configuration (pg_hba.conf) to allow repmgr to access replication database and also allow password based access for our application database (test_db). We will add the following lines:
host    repmgr          repmgr     192.168.0.0/0     trust  
host    replication     repmgr     192.168.0.0/0     trust  
host    test_db         test_user  192.168.0.0/0     md5  
restart the postgres service after the changes are done
$ sudo service postgresql restart
5) Configure Replication Manager (dbhost1,dbhost2)
We will need to create a repmgr.conf on each database host. The file could be stored anywhere but we will use /etc/repmgr/repmgr.conf.
The file contents for the dbhost1 should be
cluster=test  
node=1  
node_name=node1  
use_replication_slots=1  
conninfo=’host=dbhost1 user=repmgr dbname=repmgr’  
pg_bindir=/usr/lib/postgresql/9.4/bin  
We are now ready to join our master to the replication topology. Using the postgres user run the command
$ repmgr -f /etc/repmgr/repmgr.conf master register
and similarly for dbhost2 we need to create the file /etc/repmgr/repmgr.conf with the contents
cluster=test  
node=2  
node_name=node2  
use_replication_slots=1  
conninfo=’host=dbhost2 user=repmgr dbname=repmgr’  
pg_bindir=/usr/lib/postgresql/9.4/bin  
6) Clone Slave (dbhost2)
ssh into our slave server dbhost
stop the postgresql service
$ sudo service postgresql stop
run the following command using the postgres user to clone the master
$ repmgr -f /etc/repmgr/repmgr.conf –force –rsync-only -h dbhost1 -d repmgr -U repmgr –verbose
standby clone
start the postgresql server
$ sudo service postgresql start
register the slave as a standby
$ repmgr -f /etc/repmgr/repmgr.conf –force standby register
if all the previous steps ran successfully we should be able to see the state of our replication cluster using the command
$ repmgr -f /etc/repmgr/repmgr.conf cluster show
and it should return an output like this
 Role      | Connection String  
* master  | host=dbhost1 user=repmgr dbname=repmgr
 standby | host=dbhost2 user=repmgr dbname=repmgr
7) Configure pgBouncer (pgbouncer-host)
All pgBouncer needs is a configuration file pgbouncer.ini which can be stored anywhere in the system. The contents of the file should be
[databases]
test_db = host=dbhost1 port=5432 dbname=test_db
[pgbouncer]
listen_port = 6543  
listen_addr = 127.0.0.1  
auth_type = md5  
auth_file = users.txt  
logfile = pgbouncer.log  
pidfile = pgbouncer.pid  
admin_users = test_user  
The configuration above instructs pgBouncer to listen for connections on port 6543 and relay connections to the database test_db to our current master dbhost1 on port 5432.
Because pgBouncer uses md5 based authentication with postgreSQL 8.0 style files we need to create a users.txt file which will have the user credentials for the database users. A sample of this file is below
“test_user” “somepassword”
To run pgBouncer we run the command
$ pgbouncer -d pgbouncer.ini -R
Congratulations!! you now have a running master-slave replicated postgreSQL setup.
8) Failover (dbhost2,pgbouncer-host)
In case of failover due to the master node failure. All that is needed to do is first make sure the master is truely down. Preferably, stopping the postgresql service if it is running. Then we just need to promote our slave to become the new master using the command:
$ repmgr -f /etc/repmgr/repmgr.conf standby promote
and then change the pgBouncer configuration to point to dbhost2 and re run the command
$ pgbouncer -d pgbouncer.ini -R
furher know about pgbouncer

Check if your current database is connected to PgBouncer or not.

                        Determining if your current database connection is using PgBouncer, or going directly to Postgres itself, can be challenging, as PgBouncer is a very low-level, transparent interface. It is possible, and here are some detection methods you can use.

This was inspired by someone asking on the Perl DBD IRC channel if it was possible to easily tell if your current database handle (usually “$dbh”) is connected to PgBouncer or not. Since I’ve seen this question asked in other venues, I decided to take a crack at it.

There are actually two questions to be answered: (1) are we connected to PgBouncer, and if so, (2) what pool_mode is being run? The quickest and easiest way I found to answer the first question is to try and connect to a non-existent database. Normally, this is a FATAL message, as seen here:

$ psql testdb -p 5432
testdb=# \c ghostdb
FATAL:  database “ghostdb” does not exist
Previous connection kept

testdb=# 
However, a slightly different ERROR message is returned if the same thing is attempted while connected to PgBouncer:
$ psql testdb -p 6432
testdb=# \c ghostdb
ERROR:  No such database: ghostdb
Previous connection kept

Thus, an ERROR will always indicate that you are connected to PgBouncer and not directly to Postgres, which will always issue a FATAL.
In the future, there will be an even simpler method. As of this writing, pgBouncer 1.6 has not been released, but it will have the ability to customize the application_name. This is a configurable session-level variable that is fairly new in Postgres. Andrew Dunstan wrote a patch which enables adding this to your pgbouncer.ini file:
application_name_add_host = 1
This will make PgBouncer modify the application_name to append some information to it such as the remote host, the remote port, and the local port. This is a feature many PgBouncer users will appreciate, as it offers an escape from the black hole of connection information that PgBouncer suffers from. Here is what it looks like on both a normal Postgres connection, and a PgBouncer connection. As you can see, this is an easier check than the “invalid database connection” check above:

 Postgres:
$ psql testdb -p 5432 -c ‘show application_name’
 application_name 
——————
 psql
 PgBouncer:
$ psql testdb -p 6432 -c ‘show application_name’
        application_name        
——————————–
 psql – unix(7882@gtsm.com):6432

## DBD::Pg connections to PgBouncer get a very similar change:
$ perl testme.tmp.pl –port 6432
app – unix(6772@gtsm.com):6432

Now we have answered question of “are we connected to PgBouncer or not?”. The next question is which pool mode we are in. There are three pool modes you can set for PgBouncer, which controls when your particular connection is returned to “the pool”. For “session” mode, you keep the same Postgres backend the entire time you are connected. For “transaction”, you keep the same Postgres backend until the end of a transaction. For “statement”, you may get a new Postgres backend after each statement.
First, we can check if we are connected to PgBouncer in a statement level pool mode by taking advantage of the fact that multi-statement transactions are prohibited. PgBouncer enforces this by intercepting any attempts to enter a transaction (e.g. by issuing a BEGIN command). A very PgBouncer specific error about “Long transactions not allowed” is issued back to the client like so:

$ psql testdb -p 6432
testdb=# begin;
ERROR:  Long transactions not allowed

So, that takes care of detecting a pool_mode set to ‘statement’. The other two modes, transaction and session, will *not* give the same error. Thus, seeing that error indicates you are using a statement-level PgBouncer connection.

The next pool mode is “transaction”, which means that the server connection if released back to the pool at the end of a transaction. To figure out if we are in this mode, we take advantage of the fact that PgBouncer can be set to clean up the connection at the end of each transaction by issuing a specific command. By default, the command set by server_reset_query is DISCARD ALL, which invalidates any prepared statements, temporary tables, and other transaction-spanning, session-level items. Thus, our test will see if these session-level artifacts get discarded or not:

 Direct Postgres:
$ psql testdb -p 5432
testdb=# prepare abc(int) as select $1::text;
PREPARE
testdb=# execute abc(1);
 text
——
 1

 PgBouncer:
$ psql testdb -p 6432
testdb=# prepare abc(int) as select $1::text;
PREPARE
testdb=# execute abc(1);
ERROR:  prepared statement “abc” does not exist

Keep in mind that there are no true “transactionless” commands in Postgres. Even though we did not use a BEGIN in the psql prompt above, each command is treated as its own mini-transaction. In the case of the PgBouncer connection, the prepare is immediately followed with a DISCARD ALL, which means that our prepared statement no longer exists. Hence, we have determined that we are using a transaction-level PgBouncer connection.
Unfortunately, not getting an error does not necessarily mean your PgBouncer is NOT in transaction mode! It could be that server_reset_query is empty, meaning that temporary artifacts are not discarded at the end of the transaction. In such a case, we can take advantage of the fact that PgBouncer will allow other clients to share in our current connection, and thus be able to see the temporary items. If we create a temporary table in one pgbouncer connection, then connect again as a new client, the temporary table will only show up if we are sharing sessions but not transactions. Easier shown than explained, I suspect:

Regular Postgres gets a fresh session:
$ psql test1 -p 5432
test1=# create temp table abc(a int);
CREATE TABLE
test1=# select * from abc;
(No rows)
test1=# ^Z ## (we suspend with CTRL-Z)
[2]+  Stopped                 psql test1 -p 5432

$ psql test1 -p 5432
test1=# select * from abc;
ERROR:  relation “abc” does not exist

 PgBouncer will re-use the same session:
$ psql test1 -p 6432
test1=# create temp table abc(a int);
CREATE TABLE
test1=# select * from abc;
(No rows)
test1=# ^Z
[2]+  Stopped                 psql test1 -p 6432

$ psql test1 -p 6432
test1=# select * from abc;
(No rows)

The final PgBouncer pool mode is “session”, and basically means the only advantage over a normal Postgres connection is the overhead to start up and connect to a new Postgres backend. Thus, the PgBouncer connections are only returned to the pool upon disconnection. The only way to tell if you are in this mode is by determining that you are *not* in the other two modes. 🙂
So, although PgBouncer is extremely transparent, there are some tricks to determine if you are connected to it, and at what pool_mode. If you can think of other (SQL-level!) ways to check, please let me know in the comments section.