Monitoring Streaming Replication with script also in postgreSql

November 26, 2016

One of the easiest ways to monitor slave lag when using streaming replication is to turn hot standby on your slave and use pg_last_xact_replay_timestamp() and/or the other recovery information functions. Here’s an example query to run on the slave systems to get the number of seconds behind it is:

SELECT extract(epoch from now() – pg_last_xact_replay_timestamp()) AS slave_lag

The issue with this query is that while your slave(s) may be 100% caught up, the time interval being returned is always increasing until new write activity occurs on the master that the slave can replay. This can cause your monitoring to give false positives that your slave is falling behind if you have things set up to ensure your slaves are no more than a few minutes behind. A side affect of this monitoring query can also give you an indication that writes to your master have stopped for some reason.

One of our clients has a smaller sized database that doesn’t get quite as much write traffic as our typical clients do. But it still has failover slaves and still needs to be monitored just like our other larger clients to ensure it doesn’t fall too far behind. So, my coworker introduced me to the pg_stat_replication view that was added in PostgreSQL 9.1. Querying this from the master returns information about streaming replication slaves connected to it.

postgres=# select * from pg_stat_replication;

-[ RECORD 1 ]—-+——————————

pid | 16649

usesysid | 16388

usename | replication

application_name | walreceiver

client_addr | xxx.xxx.xxx.xxx

client_hostname | db1-prod-ca

client_port | 58085

backend_start | 2013-10-29 19:57:51.48142+00

state | streaming

sent_location | 147/11000000

write_location | 147/11000000

flush_location | 147/11000000

replay_location | 147/11000000

sync_priority | 0

sync_state | async

-[ RECORD 2 ]—-+——————————

pid | 7999

usesysid | 16388

usename | replication

application_name | walreceiver

client_addr | yyy.yyy.yyy.yyy

client_hostname | db2-prod

client_port | 54932

backend_start | 2013-10-29 15:32:47.256794+00

state | streaming

sent_location | 147/11000000

write_location | 147/11000000

flush_location | 147/11000000

replay_location | 147/11000000

sync_priority | 0

sync_state | async

He also provided a handy query to get back a simple, easy to understand numeric value to indicate slave lag. The issue I ran into using the query is that this view uses pg_stat_activity as one of its sources. If you’re not a superuser, you’re not going to get any statistics on sessions that aren’t your own (and hopefully you’re not using a superuser role as the role for your monitoring solution). So, instead I made a function with SECURITY DEFINER set, made a superuser role the owner, and gave my monitoring role EXECUTE privileges on the function.

CREATE OR REPLACE FUNCTION streaming_slave_check() RETURNS TABLE (client_hostname text, client_addr inet, byte_lag float)

LANGUAGE SQL SECURITY DEFINER

AS $$

SELECT

client_hostname,

client_addr,

sent_offset – (replay_offset – (sent_xlog – replay_xlog) * 255 * 16 ^ 6 ) AS byte_lag

FROM (

SELECT

client_hostname,

client_addr,

(‘x’ || lpad(split_part(sent_location::text, ‘/’, 1), 8, ‘0’))::bit(32)::bigint AS sent_xlog,

(‘x’ || lpad(split_part(replay_location::text, ‘/’, 1), 8, ‘0’))::bit(32)::bigint AS replay_xlog,

(‘x’ || lpad(split_part(sent_location::text, ‘/’, 2), 8, ‘0’))::bit(32)::bigint AS sent_offset,

(‘x’ || lpad(split_part(replay_location::text, ‘/’, 2), 8, ‘0’))::bit(32)::bigint AS replay_offset

FROM pg_stat_replication

) AS s;

$$;

Running this query gives back a few handy columns that should be good enough for most monitoring tools. You can easily add more columns from pg_stat_replication or any other tables you need to join against for more info.

postgres=# select * from streaming_slave_check();

client_hostname | client_addr | byte_lag

—————–+—————–+———-

db1-prod-ca | xxx.xxx.xxx.xxx | 160

db2-prod | yyy.yyy.yyy.yyy | 160

UPDATE: If you’re running PostgreSQL 9.2+, there is a new, built-in function that avoids needing the above function all together and can just query pg_stat_replication directly.

postgres=# SELECT client_hostname

, client_addr

, pg_xlog_location_diff(pg_stat_replication.sent_location, pg_stat_replication.replay_location) AS byte_lag

FROM pg_stat_replication;

client_hostname | client_addr | byte_lag

—————–+—————+———-

db1-prod-ca | xxx.xxx.xx.xx | 0

db2-prod | yy.yyy.yyy.yy | 0

Unfortunately, this function still requires superuser privileges to obtain all relevant data and most monitoring tools do not use a superuser role (I hope). So, in that case you do still need a SECURITY DEFINER function, but it can be a much much simpler one

CREATE OR REPLACE FUNCTION streaming_slave_check() RETURNS TABLE (client_hostname text, client_addr inet, byte_lag numeric)

LANGUAGE SQL SECURITY DEFINER

AS $$

SELECT client_hostname

, client_addr

, pg_xlog_location_diff(pg_stat_replication.sent_location, pg_stat_replication.replay_location) AS byte_lag

FROM pg_stat_replication;

$$;

This can also be useful to monitor slave lag when you don’t have hot standby turned on for your slaves to allow read-only queries.

Combining both of the replication monitoring solutions mentioned in this post should give you a much better overall picture of the status of your master/slave systems.

pg_terminate_backend() to kill a connection in postgreSql

November 25, 2016

By postgresdba

You can use pg_terminate_backend() to kill a connection. You have to be superuser to use this function. This works on all operating systems the same.

SELECT
pg_terminate_backend(pid)
FROM
pg_stat_activity
WHERE
— don’t kill my own connection!
pid pg_backend_pid()
— don’t kill the connections to other databases
AND datname = ‘database_name’
;

Before executing this query, you have to REVOKE the CONNECT privileges to avoid new connections:
REVOKE CONNECT ON DATABASE dbname FROM PUBLIC, username;

If you’re using Postgres 8.4-9.1 use procpid instead of pid
SELECT
pg_terminate_backend(procpid)
FROM
pg_stat_activity
WHERE
— don’t kill my own connection!
procpid pg_backend_pid()
— don’t kill the connections to other databases
AND datname = ‘database_name’
;

Cancel query as non-superuser in postgreSQL

November 25, 2016

By postgresdba

A non-superuser on a PostgreSQL 9.1 database, and they have just created a bad trigger function that won’t stop. They can see the process, e.g.:

But they can’t stop it:
thedb=> SELECT pg_terminate_backend(45678);
ERROR: must be superuser to signal other server processes

solution
You must either be the superuser or logged in as the same user who owns the session you wish to cancel.

So connect as user myuser and you’ll be able to pg_cancel_backend, or, if that doesn’t respond, pg_terminate_backend.

Terminate PostgreSQL sessions

November 25, 2016

By postgresdba

This blog post is based on a PostgreSQL 9.1 version.

problems:-
Sometimes you need to terminate connections initiated by badly behaving client application, or just make sure nobody is querying database during a major update.
The solution is to use pg_stat_activity view to identify and filter active database sessions and then use pg_terminate_backend function to terminate them.
To prevent access during an update process or any other important activity you can simply revoke connect permission for selected database users or alter pg_database system table.

Who is permitted terminate connections:-
Every database role with superuser rights is permitted to terminate database connections.

How to display database sessions
pg_stat_activity system view provides detailed information about server processes.
SELECT datname as database,
procpid as pid,
usename as username,
application_name as application,
client_addr as client_address,
current_query
FROM pg_stat_activity

current PostgreSQL queries View

November 25, 2016

By postgresdba

An important tool for debugging PostgreSQL performance issues is the ability to view all the currently executing queries. With newer versions of PostgreSQL we get some more details. Here’s an example I’ve used on PostgreSQL 8.2:
SELECT datname,usename,procpid,client_addr,waiting,query_start,current_query FROM pg_stat_activity ;
Configuration

Storing of query strings is usually disabled in PostgreSQL by default. To enable it, use this line in your postgresql.conf
stats_command_string = true

This setting can be changed on a running database without restarting or effecting open connections by telling the PostgreSQL parent process, postmaster, to reload its config. Send it a SIGHUP or use the safer pg_ctl command with the reload option. Example:
pg_ctl reload

Queries
When stats_command_string is enabled the pg_stat_activity table holds all currently active query strings. The simplest query will show all current query strings along with which database they refer to and the process ID (PID) of the process serving that query.

SELECT datname,procpid,current_query FROM pg_stat_activity

Each row of pg_stat_activity represents one PostgreSQL process (PostgreSQL uses one server process per connection).Any processes that are not currently performing any queries will show as the current_query.
Note that queries from all databases within the server will be shown. If the user querying pg_stat_activity does not have privileges to access a database then it will not show the current_query.
The query_start column can also be used to show when the query started executing.
Another of my favourite queries is to show a top-like view of current queries, grouped by how many of the same query are running at that instant and the usernames belonging to each connection.

SELECT count(*) as cnt, usename, current_query FROM pg_stat_activity GROUP BY usename,current_query ORDER BY cnt DESC;

Example:
database1=# SELECT count(*) as cnt, usename, current_query FROM pg_stat_activity GROUP BY usename,current_query ORDER BY cnt DESC;
cnt | usename | current_query
—–+—————+—————
7 | freddykrueger |
3 | freddykrueger | SELECT name FROM users WHERE id=50;
1 | postgres |
(3 rows)

Show blocking Postgres processes and kill them in PostgreSql

November 25, 2016

By postgresdba

Have you ever had to kill your Postgres cluster because one hanging client starved other processes until most clients became unresponsive blocking on this one pesky process?
There is a very nice way to show currently blocked queries and the processes those are blocking on slightly adapted from this query posted on the Postgres mailing list. I suggest putting it into a view so you can easily access it when you need it:

CREATE VIEW blocking_procs AS
SELECT
kl.pid as blocking_pid,
ka.usename as blocking_user,
ka.current_query as blocking_query,
bl.pid as blocked_pid,
a.usename as blocked_user,
a.current_query as blocked_query,
to_char(age(now(), a.query_start),’HH24h:MIm:SSs’) as age
FROM pg_catalog.pg_locks bl
JOIN pg_catalog.pg_stat_activity a
ON bl.pid = a.procpid
JOIN pg_catalog.pg_locks kl
ON bl.locktype = kl.locktype
and bl.database is not distinct from kl.database
and bl.relation is not distinct from kl.relation
and bl.page is not distinct from kl.page
and bl.tuple is not distinct from kl.tuple
and bl.virtualxid is not distinct from kl.virtualxid
and bl.transactionid is not distinct from kl.transactionid
and bl.classid is not distinct from kl.classid
and bl.objid is not distinct from kl.objid
and bl.objsubid is not distinct from kl.objsubid
and bl.pid kl.pid
JOIN pg_catalog.pg_stat_activity ka
ON kl.pid = ka.procpid
WHERE kl.granted and not bl.granted
ORDER BY a.query_start;

How to test the query on a testing server (not your production DB server)
Connect to your database open a transaction and manually lock a table:

BEGIN;
LOCK your_table;
Leave the transaction and connection open.
Open another client that accesses that data:
# SELECT count(*) from your_table;

It’s now possible to kill the offending process holding the lock using:
# SELECT pg_terminate_backend(25842);

This will kill the connection where you’ve set the lock and the open transaction is rolled back but it seems to leave everything else intact. The second client should now get the response from the server.

Check if your current database is connected to PgBouncer or not.

November 25, 2016

By postgresdba

Determining if your current database connection is using PgBouncer, or going directly to Postgres itself, can be challenging, as PgBouncer is a very low-level, transparent interface. It is possible, and here are some detection methods you can use.

This was inspired by someone asking on the Perl DBD IRC channel if it was possible to easily tell if your current database handle (usually “$dbh”) is connected to PgBouncer or not. Since I’ve seen this question asked in other venues, I decided to take a crack at it.

There are actually two questions to be answered: (1) are we connected to PgBouncer, and if so, (2) what pool_mode is being run? The quickest and easiest way I found to answer the first question is to try and connect to a non-existent database. Normally, this is a FATAL message, as seen here:

$ psql testdb -p 5432
testdb=# \c ghostdb
FATAL: database “ghostdb” does not exist
Previous connection kept

testdb=#
However, a slightly different ERROR message is returned if the same thing is attempted while connected to PgBouncer:
$ psql testdb -p 6432
testdb=# \c ghostdb
ERROR: No such database: ghostdb
Previous connection kept

Thus, an ERROR will always indicate that you are connected to PgBouncer and not directly to Postgres, which will always issue a FATAL.
In the future, there will be an even simpler method. As of this writing, pgBouncer 1.6 has not been released, but it will have the ability to customize the application_name. This is a configurable session-level variable that is fairly new in Postgres. Andrew Dunstan wrote a patch which enables adding this to your pgbouncer.ini file:
application_name_add_host = 1
This will make PgBouncer modify the application_name to append some information to it such as the remote host, the remote port, and the local port. This is a feature many PgBouncer users will appreciate, as it offers an escape from the black hole of connection information that PgBouncer suffers from. Here is what it looks like on both a normal Postgres connection, and a PgBouncer connection. As you can see, this is an easier check than the “invalid database connection” check above:

Postgres:
$ psql testdb -p 5432 -c ‘show application_name’
application_name
——————
psql
PgBouncer:
$ psql testdb -p 6432 -c ‘show application_name’
application_name
——————————–
psql – unix(7882@gtsm.com):6432

## DBD::Pg connections to PgBouncer get a very similar change:
$ perl testme.tmp.pl –port 6432
app – unix(6772@gtsm.com):6432

Now we have answered question of “are we connected to PgBouncer or not?”. The next question is which pool mode we are in. There are three pool modes you can set for PgBouncer, which controls when your particular connection is returned to “the pool”. For “session” mode, you keep the same Postgres backend the entire time you are connected. For “transaction”, you keep the same Postgres backend until the end of a transaction. For “statement”, you may get a new Postgres backend after each statement.
First, we can check if we are connected to PgBouncer in a statement level pool mode by taking advantage of the fact that multi-statement transactions are prohibited. PgBouncer enforces this by intercepting any attempts to enter a transaction (e.g. by issuing a BEGIN command). A very PgBouncer specific error about “Long transactions not allowed” is issued back to the client like so:

$ psql testdb -p 6432
testdb=# begin;
ERROR: Long transactions not allowed

So, that takes care of detecting a pool_mode set to ‘statement’. The other two modes, transaction and session, will *not* give the same error. Thus, seeing that error indicates you are using a statement-level PgBouncer connection.

The next pool mode is “transaction”, which means that the server connection if released back to the pool at the end of a transaction. To figure out if we are in this mode, we take advantage of the fact that PgBouncer can be set to clean up the connection at the end of each transaction by issuing a specific command. By default, the command set by server_reset_query is DISCARD ALL, which invalidates any prepared statements, temporary tables, and other transaction-spanning, session-level items. Thus, our test will see if these session-level artifacts get discarded or not:

Direct Postgres:
$ psql testdb -p 5432
testdb=# prepare abc(int) as select $1::text;
PREPARE
testdb=# execute abc(1);
text
——
1

PgBouncer:
$ psql testdb -p 6432
testdb=# prepare abc(int) as select $1::text;
PREPARE
testdb=# execute abc(1);
ERROR: prepared statement “abc” does not exist

Keep in mind that there are no true “transactionless” commands in Postgres. Even though we did not use a BEGIN in the psql prompt above, each command is treated as its own mini-transaction. In the case of the PgBouncer connection, the prepare is immediately followed with a DISCARD ALL, which means that our prepared statement no longer exists. Hence, we have determined that we are using a transaction-level PgBouncer connection.
Unfortunately, not getting an error does not necessarily mean your PgBouncer is NOT in transaction mode! It could be that server_reset_query is empty, meaning that temporary artifacts are not discarded at the end of the transaction. In such a case, we can take advantage of the fact that PgBouncer will allow other clients to share in our current connection, and thus be able to see the temporary items. If we create a temporary table in one pgbouncer connection, then connect again as a new client, the temporary table will only show up if we are sharing sessions but not transactions. Easier shown than explained, I suspect:

Regular Postgres gets a fresh session:
$ psql test1 -p 5432
test1=# create temp table abc(a int);
CREATE TABLE
test1=# select * from abc;
(No rows)
test1=# ^Z ## (we suspend with CTRL-Z)
[2]+ Stopped psql test1 -p 5432

$ psql test1 -p 5432
test1=# select * from abc;
ERROR: relation “abc” does not exist

PgBouncer will re-use the same session:
$ psql test1 -p 6432
test1=# create temp table abc(a int);
CREATE TABLE
test1=# select * from abc;
(No rows)
test1=# ^Z
[2]+ Stopped psql test1 -p 6432

$ psql test1 -p 6432
test1=# select * from abc;
(No rows)

The final PgBouncer pool mode is “session”, and basically means the only advantage over a normal Postgres connection is the overhead to start up and connect to a new Postgres backend. Thus, the PgBouncer connections are only returned to the pool upon disconnection. The only way to tell if you are in this mode is by determining that you are *not* in the other two modes. 🙂
So, although PgBouncer is extremely transparent, there are some tricks to determine if you are connected to it, and at what pool_mode. If you can think of other (SQL-level!) ways to check, please let me know in the comments section.

without restarting the server-drop all connections in postgreSql

November 25, 2016

By postgresdba

I want to drop all connections (sessions) that are currently opened to a specific PostgreSQL database but without restarting the server or disconnecting connections to other databases.

Method 1.
Depending on your version of postgresql you might run into a bug, that makes pg_stat_activity to omit active connections from dropped users. These connections are also not shown inside pgAdminIII.
If you are doing automatic testing (in which you also create users) this might be a probable scenario.

In this case you need to revert to queries like:
SELECT pg_terminate_backend(pg_stat_activity.procpid)
FROM pg_stat_get_activity(NULL::integer)
WHERE datid=(SELECT oid from pg_database where datname = ‘your_database’);

Method 2.
The query like this should help (assuming the database is named ‘db’):
select pg_terminate_backend(pid) from pg_stat_activity where datname=’db’;

pid used to be called procpid, so if you’re using a version of postgres older than 9.2 you could try the following:
select pg_terminate_backend(procpid) from pg_stat_activity where datname=’db’;

However you have to be a superuser to disconnect other users.
It might also be useful to REVOKE CONNECT ON DATABASE FROM PUBLIC or something similar, and then GRANT it afterward.

Method 3.
This can be used to “free” a database from client connections, so that you for example can rename it:
SELECT pg_terminate_backend(procpid) FROM pg_stat_activity WHERE datname=’current_db’;
ALTER DATABASE current_db RENAME TO old_db;
ALTER DATABASE new_db RENAME TO current_db;

Be aware that this might cause problematic behaviour to your client apps. Data actualy should not be currupted due to using transactions.

Method 4.
$ ps aux | grep DATABASE_NAME
51191
postgres: user DATABASE_NAME [local] idle
then kill the process

$ kill -9 51191

pg_cancel_backend vs pg_terminate_backend in postgresql

November 25, 2016

By postgresdba

1.To test this functionality we will take a look at the below 2 sessions (one IDLE and one running. )

gpadmin=# select datname,usename,procpid,sess_id,current_query from pg_stat_activity ;

datname | usename | procpid | sess_id | current_query
———+—————+———+———+——————————————————————————
gpadmin | running_user1 | 24174 | 26 | insert into test_table values (generate_series(1,100000000000));
gpadmin | idle_user1 | 24285 | 27 |
gpadmin | gpadmin | 23494 | 21 | select datname,usename,procpid,sess_id,current_query from pg_stat_activity ;

3 rows)

2.pg_cancel_backend()
pg_cancel_backend (procpid from pg_stat_activity ) should be used when query is running, the function will not impact when the session is IDLE.

gpadmin=# select pg_cancel_backend(24174) ;

pg_cancel_backend
——————-
t

(1 row)
note:- Canceling the query may take some time depending on the cleanup/rollback of the transactions.

3.pg_terminate_backend()
pg_terminate_backend (procpid from pg_stat_activity) should be used for terminating IDLE sessions, avoid using them on an active query or where the session is not ..
gpadmin=# select pg_terminate_backend(24285) ;

pg_terminate_backend
———————-
t
(1 row)

4.State of pg_stat_activity after running the above two commands:
gpadmin=# select datname,usename,procpid,sess_id,current_query from pg_stat_activity ;

datname | usename | procpid | sess_id | current_query
———+—————+———+———+——————————————————————————
gpadmin | running_user1 | 24174 | 26 |
gpadmin | gpadmin | 23494 | 21 | select datname,usename,procpid,sess_id,current_query from pg_stat_activity ;
(2 rows)

The IDLE session that you have witnessed again above is after we have cancelled the query through pg_cancel_backend, the query has been cancelled by the function but the user session still is connected.