(This post assumes a PostgreSQL installation located at /var/lib/pgsql on a Red Hat-type Linux system such as Red Hat Enterprise Linux or CentOS. If your system differs from this, you may need to modify some of the paths accordingly.)
In PostgreSQL, sorts larger than a certain size will get performed on disk instead of in memory, and this makes them much slower as a result. Ideally all sorts should be done in memory (except for the ones that are genuinely too big to fit into your available RAM, because swapping to virtual memory should be avoided at all costs).
Seeing which sorts are happening on disk
To see which sorts are being performed on disk, the following parameters need to be enabled in /var/lib/pgsql/data/postgresql.conf:
log_destination = 'syslog'
log_temp_files = 0
PostgreSQL syslog logging goes to the local0
facility by default, so in /etc/syslog.conf make sure that you have local0.none
added to the line for /var/log/messages, and add the following line:
local0.* /var/log/pgsql
Then restart syslog and reload PostgreSQL:
service syslog restart
service postgresql reload
You should now start seeing details of your disk-based sorts in /var/log/pgsql. Each one will start with a line containing the phrase “LOG: temporary file
” and will show other details below this line including the SQL of the query which contained the sort. After a while you will see patterns emerging.
Making sorts happen in memory instead of on disk
If you have disk-based sorts occurring frequently, you can find the size of the largest of the frequent queries and then set this as the standard threshold for sorts. The parameter for this in /var/lib/pgsql/data/postgresql.conf is work_mem
. For example, if you had frequent disk-spaced sorts using up to 64MB, you would set the following in /var/lib/pgsql/data/postgresql.conf:
work_mem = 64MB
Be very careful, however, because this setting gets used by each query in each thread, so if you set this too high you could rapidly run out of RAM and cause the system to start swapping, which would be disastrous. The best thing to do is increase this bit by bit until as many as possible of your frequent queries are sorting in RAM but without running out of available memory. To see how much RAM is actually available on your system, use the command free
. As Linux tends to use up the majority of available RAM for its disk cache, the number you actually care about is the free
column in the buffers/cache
row – this is how much RAM is really available. For example, here we can see that there are just over 11 GB of free RAM available:
# free
total used free shared buffers cached
Mem: 12300988 12013536 287452 0 277788 10801424
-/+ buffers/cache: 934324 11366664
Swap: 4192956 240 4192716
Once you’ve dealt with sorts for frequent queries, you might find there are still the occasional disk-based sorts happening for a small number of particularly intensive queries. For those, you can increase the sort memory threshold on a per-query basis and then put it back to the standard setting once the query is finished. To do this, prepend your query with:
SET work_mem = '500MB';
changing 500MB
to whatever size is appropriate for you, then append the query with:
RESET work_mem;
to return it to the standard threshold. Again, be careful that you don’t use up all the system’s available RAM because then the dreaded swapping will occur.
Once you’ve done all this, as many sorts as possible will be happening in memory instead of on disk, and your PostgreSQL installation should be performing considerably better as a result of this tuning.
Getting information from the logfile into a more useful format
When PostgreSQL logs temporary files (disk-based sorts), the log entries can be rather messily distributed in the logfile, especially if you’re logging other things, such as slow queries, at the same time. I knocked up the a Bash script to quickly extract the details of the sorts into a more useful format. Grab it from GitHub or copy and paste it below:
#!/bin/bash
PROG=$(basename $0)
PPROG=$(echo $PROG | awk -F '.' '{print $1}')
TMPFILE1=/tmp/$PPROG.tmp1
TMPFILE2=/tmp/$PPROG.tmp2
LOGFILE=$1
if [ -z "$1" ] ; then
echo "Usage: $PROG LOGFILE"
exit
fi
cat /dev/null > $TMPFILE2
grep "temporary file" $LOGFILE > $TMPFILE1
cat $TMPFILE1 | while read LINE ; do
NO1=$(echo $LINE | awk -F '[][-]' '{print $2}' )
NO2=$(echo $LINE | awk -F '[][-]' '{print $4}' )
cat $LOGFILE | awk -F '[][-]' "($2 ~ /$NO1/) && ($4 ~ /$NO2/) {print $0}" >> $TMPFILE2
echo "" >> $TMPFILE2
done
cat $TMPFILE2
rm -f $TMPFILE1 $TMPFILE2