PostgreSQL and locks, and code quantity vs. code impact.

In a previous post I described how attempting to alter a table can be blocked in a busy system while waiting to get an ACCESS EXCLUSIVE lock. While the alter table statement waits, other queries that need to read from that table are also blocked — they have to get in line behind the alter table statement

To prevent negative impact to a production system, always use a reasonable statement timeout whenever altering a table.

One reason an alter table statement can be blocked is if you have lengthy open transactions that read from that table.

Therefore, it is imperative that you keep database transactions small.

Try to perform non-database work before or after a transaction. Use a transaction just for the DB changes you need to perform, not any side effects.

Example:

One system was experiencing around 3,000 database transactions that exceeded 10 seconds every hour. Within those transactions, records were read from many tables. This made it almost impossible to modify those tables with new columns.

One of the culprits was a code path that opened a database transaction and then created a large zip file comprised of various PDF files. Some of those zip files took a very long time to create.

Solution:

There was no need to create the zip file inside of the DB transaction. By moving the zip file creation outside of the transaction in that single code path, we eliminated about 70% of our slow transactions. Not bad for just a few lines of code changes!

Perhaps this is a good illustration of the value of a strong engineer — it’s not the quantity of code that matters, but the impact of the code. A good engineer knows which code to write.