Some web applications have large tables containing records that only need to be kept for a window of time. For example, maybe you keep a record of webhooks sent to external parties, but you only care about keeping those records for 30 days.
To prevent those tables from growing indefinitely in size, you might setup a periodic task to clean out old records.
However, mass amounts of deletes from PostgreSQL tables actually causes a lot of work for the database, both for the deletion, as well as the subsequent vacuuming that occurs. Additionally, deleting rows does not actually return space to the operating system — under most circumstances, PostgreSQL will hold on to that space for future rows.
One strategy for keeping a rolling window of data without putting a strain on PostgreSQL is partitioned tables.
Partitioned tables are essentially multiple separate tables with the same schema that PostgreSQL knows how to query separately and combine the results.
In the rolling window example, you could define a separate partition for each of many “ranges” of ids, for example 1-100,000,000, 100,000,000 to 200,000,000, 200,000,000 to 300,000,000, etc. Then, which all of the records in the oldest partition are ready to be deleted, you just drop the partition. This immediately returns space to the OS, and it eliminates the overhead of individual row deletions and vacuuming.
One cool thing about partitioned tables is you can seamlessly integrate this approach into an existing application even if you already have a table with a ton of data. You can use the “old” table as the first partition, and add new partitions for future records.
Tips for using an existing table as the first partition:
- Make sure the table already has the indices that you intend to define on the partition table.
- Make sure to add a check constraint to the existing table proving that it only contains rows in the given range (e.g. 1 to 100,000,000.)
- When you add the check constraint, add it as NOT VALID, and then later alter the table with VALIDATE CONSTRAINT.
These three strategies result in minimal locking, which allows the seamless behavior.