Zero downtime Postgres migration, done right

A step-by-step guide to migrating your Postgres databases in production environments with zero downtime

UPDATE: We consider this to be the best solution for PostgreSQL <= 9.x. Better alternatives that utilize logical replication exist for PostgreSQL 10+.

Summary

In this blog post we describe how to migrate a Postgres database to a new instance with zero downtime using Bucardo. We will describe how to avoid common pitfalls such as data loss, degraded performance, and data integrity failures. We have successfully used this process to migrate our Postgres databases from version 9.5 to 12.5 on RDS, but the process isn't restricted to RDS and doesn't depend on anything AWS-specific. This migration strategy should work with any self-hosted or managed Postgres.

Analysis

In this article we will talk about migrating multiple web applications, like microservices, from one database to another. Modern architectures consist of multiple apps (or microservices, if you prefer), each with more than one running instance to facilitate scaling. In order to move your apps to a new database, you must first make sure that the data in both databases is in sync, and remains so at any given point in time; otherwise your clients will lose data, or worse, end up in an invalid state.

A naive solution is to stop writing to the old database, take a snapshot, restore it to a new database, and resume your operations there. This scenario involves an unacceptable amount of downtime and is unsuitable for a production environment. We only mention it for educational purposes, since it is the easiest way to ensure that you won't lose any data, though you may lose a few customers.

A more realistic approach is to set up near real-time bidirectional replication between the two databases so that, ideally, apps can read from and write to both without noticing any difference. This allows you to gradually move your apps one by one, one instance at a time, without downtime and without affecting your users. Since we want our applications to be able to write to both databases, we need multi-master replication. A Google search for “multi-master replication in Postgres” will present you with plenty of solutions, each with its own advantages and disadvantages to consider. We decided to proceed with Bucardo because it is open-source, fast, and provides easy monitoring and a conflict resolution mechanism.

How Bucardo works

Bucardo acts as a middleman between two Postgres instances. You can run Bucardo on any machine you like, as long as it can reach the source and the target databases. Once you install it and set up multi-master replication, Bucardo will add some extra triggers to all the tables you have selected to replicate. The instance that runs Bucardo uses a separate, local PostgreSQL database to save the synchronization state; this way you can pause and restart synchronization at will. When a change occurs, a trigger adds all affected primary keys to a “delta” table in the Bucardo instance's Postgres, and another trigger “kicks” the sync. Each time the sync is kicked, Bucardo compares the affected rows of each table across all masters and picks a winner, then syncs the changes to the rest of the databases. Picking a winner is not straightforward, and a conflict can occur at this point. (We analyze conflicts in a different section.)
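If you are curious about what this looks like, you can list the triggers Bucardo attaches through the Postgres system catalogs. A minimal sketch with placeholder connection details; the exact trigger names (bucardo_delta, bucardo_kick_<syncname> and friends) vary across Bucardo versions:

    # List the triggers Bucardo attached to the replicated tables.
    # Run against either master with your own connection details.
    psql -h old-db.example.com -U postgres -d mydb -c "
    SELECT tgrelid::regclass AS table_name, tgname AS trigger_name
    FROM pg_trigger
    WHERE tgname LIKE 'bucardo%'
    ORDER BY 1, 2;"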

[Figure: Bucardo initial schema]

Beware of the drift

Several guides online suggest that the proper way to use Bucardo is to take a snapshot of the source database, restore it to a new one, and then kick off a multi-master Bucardo sync. Don't do that! If you do, you are in for data loss proportional to your database size and write traffic. Taking the snapshot and restoring it requires a significant amount of time, and during that window the source database starts to drift as writes occur. This drift must be synchronized as well in order to ensure that both masters contain the same data. The catch is that people assume Bucardo will do some kind of back-filling, but it turns out to be unreliable at this task and will probably fail to sync big drifts. You can always use a tool that compares data across databases to ensure the drift has been eliminated, but for huge datasets that wastes a lot of time, which we can't afford if zero downtime is the goal. Additionally, with enough replication lag, an in-flight sync can be detected as false-positive drift.

[Figure: Drift showcase]

How to synchronize drift

You can start the Bucardo sync and tell it to cache all drift in its local database by using the autokick=0 flag. Unfortunately, this critical option is not documented! To our knowledge, the only explicit reference is in this excellent blog post by David E. Wheeler:

Note autokick=0. This ensures that while deltas are logged, they will not be copied anywhere until we tell Bucardo to do so.

This flag lets you cache the deltas locally on the Bucardo instance until you have had enough time to prepare your new database, which is critical, especially for big drifts.
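For illustration, here is roughly what this looks like with the bucardo CLI; the sync, relgroup, and database names below are placeholders:

    # Create the sync with autokick=0 so deltas are only recorded,
    # not replayed, while we bootstrap the target database.
    bucardo add sync mig_sync relgroup=mig_rels dbs=old_db:source,new_db:source autokick=0
    bucardo start

    # Later, once schema and data are in place on the target,
    # let the sync fire automatically again:
    bucardo update sync mig_sync autokick=1
    bucardo reload config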

[Figure: Bucardo showcase]

How to bootstrap the new database

You have two options here. You can either take an all-inclusive snapshot of your first database and restore it to a new instance, or start with a new empty database and transfer users, schemas, and then data, in that order, separately. We recommend the latter approach: after benchmarking both solutions, the second one proved cleaner, and it gave us the chance to dismiss old user accounts and temporary tables and to fine-tune user permissions from scratch.

If you are using AWS RDS, the proposed solution is faster too. Taking snapshots can cost you several minutes, depending on your database size. Additionally, if you are migrating from an unencrypted server to one that uses encryption-at-rest, as we did, you need to take a snapshot, encrypt it, and then restore it to a new RDS instance. That is far more time-consuming, and minimal migration time was a key goal for us.

Selective synchronization

Before you start the Bucardo sync, you need to configure it properly: specify the two databases, their roles (master/replica), and which parts of the database should be included in the sync. You can bulk-add all tables from a schema, which is very helpful for databases with lots of tables.

Bucardo cannot synchronize tables without a primary key (PK), which is expected, since it couldn't distinguish unique entries otherwise. We had to exclude some tables that acted as a cache for various table migrations and didn't have a PK. We also excluded some unused tables so we wouldn't transfer dead data to our new database. Bucardo makes this easy: after you add all tables, you can remove the ones you want to exclude.
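To find the tables you will need to exclude, you can ask the catalogs which tables lack a primary key, then bulk-add everything and remove the exceptions. A sketch with placeholder names (on older Bucardo versions, relgroup is spelled herd):

    # Find tables in the public schema without a primary key
    psql -h old-db.example.com -U postgres -d mydb -c "
    SELECT c.relname
    FROM pg_class c
    JOIN pg_namespace n ON n.oid = c.relnamespace
    WHERE c.relkind = 'r' AND n.nspname = 'public'
      AND NOT EXISTS (SELECT 1 FROM pg_constraint
                      WHERE conrelid = c.oid AND contype = 'p');"

    # Bulk-add everything, then drop the exceptions
    bucardo add all tables db=old_db relgroup=mig_rels
    bucardo remove table public.legacy_cache_table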

Migrating users

Bucardo won't migrate Postgres users, so you need to transfer your user accounts manually. We wrote a script for this purpose that you can find here. The script connects to the new database, creates the new users with their passwords retrieved from a configuration server, and then sets their permissions. Even if you don't keep your datastores as code, it's good practice to keep their users as code so you can restore them in case of disaster.
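As a rough sketch of the shape of such a script (the role name, grants, and APP_PASSWORD variable are placeholders; the real script pulls passwords from our configuration server):

    # Create the application role on the NEW database and grant the
    # permissions it needs; the password comes from your secret store,
    # never hard-coded.
    psql -h new-db.example.com -U postgres -d mydb -c "
    CREATE ROLE app_user LOGIN PASSWORD '${APP_PASSWORD}';
    GRANT USAGE ON SCHEMA public TO app_user;
    GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO app_user;
    GRANT USAGE, SELECT ON ALL SEQUENCES IN SCHEMA public TO app_user;"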

Migrating schemas and data

You can use the native pg_dump/pg_restore tools to transfer your schemas and data. This is an easy step, with one catch: at this point we already have Bucardo up and running to record drift, so restoring data on the target server would be interpreted as changes to sync back to the source database. This is why we need to restore the data onto the target Postgres database in a replica session, by setting session_replication_role=replica. We will need to revert it before kicking off the continuous synchronization.
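A condensed sketch of this step with placeholder hosts; PGOPTIONS is the standard libpq mechanism for setting session_replication_role for the whole restore session:

    # 1. Copy the schema only
    pg_dump -h old-db.example.com -U postgres --schema-only mydb \
      | psql -h new-db.example.com -U postgres -d mydb

    # 2. Dump the data in the compressed custom format...
    pg_dump -h old-db.example.com -U postgres -Fc --data-only mydb -f mydb.dump

    # 3. ...and restore it under a replica session, so the restore itself
    #    does not fire the Bucardo triggers and get synced back.
    PGOPTIONS='-c session_replication_role=replica' \
      pg_restore -h new-db.example.com -U postgres -d mydb mydb.dump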

Conflicts

High availability, a prerequisite for a zero-downtime migration, typically mandates that each application has more than one running instance. Each instance should be drained before being restarted, so there is no way to switch all instances to the new database at the exact same point in time, if there is such a thing anyway. There will always be a critical time window, shorter or longer, in which the same application writes to both databases, and it is during that window that conflicts can occur.

Conflicts are rare, since they require writes to the same record to land in both databases before Bucardo has replicated either of them across. With near-zero replication time, chances are you won't have any conflicts at all, but this migration happens in a critical production environment, so they can't be ignored.

Imagine two clients trying to book the same house for the same date. If they both try to do so at the exact same time and each one is pointed to a different database, a conflict may occur. Bucardo comes with a conflict resolution mechanism that gives you two basic options: either let Bucardo handle conflicts automatically (the default option), or abort the sync and resolve them manually. This is the most critical part of the migration process, so let’s analyze it further.

If your table has an auto-incremented ID as a primary key, Postgres automatically picks the next ID from the corresponding sequence, and Bucardo syncs sequences as well. Assume that in the above example you have a bookings table with an auto-incremented ID as its PK, and that the latest record's ID is 42. Concurrent inserts happen and two different records are created in the two databases, both with 43 as a PK but with different data. If you let Bucardo handle the conflict, it will keep only the latest one and delete the other. You will end up with a missing booking that appeared successful to your client. Your database will be in a valid state, but you will have lost data that cannot be recovered. This is a dead end!
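To make the failure mode concrete, here is the scenario in SQL form against a hypothetical bookings table; each insert runs on a different master before Bucardo has replicated either row:

    # On the old database: the sequence is at 42, so this row gets id 43
    psql -h old-db.example.com -U app_user -d mydb -c \
      "INSERT INTO bookings (house_id, guest) VALUES (7, 'alice') RETURNING id;"  # -> 43

    # On the new database: its copy of the sequence is also at 42 -> 43 again
    psql -h new-db.example.com -U app_user -d mydb -c \
      "INSERT INTO bookings (house_id, guest) VALUES (7, 'bob') RETURNING id;"    # -> 43

    # Bucardo now sees two different rows with the same PK, and the default
    # strategy keeps the most recent one and deletes the other.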


Before jumping into the solution, let's consider another scenario. Assume that your table uses UUIDs as a PK. Replaying the above scenario with the concurrent bookings would create two distinct records with two distinct PKs in the two databases. This time no conflict occurs: Bucardo will successfully sync both records into both databases, but your data is still invalid from the business perspective, since you can't book the same house twice. So database validity does not guarantee valid data from the business perspective. You need to be careful how you handle conflicts so your customers won't experience problems down the road.

Bucardo supports custom resolution strategies. You could program your own strategy based on your business needs, but this can quickly become complex and time-consuming. Another approach is to create your own tools to detect and resolve data violations during the migration. This is no easy task either: it must be designed around the complexity of your data, and it will likely require a lot of development effort.

Our solution was to make sure that conflicts couldn't happen at all, by meeting two conditions before beginning the migration. First, we minimized the transition time between the databases in order to minimize the conflict probability. We achieved this by scripting the reconfiguration of the apps to point to the new database, one instance at a time, but for all the different apps in parallel. Second, and most important, just before we started switching apps to the new database, we revoked write permissions from the app users in the old database. This made conflicts impossible, at the cost of a short window in which a fraction of database writes would fail. This of course requires your applications to gracefully handle failed database writes, which they should do independently of any database migration, since this is critical for production environments.
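A minimal sketch of the revoke step, assuming a single application role named app_user; the revoke_write_access_from_old_db.sql script in the repository is the real version:

    # Run against the OLD database just before switching the apps over.
    # From this point on, writes there fail fast, so conflicts are impossible.
    psql -h old-db.example.com -U postgres -d mydb -c "
    REVOKE INSERT, UPDATE, DELETE, TRUNCATE
      ON ALL TABLES IN SCHEMA public FROM app_user;"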

This is what the final migration plan looked like:

[Figure: Migration plan]

Implementation

In this section we present the steps we followed and the script each one corresponds to. We have uploaded our code to this GitHub repository, which we break down into pieces below.

Preparation

  1. Spin up a new instance (EC2 in our case). The instructions assume you are running Debian.
  2. Run install.sh to install Bucardo
  3. Edit vars.sh to set your database connection details and Postgres role passwords
  4. Export the above vars in your shell: $ source vars.sh
  5. (Optional) If you have previously used Bucardo on the source database, you may need to clean up the old triggers by running uninstall_bucardo.sh. Before you run it, take a look at the uninstall.template we generated based on our database; you need to list all your tables there.
  6. Run $ bucardo install manually to complete the local Bucardo installation.

Migration

  1. Take a good look at the configure.sh script. You will need to edit it to match your migration scenario: define descriptive names for the Bucardo objects and specify your excluded tables (or omit that option). Once you understand what the script does, run it. The script does the following:
  • Sets up a .pgpass file and a Bucardo alias command so that interactive password prompts don't interrupt the process
  • Configures the Bucardo databases, herds, database groups, and the sync. If you want to get more familiar with the Bucardo object types, there is a list on the documentation page.
  • Initializes an empty database on the new PostgreSQL host and creates the users by running this script. You need to edit the script to specify your roles. Passwords are retrieved from the vars.sh file we sourced earlier.
  • Transfers the database schema only, using pg_dump, to the new host
  • Starts the Bucardo sync with local caching in place
  • Transfers the database data in a compressed format. While the data is being transferred and drift piles up, Bucardo saves the drift locally and replays it once the autokick flag changes value
  • Resets the autokick flag so local caching stops, then reloads the configuration so the sync respects the new value
  • Starts the multi-master sync (see the monitoring sketch after this list)
  2. Now that the continuous sync is in place, it is time to start moving the apps to the new database. For us, this meant changing the application parameters in the configuration server and redeploying them one by one. During this step we needed to make the users' permissions read-only in the old database: once the first instance of our apps connected to the new database, we changed the permissions in the old one by running the revoke_write_access_from_old_db.sql script. Timing is critical at this point.
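While the sync is running, a couple of commands on the Bucardo host help you keep an eye on it (the sync name is a placeholder):

    # Overall state of all syncs: last run, rows copied, current status
    bucardo status

    # Details for a single sync, plus a manual kick if you don't
    # want to wait for the next automatic one
    bucardo status mig_sync
    bucardo kick mig_sync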

Post-migration checks

  • While the sync is running, you should verify data replication. We did this using our fork of the pgdatadiff tool, which we extended to support excluding tables from the diff.
  • After you have switched all apps to the new database, you can stop the Bucardo sync and decommission its machine. Run uninstall_bucardo.sh once more to clean the Bucardo triggers out of your new database.
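Before running the full diff, a cheap sanity check is to compare per-table row counts on both hosts. A sketch with placeholder hosts; n_live_tup is an estimate, so use count(*) where exactness matters:

    # Compare approximate per-table row counts across the two hosts
    for HOST in old-db.example.com new-db.example.com; do
      echo "== $HOST =="
      psql -h "$HOST" -U postgres -d mydb -Atc \
        "SELECT relname || ':' || n_live_tup
         FROM pg_stat_user_tables ORDER BY relname;"
    done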

Conclusion

Migrating your PostgreSQL database to a new instance comes with great challenges. No matter which tool you choose, the challenges you will face remain the same:

  • Transferring data
  • Setting up multi-master replication between the two databases
  • Handling conflicts and ensuring data consistency from the business perspective
  • Validating sync process
  • Eliminating downtime to avoid disrupting your clients

In this article we described how we solved them. One of the major difficulties we faced was the absence of a similar tutorial online, so we had to improvise and iterate on our solution many times until we got it right. Your feedback is welcome; it will help improve the process and make it valuable for others who might face the same problems.


PS: The background story

The beginning of 2020 found us with two Postgres 9.5 instances that we had been using since the very early days of Blueground. In January 2020 we had to retire the old instances and move to new ones, because Amazon was about to migrate to new SSL/TLS certificates. That migration ended in recoverable data loss for us, which cost a few hard days to fix. What went wrong was that we trusted Bucardo to sync our drift automatically, which, as mentioned, was unreliable and failed. This year we had to do it again because of the Postgres 9.5 EOL date; otherwise our databases would have been forcibly upgraded by AWS. This time we were determined to pay attention to every little detail. We believed we could do it fast, reliably, and fail-safe, and we did.

Why upgrade to a new instance

First of all, we should explain why we didn't let Amazon upgrade our databases in place, without us intervening at all. Amazon offers an upgrade process, but it has some serious disadvantages compared to migrating to a new database instance:

  • AWS RDS does not offer an instant rollback option. With two instances running during the migration, rollback is a simple reconfiguration of our apps to point back to the old database. This was a huge fail-safe throughout the whole process.
  • Transparency. If RDS failed to upgrade the database, or the upgrade hit delays or performance issues, there would be nothing we could do. In production environments you need a solid rollback plan in case anything goes south.
  • Some features that we wanted were not available in the current instances, like encryption-at-rest and RDS insights.
  • In some cases we needed to change the instance type.

We went with Bucardo because we wanted a solution sandboxed within our VPC, so production data would never be exposed to the Internet. The result was a successful migration without the risk of losing data. The whole migration took less than 2 hours, which we counted as a success!