Reflections on 4 years of upgrade scripts

The first commit to NodeBB was nearly four years ago, and in that time, many changes have been made to the core code itself, ranging from feature additions and bug fixes to the bundling of must-have plugins with every installation.

As with any codebase that matures, schema changes have been needed over time to keep stored data well organised and to reduce anti-patterns such as God tables and code smell. The other reason for schema changes is revisions to the original implementation: an earlier design decision may turn out to have a more efficient alternative, and adopting it can require migrating live data from one data type to another (e.g. a list to a sorted set).

Unlike in a relational database management system, we do not define the schema upfront (with table definitions and such). Our schemas are constructed on the fly, as data is saved and/or referenced. We were able to get away with not handling schema changes at the start, by simply flushing the database (and all of the content!) and starting fresh, but as soon as we ran our first production instance (that is, our community support forum!), we realised we couldn't avoid it any longer.

What we had before

Our first (and, until now, only) method of handling schema updates was to maintain a file, src/upgrades.js, containing the scripts needed to migrate data stored in old formats to new ones. Every time a schema migration ran, it would update the schemaDate value in the database to match the date of the latest upgrade script, and subsequent runs would skip any script dated on or before that stored schemaDate.
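To make the schemaDate mechanism concrete, here is a minimal TypeScript sketch of how a single-file runner of this kind could work. It is not NodeBB's actual code: the DatedScript type, the script names, and the in-memory Map standing in for the database are all hypothetical. Each script carries a fixed date, and the runner executes only those newer than the stored schemaDate.

```typescript
// Hypothetical sketch of a single-file, date-based upgrade runner.
// A Map stands in for the database; all names are illustrative.

interface DatedScript {
  name: string;
  date: number; // ms since epoch, built with Date.UTC so it is timezone-independent
  run: (db: Map<string, unknown>) => void;
}

const scripts: DatedScript[] = [
  {
    name: "Rename user settings key",
    date: Date.UTC(2015, 0, 30), // months are zero-based: 0 = January
    run: (db) => { db.set("settings:renamed", true); },
  },
  {
    name: "Add topic counters",
    date: Date.UTC(2015, 1, 8),
    run: (db) => { db.set("counters:added", true); },
  },
];

// Runs every script newer than the stored schemaDate, oldest first,
// advancing schemaDate after each script that completes.
function upgrade(db: Map<string, unknown>): string[] {
  const schemaDate = (db.get("schemaDate") as number | undefined) ?? 0;
  const ran: string[] = [];
  const pending = scripts
    .filter((s) => s.date > schemaDate)
    .sort((a, b) => a.date - b.date);
  for (const script of pending) {
    script.run(db);
    db.set("schemaDate", script.date); // only reached if run() did not throw
    ran.push(script.name);
  }
  return ran;
}
```

Calling upgrade on a fresh store runs both scripts; calling it again runs nothing, because schemaDate has moved past both dates. This sketch advances schemaDate after each successful script, so a failure part-way through leaves the stored date pointing at the last script that completed. It also shows why correctness hinges on that one stored date: delete or misdate a script and the comparison silently does the wrong thing.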

By and large, this worked well (barring some bugs here and there¹), but over time, certain pain points kept cropping up:

  1. As we amassed upgrade scripts, the src/upgrades.js file grew overly large. In general, we (and CodeClimate) prefer smaller files, and CodeClimate specifically would ding us points for a large file even if it was objectively easy to read, or never needed reading at all. In response, we periodically cleared out old upgrade scripts, which meant that users on very old NodeBB versions could no longer update all the way to the newest version in one go.

  2. For developers trying to write upgrade scripts, there was a bit of a barrier to entry: not only would you have to write the script, but you'd also have to know how to edit the lastSchemaDate and ensure that the proper callbacks were executed. It's no surprise that beyond the three core developers, only two others have attempted to write an upgrade script!

  3. Lastly, changing the upgrades.js file was error-prone. Related to the previous point, a script might not run because lastSchemaDate wasn't set correctly, and removing old scripts could cause errors because there was no clearly documented rule for determining when it was safe to do so².

What we have now

The new upgrade system handles schema updates in a simpler manner. Upgrade scripts are kept in a separate directory (src/upgrades/), and the main upgrade module keeps references to those scripts. On upgrade, a script runs only if the database holds no record of it having run before.
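The record-based approach can be sketched in the same hypothetical style. In this TypeScript sketch, which illustrates the idea under assumed names rather than reproducing NodeBB's implementation, available stands in for the scripts in src/upgrades/, and a schemaLog map (an invented key) records each completed script's name alongside the time it finished.

```typescript
// Hypothetical sketch of a record-based upgrade runner.
// A Map stands in for the database; all names are illustrative.

interface UpgradeScript {
  name: string;
  method: (db: Map<string, unknown>) => void;
}

// Stands in for the individual script files kept in src/upgrades/.
const available: UpgradeScript[] = [
  { name: "Migrate list to sorted set", method: (db) => { db.set("migrated", true); } },
  { name: "Add default settings", method: (db) => { db.set("defaults", true); } },
];

// Completed scripts are stored as name -> completion timestamp,
// so the database also records when each one ran.
function getCompleted(db: Map<string, unknown>): Map<string, number> {
  let completed = db.get("schemaLog") as Map<string, number> | undefined;
  if (!completed) {
    completed = new Map<string, number>();
    db.set("schemaLog", completed);
  }
  return completed;
}

// Runs each script that has no record yet, then records it.
function upgrade(db: Map<string, unknown>): string[] {
  const completed = getCompleted(db);
  const ran: string[] = [];
  for (const script of available) {
    if (completed.has(script.name)) continue; // already recorded: skip
    script.method(db);
    completed.set(script.name, Date.now()); // record name and completion time
    ran.push(script.name);
  }
  return ran;
}
```

Because the decision is made per script rather than by comparing a single date, removing or reordering older entries never causes a newer script to be skipped.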

We've also added the requirement that upgrade scripts be idempotent: running a script a second time must leave the data exactly as the first run left it. Because every script is safe to repeat, we can be sure that running scripts again (by accident or intentionally) won't irreparably damage a database.
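As an illustration of idempotency, here is a hypothetical migration of the kind mentioned earlier, moving data from a list to a sorted set, written against a toy in-memory store. The Store type, its layout, and migrateToSortedSet are assumptions for this sketch, not NodeBB's database API. The guard at the top makes a second run a no-op: once the list is gone, there is nothing left to migrate.

```typescript
// Toy in-memory store with a list type and a sorted-set type (hypothetical).
interface Store {
  lists: Map<string, string[]>;
  sortedSets: Map<string, Map<string, number>>; // member -> score
}

// Idempotent migration of one key from a list to a sorted set.
function migrateToSortedSet(store: Store, key: string): void {
  const list = store.lists.get(key);
  if (list === undefined) {
    return; // nothing to migrate: already done, or the key never existed
  }
  const zset = store.sortedSets.get(key) ?? new Map<string, number>();
  list.forEach((member, index) => {
    // Use the list position as the score so the original order is kept.
    // set() overwrites rather than duplicates, so repeating it is harmless.
    zset.set(member, index);
  });
  store.sortedSets.set(key, zset);
  store.lists.delete(key); // drop the old form only after the new one is written
}
```

Writing scores with an overwrite rather than an append is what keeps a partially completed earlier run from producing duplicates when the script is run again.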

We also record when each script was run, which gives us an additional data point to use if things go wrong.

One of the bigger benefits of the new system is that we no longer feel pressured to clear old upgrade scripts out over time. With the scripts themselves sequestered safely in a separate directory, we can maintain them indefinitely.

The new system also makes it possible to run a single script on its own, so the core developers can ask a user in trouble to run one specific script if necessary³.
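Running a single script then amounts to looking it up by name and executing it whether or not it already has a record, which is only reasonable because scripts are required to be idempotent. The registry, runSingle, and the log map below are hypothetical names for illustration; the sketch does not reflect NodeBB's actual command-line options.

```typescript
// Hypothetical registry of upgrade scripts, keyed by name.
type Db = Map<string, unknown>;

const registry = new Map<string, (db: Db) => void>([
  ["Rebuild post counts", (db) => { db.set("postcount", 42); }],
]);

// Runs exactly one named script, even if it has run before.
// Safe only because every script is required to be idempotent.
function runSingle(db: Db, name: string, log: Map<string, number>): void {
  const script = registry.get(name);
  if (!script) {
    throw new Error(`Unknown upgrade script: ${name}`);
  }
  script(db);
  log.set(name, Date.now()); // refresh the record of when it last ran
}
```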

What's left to do

Although the new upgrade system is objectively superior, there are still a couple of pain points that need to be addressed over time.

  1. Neither system can roll back an upgrade. An argument can be made that not all updates are meant to be reversible, so this may not be something we can easily address. As always, our advice is to back up your database before upgrading NodeBB.

  2. It is currently not possible to upgrade NodeBB from an extremely old version (say, v0.2.0) directly to the latest stable version, because schema updates cannot run on older dependencies⁴. Complicating things further, some upgrade scripts depend on earlier upgrade scripts having already run (inception, anyone?). We intend to build a system that allows a completely automated upgrade across versions, installing the appropriate packages at each step along the way. Ideally, someone could completely upgrade NodeBB by running ./nodebb upgrade and have it handle the code updates, schema updates, and all dependency changes between versions.
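One way such an automated cross-version upgrade could be planned is to step through every intermediate release in order, installing that release's dependencies before running its scripts, so scripts that rely on earlier ones always find them applied. The TypeScript sketch below only produces a list of steps; the Release type, compareVersions, and planUpgrade are hypothetical, and it assumes plain numeric versions without pre-release tags.

```typescript
// Hypothetical release metadata; not taken from NodeBB itself.
interface Release {
  version: string;   // e.g. "1.4.0"
  scripts: string[]; // upgrade scripts introduced in this release, in order
}

// Returns the i-th numeric component of a version, or 0 if it is missing.
const part = (v: number[], i: number): number => (i < v.length ? v[i] : 0);

// Compares dotted numeric versions such as "0.2.0" and "1.10.0".
function compareVersions(a: string, b: string): number {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    const diff = part(pa, i) - part(pb, i);
    if (diff !== 0) return diff;
  }
  return 0;
}

// Plans an upgrade one release at a time: each release's dependencies are
// installed before its scripts, and releases are visited oldest first.
function planUpgrade(releases: Release[], current: string, target: string): string[] {
  const steps: string[] = [];
  const ordered = [...releases].sort((a, b) => compareVersions(a.version, b.version));
  for (const release of ordered) {
    if (compareVersions(release.version, current) <= 0) continue; // already applied
    if (compareVersions(release.version, target) > 0) break;      // past the target
    steps.push(`install dependencies for ${release.version}`);
    for (const script of release.scripts) {
      steps.push(`run ${script}`);
    }
  }
  return steps;
}
```

For example, with releases 0.9.0, 1.0.0, and 1.1.0, planning from 0.8.0 to 1.0.0 yields the dependency install and scripts for 0.9.0, then those for 1.0.0, and stops before 1.1.0.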

Some day...!

1 At the beginning, we generated each script's schemaDate via new Date(year, month, day).getTime(), not realising that the Date constructor interprets those values in the server's local timezone. Nothing would go amiss until you moved to a server with a different timezone setting, and then all of a sudden, upgrade scripts would run when they ought to be skipped, or worse yet, not run at all. We now use Date.UTC(year, month, day), which already returns a UTC timestamp, so no .getTime() call is needed.

2 Unofficially, we kept upgrade scripts in src/upgrades.js until a new minor version was released (e.g. v1.2.1 to v1.3.0), and expected admins not to upgrade between branches without first bringing themselves fully up to date with the latest commits on their branch (e.g. v1.2.x), but this was never made explicit. After releasing v1.0.0, we kept a single v1.x.x branch instead of releasing v1.0.x, v1.1.x, etc., yet we kept removing upgrade scripts, meaning someone on an early version of v1.x.x would no longer be able to update to the newest commits on v1.x.x. Not so good when the error message tells you not to migrate between branches, but you didn't switch branches!

3 Prior to this, there was no easy way to re-run a script. We'd have to instruct the user to dive into the database and edit the schemaDate value to one before the script we wanted to run.

4 It's also in general a bad idea to upgrade that many versions at once, given that breaking changes abound, but hey, who am I to judge? 😄
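The timezone pitfall in footnote 1 is easy to reproduce. This short TypeScript snippet (the date is an arbitrary example) compares the Date constructor, which reads its arguments as local time, with Date.UTC, which reads them as UTC and returns a number directly.

```typescript
// The Date constructor reads year/month/day in the server's local timezone.
const localMidnight = new Date(2015, 0, 30).getTime(); // varies with the server's TZ

// Date.UTC reads the same values as UTC and returns a plain number.
const utcMidnight = Date.UTC(2015, 0, 30); // 1422576000000 on every server

// The gap is exactly the local UTC offset on that date (zero only on UTC servers).
const offsetMs = new Date(2015, 0, 30).getTimezoneOffset() * 60000;
console.log(localMidnight - utcMidnight === offsetMs); // true

// Date.UTC already returns a number, so there is nothing to call .getTime() on.
console.log(typeof utcMidnight); // number
```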

Julian Lam


Toronto, Ontario