Zero Second Downtime

I hate waiting.

I especially hate waiting for programs to compile when I want to make a small change, which is one of the reasons why I like web programming so much. While others would be waiting for their programs to compile, I could maintain my groove and concentration by immediately refreshing the browser to check my work before continuing.

Node being pretty damn fast means I don't have to wait all that long either, although there's no getting around a lot of the pre-processing that needs to be done before a NodeBB instance is "ready".

During a typical start-up, the following is done:

  1. Routes added, other basic app startup
  2. Third-party client-side scripts are minified
  3. Third-party LESS and CSS assets are minified
  4. Node.js server begins listening
  5. Socket.IO server is started and begins listening/re-establishing connections

Much of that is done sequentially (due to the single-threaded nature of Node), which means the more third-party assets you have installed, the slower your NodeBB will start up.*

I'm making it my quest to reduce this waiting time as much as possible, so now:

  • When the app starts, we fork the minifier into its own process so that it can chug along separately instead of blocking the main thread.
  • If a plugin only makes minor changes and does not require a full restart, capability exists to reload the plugins.

For the former, this doesn't end up doing much on single-core systems, but speedups can be seen in multi-core setups.

As for the latter, plugin reloading ended up not working well, as I discovered soon afterward that the application middlewares were quite difficult to manipulate after the app had already started.
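To make the forked minifier from the first bullet concrete, here is a minimal sketch using Node's `child_process.fork`. It is not NodeBB's actual code: `naiveMinify`, `runWorker`, `minifyInChild`, and the `--minify-worker` flag are all hypothetical names, and the "minifier" only collapses whitespace so the example stays self-contained.

```javascript
'use strict';
const { fork } = require('child_process');

// Hypothetical stand-in for a real minifier: collapses runs of whitespace.
function naiveMinify(source) {
  return source.replace(/\s+/g, ' ').trim();
}

// Child side: wait for one job from the parent, reply, then exit cleanly.
function runWorker() {
  process.once('message', (job) => {
    const minified = job.sources.map(naiveMinify).join('\n');
    process.send({ minified }, () => process.exit(0));
  });
}

// Parent side: fork a separate process so the minification work
// does not block the main event loop while the app starts up.
function minifyInChild(sources, workerPath = __filename) {
  return new Promise((resolve, reject) => {
    const child = fork(workerPath, ['--minify-worker']);
    child.once('message', (result) => resolve(result.minified));
    child.once('error', reject);
    child.once('exit', (code) => {
      if (code !== 0) reject(new Error('minifier exited with code ' + code));
    });
    child.send({ sources });
  });
}

// When this file is started by fork() above, behave as the worker.
if (process.argv.includes('--minify-worker')) {
  runWorker();
}
```

The main process would call something like `minifyInChild([scriptA, scriptB]).then(saveBundle)` and carry on registering routes in the meantime. As noted above, the benefit only shows up when there is a spare core for the child to run on.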

But we can do more...

1. Only restart when absolutely necessary

First off, we could eliminate instances where a restart is not even necessary. A complete restart starts from scratch, killing all connections and necessitating a complete rebuild of all front-end assets. For forums with large userbases, even one minute of downtime is too much. Admins of other boards may be used to server restarts costing them several hours of downtime, but I set out to reduce ours as much as possible.

2. Start another NodeBB in parallel, and switch over when it's ready.

As mentioned in the title of this blog post, this nets us zero second downtime. Neat-o. Follow along with the issue here.
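One way to picture the hand-off is with Node's built-in `cluster` module, where workers share the listening port. This is a sketch under that assumption and not necessarily how NodeBB does it; `replaceWorker` and `spawnWorker` are hypothetical names.

```javascript
'use strict';
const cluster = require('cluster');

// Start a replacement worker and only retire the old one once the
// replacement reports that it is listening. If the replacement dies
// first, the old worker is left alone and keeps serving.
function replaceWorker(oldWorker, spawnWorker = () => cluster.fork()) {
  return new Promise((resolve, reject) => {
    const next = spawnWorker();
    const onEarlyExit = () =>
      reject(new Error('replacement exited before it was ready'));

    next.once('exit', onEarlyExit);
    next.once('listening', () => {
      next.removeListener('exit', onEarlyExit);
      // disconnect() stops the old worker from accepting new connections
      // and lets it close its servers before it goes away.
      oldWorker.disconnect();
      resolve(next);
    });
  });
}
```

In the primary process this would be called as `replaceWorker(currentWorker)`, keeping the returned worker as the new `currentWorker`. The `spawnWorker` parameter exists only so the hand-off logic can be exercised without forking real processes.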

3. Eliminate restarts completely.

This is the white whale**. There are several problems that need to be resolved before this can even be a reality:

  1. The client-side assets (js/css) need to be able to be minified on-demand instead of on-start (as mentioned in #1 above)
  2. The routes need to be refreshed, so stale routes are removed and new routes can be added.
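For the first problem, one possible shape is a lazy cache that builds an asset the first time it is requested and rebuilds it only after being invalidated. This is a sketch, not NodeBB's implementation: `createLazyAsset` is a hypothetical helper, and the build step is synchronous here for brevity where a real minifier would likely be asynchronous.

```javascript
'use strict';

// Build an asset on first request, then serve the cached copy until
// something (e.g. a plugin being toggled) invalidates it.
function createLazyAsset(build) {
  let cached = null;
  return {
    get() {
      if (cached === null) {
        cached = build();
      }
      return cached;
    },
    invalidate() {
      cached = null;
    },
  };
}
```

With this shape, startup no longer pays for minification. The first visitor after a change does instead, which is a trade-off worth keeping in mind.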

I'm glad to say that I did end up figuring out the second problem, and NodeBB now comes with "Reload" functionality in addition to the existing "Restart".

The results?

  1. Reloads are now transparent and do not kick existing connections off. Admins used to be afraid to swap plugins because a server restart incurred downtime. Not anymore!
  2. Occasionally, incompatible plugins break NodeBB. The new reload system will catch errors as they occur, display them to the admin, and continue serving the old assets, meaning NodeBBs will no longer be left dead in the water when a plugin breaks.
  3. End users won't notice a reload, as connections are maintained. Restart downtimes are reduced to near-zero, as a parallel instance is started and seamlessly switched over when ready.
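The "keep serving the old version if the new one breaks" behaviour from the second point can be sketched without any framework. The idea is to build the new route table off to the side and swap it in only if building succeeds. `createReloadableRouter` and its plain `Map` of paths to handlers are hypothetical and far simpler than real middleware.

```javascript
'use strict';

// Holds the current route table behind a stable object, so the table
// can be replaced wholesale without touching whoever calls handle().
function createReloadableRouter(buildRoutes) {
  let routes = buildRoutes(); // Map of path -> handler function

  return {
    handle(path) {
      const handler = routes.get(path);
      return handler ? handler() : null;
    },
    // Build a fresh table; stale routes vanish because the old table
    // is dropped entirely. On failure, the old table stays in place.
    reload(buildNextRoutes) {
      try {
        routes = buildNextRoutes();
        return { ok: true };
      } catch (err) {
        return { ok: false, error: err };
      }
    },
  };
}
```

A broken plugin that throws while registering its routes would surface as `{ ok: false, error }`, which could be shown to the admin while visitors keep hitting the previous routes.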

These changes are due to be merged into the master branch for inclusion in NodeBB v0.5.1.


* For the sake of argument, we'll assume that "buy a better CPU" is out of the question.
** See Moby Dick

Julian Lam

Toronto, Ontario