Locali(s|z)ation in NodeBB and other FL/OSS projects

Years ago, when I first got started in the software development industry, I had very little experience with open source software. I was vaguely aware of the concept, and I was aware of older terms like "shareware" and "freeware", but the entire ecosystem of FL/OSS was alien to me, so much so that RHEL meant nothing, and "Red Hat"1 sounded like something that would go great with a red suit.

As I started becoming more adept (read: less useless) at developing, I slowly became exposed to more and more libraries containing code that I could adopt, frameworks that made it easier to work with programming languages, huge repositories of code that contained many thousands of man hours -- all for free!

I did want to contribute back. I wholeheartedly believed that crowdsourced2 content (e.g. Wikipedia) was superior due to the wisdom of the crowd, and I spent many hours contributing to and editing Wikipedia articles.

As a software developer, however, there was always this barrier to entry that I found difficult to overcome. My desire to contribute back to a piece of software was overshadowed by a looming case of Impostor syndrome, combined with the large codebases of already established projects.

  1. How could I even begin to comprehend the code base in a manner that would allow me to contribute something back?
  2. Conversely, if a project had something I could help with, there were often additional barriers: style guides, contribution notes, (im)proper IRC etiquette, etc...
  3. Then there's the source code -- where is it, anyway? Remember, GitHub was only founded in 2008! svn was all the rage, and I remember being afraid of the command line. I needed TortoiseSVN to feel comfortable with version control. Sometimes code was on SourceForge, other times it was distributed via .tar.gz archives, which I couldn't easily open in Windows.

For many, it starts with a single step

The world contains upwards of 7 billion people. For the vast majority, English is not their primary language. In the 2011 Canadian census, an unexpected 56.9% identified English as their mother tongue. With that many people speaking another language, it is no surprise that localisation3 of NodeBB became an oft-repeated feature request very early on.

The early adopters were fellow developers, and frankly, that was expected, as NodeBB was a fairly technical product at the start. However, as we merged more and more languages into NodeBB's core, I kept getting the feeling that the barriers to contributing a translation were still too high. To contribute, you would have to be familiar with:

  • git and GitHub (at the time, GitHub's in-browser editing was not available)
  • JSON syntax
  • Editors that saved files as text/plain4
  • Node.js and Redis, in order to run NodeBB
  • Linux system administration and the command line, in order to install NodeBB

The proportion of people who want to contribute is already minuscule; you can imagine how much tinier we sliced the pie with those (unspoken) requirements!

Transifex came to our rescue with a slick web app that allowed anybody to contribute translations. In a matter of weeks, Transifex turbocharged our localisation efforts, and we added twenty new languages between v0.3.1 (when Transifex was first integrated) and v0.4.2.

By removing (or minimising) the barrier to entry, we allowed many more contributors access to NodeBB, and that's something we can all get behind. Sometimes we just need to dip our toes in the water before diving in, and contributing (or editing) translations is a great way to get started.

We learned an awful lot about our users when we opened up translations. I honestly had no idea we had so many Chinese, German, and Portuguese speakers (among many other languages), and I am very proud of the time and effort these contributors have devoted to making sure our localisations are perfect.

Fast forward to today

We now stand strong with forty-one languages maintained in core. Thirty-eight, if I'm being honest and you exclude the source language (UK English, en_GB), US English (en_US), and the localisation we introduced in jest, Pirate English (en@pirate).

Can we, as project maintainers, do better?

Since the internationalisation effort was introduced, I handled the actual steps for synchronising translations manually. It wasn't particularly time consuming, nor was it mentally taxing. It just meant periodically running these simple commands:

$ tx push -s  # push source files to Transifex
$ tx pull # grab the latest translations and fallbacks from Transifex
$ git commit # etc...

The main problem was that it was manual. It made the feedback loop unnecessarily long, and allowed these preventable things to happen:

  • Whenever a new language string was added, it would need to be pushed to Transifex in order to allow our translators access. If we forgot to push, or didn't push often enough, people installing NodeBB would stumble on untranslated language keys instead of a Transifex-provided fallback (e.g. motd.welcome instead of Welcome to NodeBB, the discussion platform of the future.)
  • Translators would provide localisation strings via the Transifex project, but they might not be pulled in in a timely manner, so they would be left wondering when their work would be showcased.
  • If I were busy with commissioned work or a particularly large feature addition/bugfix, we'd go weeks without either a push or a pull.
  • Whenever a helpful developer would contribute new strings via pull request, we would have to direct them to Transifex. It wasn't a problem at the start, but it does get repetitive, and I've become aware that I am less polite than I ought to be when addressing these pull requests.
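
To make the first point concrete, here is a minimal sketch of the difference between a catalogue with fallbacks and one without. It is only an illustration: the function names and the dictionary layout are hypothetical, and this is not how Transifex or NodeBB actually implement it.

```python
def with_fallbacks(source, translation):
    """Return a catalogue covering every source key.

    Keys with a non-empty translation use it; the rest fall back to the
    source-language text. (Hypothetical helper, for illustration only.)
    """
    return {key: translation.get(key) or text for key, text in source.items()}


def render(key, catalogue):
    """Look up a key; if the catalogue has never heard of it, the raw key leaks out."""
    return catalogue.get(key, key)


# Source strings that have been pushed, and a partial (made-up) translation.
source = {
    "motd.welcome": "Welcome to NodeBB, the discussion platform of the future.",
    "global.home": "Home",
}
german = {"global.home": "Startseite"}

catalogue = with_fallbacks(source, german)
```

With the source pushed, render("motd.welcome", catalogue) returns the English sentence. Had the key never been pushed, it would be missing from the catalogue entirely and the visitor would see motd.welcome instead.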

As of today, we're proud to announce that we're better supporting our translators by automating our translation handling through Misty, our release manager bot:

  • Whenever a source language file5 is modified, whether by a maintainer committing directly or by a contributor whose commits are merged in via pull request, Misty will immediately push that resource to Transifex, allowing localisation to begin as soon as possible.
  • Every morning at 9am Eastern Time, Misty will pull the latest translations in from Transifex and commit them into the master branch of NodeBB.
  • If a contributor accidentally makes a pull request modifying non-source files in public/language, Misty will politely inform them to make their contribution via Transifex instead.
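
The rules above boil down to a small decision over the list of changed files. The following is a rough sketch of that decision, not Misty's actual code: the event names, function names, and returned action strings are all assumptions made for illustration. Only the two paths (public/language and public/language/en_GB) come from this post.

```python
from pathlib import PurePosixPath

LANGUAGE_ROOT = PurePosixPath("public/language")
SOURCE_DIR = LANGUAGE_ROOT / "en_GB"  # the source language files


def classify_changed_paths(paths):
    """Split changed paths into source-language files and translated files."""
    source, translated = [], []
    for raw in paths:
        parents = PurePosixPath(raw).parents
        if SOURCE_DIR in parents:
            source.append(raw)
        elif LANGUAGE_ROOT in parents:
            translated.append(raw)
    return source, translated


def decide_action(event, paths):
    """Pick an action for a hypothetical "commit" or "pull_request" event."""
    source, translated = classify_changed_paths(paths)
    if event == "pull_request" and translated:
        # Translations belong on Transifex, so point the contributor there.
        return "ask_to_use_transifex"
    if event == "commit" and source:
        # New or changed source strings should reach translators right away.
        return "push_source_to_transifex"
    return "nothing"
```

For example, decide_action("pull_request", ["public/language/de/global.json"]) returns "ask_to_use_transifex", while a commit touching only files outside public/language returns "nothing". The daily 9am pull is a scheduled job and sits outside this sketch.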

Three cheers for a quicker localisation turnaround time!

Final thoughts

While this is a step in the right direction, there are still improvements that could make localisation a better experience for our translators. We've reduced the time it takes for a language contribution to make its way into NodeBB, but there are still the hours until the next 9am pull during which a raw translation key may be encountered.

Secondly, whenever a source string is modified, even a little bit, the entire localisation is discarded by Transifex. I understand that the simplest (and most foolproof) assumption is to discard the translations, although at times it may undo the work of our translators, and at worst, it may even disincentivise them from contributing. A source modification from "Welcome Back" to "Welcome Back " (adding a space at the end) shouldn't cause all existing translations to become stale.

Perhaps I am wrong about the seriousness of that last issue, as I don't contribute to the translation effort myself6. If perhaps Transifex displayed the old translation and allowed you to mark it as "current" again, that would be enough, I'd think.
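
As a thought experiment, a check like the one below would tell cosmetic source edits apart from real ones. It is purely a sketch of the idea: Transifex does not, to my knowledge, expose anything like it, and both function names are made up.

```python
def is_cosmetic_change(old, new):
    """True if two source strings differ only in whitespace."""
    return " ".join(old.split()) == " ".join(new.split())


def stale_keys(old_source, new_source):
    """Return keys whose existing translations should be marked stale.

    Brand-new keys are skipped (they have no translation to invalidate),
    and so are keys whose source text changed only cosmetically.
    """
    return sorted(
        key
        for key, text in new_source.items()
        if key in old_source and not is_cosmetic_change(old_source[key], text)
    )
```

Under this rule, changing "Welcome Back" to "Welcome Back " leaves existing translations alone, while changing it to "Welcome back, friend" marks them stale. Whether whitespace is really the right definition of "cosmetic" is debatable; a trailing space can matter when strings are concatenated.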

Did you know we have a user group on our forum just for translators? You can mention them by writing @translators in a post, and tell them what a fantastic job they're doing!

1 Was Red Hat's logo always a Fedora? Come to think of it, why isn't Fedora's logo a fedora?

2 Though technically, "crowdsourcing" wasn't a term yet...

3 By the way, if you're ever confused about the difference between "localisation" and "internationalisation", check this article out. tl;dr NodeBB was internationalised, contributors localised NodeBB into non-English languages.

4 I will point out I have never received a localisation contribution with the .txt file extension, but one wonders...

5 These files are located in public/language/en_GB

6 I speak and write in English, and while I speak and understand colloquial Cantonese, I am unfortunately unable to read or write it at a near native level!

Cover photo credit Negative Space

Julian Lam


Toronto, Ontario