solovyov.net

Translating in-browser applications

5 min read · i18n, javascript

Earlier or later, if an application is not targeted on an English-speaking market exclusively, you’re going to work on localization. Such a moment happened for us a few weeks ago: when there are no more excuses left and something had to be done. :)

And I started seeking for a library, which I could use for translations. Main requirements were jQuery-less-ness, GNU gettext compatibility, compact implementation, support for pluralization, and overall ease of use.

I found some. Some had dependency on jQuery (you can’t even calculate 2 + 3 nowadays without it), some haven’t been updated for many years and I had a lot of doubts about their workability, some had other issues. Closest to a winning library is called Jed.

It’s not bad, but I didn’t like its size (more than a thousand lines), obscure relationship with jQuery and lack of the infrastructure.

Obviously, I couldn’t live with those outrageous problems and I had to write a new library. It’s called puttext and has API which looks completely unlike gettext (I don’t think anybody is worried about that).

Little digression. Among the features of gettext I didn’t implement domains (it’s easy to solve with separate files in my opinion) and contexts (I’m still thinking about them). If you don’t know what’s that, there is no need to worry, everything’s fine. :)

What’s that

It’s a small (version 1.0.1 is only 1713 bytes after UglifyJS), but very useful (natural modesty makes it impossible to rephrase) library for translating static text.

Main problems: which string to translate, how to change position of variables inside of this string, and what to do with plural forms.

Which string should be translated

That’s simple:

__('String to translate')

If you have a translation for a string, you will get it instead of this string. If you don’t have a translation, you will receive original string.

What about variables in strings?

Nothing complex again, just some simple syntax for variable substitution:

__('{name}, how are you?', {name: document.body.tagName})

If you want to output a curly bracket, escape it: __('\\{ - works!') - two backslashes because substitution function needs two symbols - a backslash and a curly bracket.

How to handle plurals

When you have a string with a number inside, you might want to change it based on an actual number given.

You have to have two strings in English (for singular and plural cases), a number to choose between strings and a context for string formatting. Context is not necessary, of course, if “one/many” string output is enough for you. :)

__('one item left', '{num} items left', n, {num: n})

Of course, you have to think a bit about pluralization in other language. For that gettext has a special header string Plural-Forms and GNU gettext documentation has all possible cases.

Infrastructure

Extracting translatable strings

Having client library is good, but you have to have translations prepared beforehand. Distribution contains xgettext.js for extracting strings from JavaScript files. By default it looks for all strings which are wrapped in __ function call (e.g. __('string')). There is a possibility to change behavior by supplying one or more wrapping function names:

$ ./xgettext.js -m __ -m env.__ path/to/source

xgettext.js exists because xgettext from GNU gettext distribution does not support JavaScript syntax, and the most recommended one for the task - Python syntax - often fails.

Creating source translation file

xgettext.js attempts to avoid replication of functionality of GNU gettext suite, and for full-fledged usage of gettext we have to prepare .po file with a header:

$ echo '' | xgettext -C --force - -o -

This will output a draft for a .po file, where you have to change charset (to UTF-8) and Plural-Forms to what your target language needs.

Moving forward

In addition there is a shell script i18n-collect, which automates string extraction and merging new and translated strings. It’s very simple and I think that the best way is to copy it to your code and change if you need some customization.

Can I haz JSON

In the end, you have to compile .po to JSON to use it in a client-side application. po2json.js is included, so it’s not a problem:

$ ./po2json.js [path/to/lang.po] > [path/to/lang.json]

Why do all this

Why bother with .po files and all this stuff, when it’s possible to extract strings from JS and write them in JSON, which then translators would edit? Because in that case you would have to write merge logic and you won’t have all that delicious gettext infrastructure, like PoEdit, Emacs’ po-mode, and other editors. Being compatible with de-facto standard is simply convenient. Even if you spend a bit of time once for configuration.

What’s next

Get the necessary files from repository or install puttext package from NPM, add puttext.js to your application, mark all strings, generate source <lang>.po, send it to translators (well, you could translate yourself, if you know any other languages - it’s useful to check if all strings are marked), and then compile translated file to .json, which you can load based on user settings.

The project has a README, which contains API description, but if that isn’t enough, ask questions here, or on Github, or mail me. Enjoy!

Just in case you grasped idea from the beginning of this post and skipped rest, link to repository: puttext.

If you like what you read — subscribe to my Twitter, I always post links to new posts there. Or, in case you're an old school person longing for an ancient technology, put a link to my RSS feed in your feed reader (it's actually Atom feed, but who cares).

Other recent posts

Server-Sent Events (SSE), but with POST
ngrok for the wicked, or expose your ports comfortably
PostgreSQL collation
History Snapshotting in TwinSpark
Code streaming: hundred ounces of nuances