Mon, 27 Mar 2006

Ubuntu's “Langpacks” system, a solution for the OLPC project

Jim Gettys wrote about a problem regarding localisation that the OLPC project will face in the future.

The One Laptop Per Child project aims to provide the famous $100 laptop to children in the developing world. The laptops are Free Software based and, as with most GNU/Linux distributions, the bundled software will be available in a number of languages. For now it'll only be ten or so, but as OLPC grows the number will skyrocket... just think about the number of languages spoken in Africa alone.

The $100 laptops don't have a hard drive. Instead, they have 1 gigabyte of compact flash memory, which is enough to run the software but leaves little room for extra data.

The most common way of internationalising and localising Free Software is to use GNU gettext, which provides an easy-to-handle text file format for translations: a series of sentences and labels that the translators fill in with the equivalent text in their language.

The applications ship these translations in .mo files, which are the same PO files compiled into a binary format that gettext-enabled programs can read. Each application installs one .mo file per language it is translated into. For big applications, these files can add up to several megabytes per application, which is a problem for embedded systems or projects like OLPC.
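To get a feel for how much space this takes on a given system, the per-language cost can be measured directly. The following is a minimal sketch, assuming the usual gettext layout of `<lang>/LC_MESSAGES/<domain>.mo` under a single locale directory; the function name is made up for illustration.

```python
import os
from collections import defaultdict


def mo_bytes_per_language(locale_root):
    """Return {language: total size in bytes of its .mo files}."""
    totals = defaultdict(int)
    for dirpath, _dirs, files in os.walk(locale_root):
        parts = os.path.relpath(dirpath, locale_root).split(os.sep)
        # Only count files stored as <lang>/LC_MESSAGES/<domain>.mo
        if len(parts) != 2 or parts[1] != "LC_MESSAGES":
            continue
        for name in files:
            if name.endswith(".mo"):
                totals[parts[0]] += os.path.getsize(os.path.join(dirpath, name))
    return dict(totals)
```

On many distributions the directory to inspect would be `/usr/share/locale`, although that path is an assumption and may differ on your system.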

Ubuntu has been trying a different approach to the distribution of translations. Instead of being packaged inside the applications' .deb packages, the translation files are stripped out of those packages and provided by separate language packs.

Language packs offer the translations for all the applications and libraries of the main component of Ubuntu. This includes GNOME, KDE and many other popular applications. When you install Ubuntu, you select the main language of the interface, and the installer program will download the appropriate language pack, plus a selection of useful localisation-related packages like dictionaries, translated manuals, etc.

These language packs are generated periodically by Rosetta, a web-based translation portal sponsored by Canonical, Ubuntu's parent company. Rosetta offers a very easy-to-use translation infrastructure, and Ubuntu users can start translating the applications they are running with just two clicks on the application's interface.

With Rosetta lowering the barrier for people wanting to translate Free Software, Ubuntu now has lots of people improving the translations of not only the next version of the software, which is what translation groups have traditionally worked on, but also the version you are running right now. There is no need to wait for the next version of Ubuntu to see your translations completed. Help your team translate whatever is missing, and wait for the next language pack update. Voilà!

If the OLPC project adopted the language pack scheme and Rosetta, they could set up a bare OLPC laptop without translations, and install only the language packs that are needed in the target country or area. The langpacks are currently split into GNOME, KDE and “the rest”, but any derivative could split them into finer-grained components of its choosing. Furthermore, the scheme helps improve the localisation of the system after the laptops have been deployed. Just plug a USB drive into the laptop, and use your usual package manager to install the updated language packs contained in it. Or just use the Internet if it's available.
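A system without any language pack installed still works, because gettext-enabled programs can fall back to their untranslated strings when no catalogue is found. As a rough illustration, here is how a program using Python's standard gettext module might behave; the helper name and its arguments are placeholders, not anything OLPC or Ubuntu ships.

```python
import gettext


def load_translator(domain, localedir, lang):
    """Return a translation function for `lang`.

    With fallback=True, a missing catalogue is not an error: the
    returned function simply hands back the original string.
    """
    catalog = gettext.translation(
        domain, localedir=localedir, languages=[lang], fallback=True
    )
    return catalog.gettext
```

Once a language pack drops the matching .mo file into place, the same call starts returning translated text the next time the program loads its catalogue.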

In environments where network access is simply not available, neither updated packs from a remote server nor online translation on the Rosetta server are an option, so other solutions would have to be put in place. Generating langpacks from a set of local PO files should be pretty easy.
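As a rough sketch of what such local generation could look like, the code below compiles a directory of PO files into the `<lang>/LC_MESSAGES/<domain>.mo` tree that gettext expects. It is not how Rosetta or Ubuntu build their packs. It assumes one `<domain>.po` per application in a single directory (a made-up layout), and the parser is deliberately simplified: it ignores plural forms, msgctxt and fuzzy flags, which the real msgfmt tool handles.

```python
import ast
import os
import struct


def parse_po(path):
    """Parse plain msgid/msgstr pairs from a PO file (simplified)."""
    messages, msgid, msgstr, target = {}, None, None, None
    with open(path, encoding="utf-8") as f:
        for raw in f:
            line = raw.strip()
            if line.startswith("msgid "):
                if msgid is not None and msgstr:
                    messages[msgid] = msgstr  # store the previous entry
                msgid, msgstr, target = ast.literal_eval(line[6:]), "", "id"
            elif line.startswith("msgstr "):
                msgstr, target = ast.literal_eval(line[7:]), "str"
            elif line.startswith('"'):  # continuation of a quoted string
                if target == "id":
                    msgid += ast.literal_eval(line)
                elif target == "str":
                    msgstr += ast.literal_eval(line)
            else:
                target = None  # comment, blank line or unsupported keyword
    if msgid is not None and msgstr:
        messages[msgid] = msgstr
    return messages


def write_mo(messages, path):
    """Write a little-endian GNU .mo file without a hash table."""
    items = sorted((k.encode("utf-8"), v.encode("utf-8"))
                   for k, v in messages.items())
    n = len(items)
    orig_off = 28                 # the header is seven 32-bit integers
    trans_off = orig_off + 8 * n  # each table entry is (length, offset)
    data_off = trans_off + 8 * n
    table, blob = [], b""
    for text in [k for k, _ in items] + [v for _, v in items]:
        table.append((len(text), data_off + len(blob)))
        blob += text + b"\0"
    with open(path, "wb") as f:
        f.write(struct.pack("<7I", 0x950412DE, 0, n, orig_off, trans_off, 0, 0))
        for length, offset in table:
            f.write(struct.pack("<2I", length, offset))
        f.write(blob)


def build_langpack(po_dir, out_root, lang):
    """Compile every <domain>.po in po_dir for one language."""
    dest = os.path.join(out_root, lang, "LC_MESSAGES")
    os.makedirs(dest, exist_ok=True)
    for name in sorted(os.listdir(po_dir)):
        if name.endswith(".po"):
            write_mo(parse_po(os.path.join(po_dir, name)),
                     os.path.join(dest, name[:-3] + ".mo"))
```

The resulting tree could then be wrapped in whatever package format the laptops use. Entries with an empty msgstr are left out, so untranslated strings fall back to the original text. The PO header entry should declare its charset (for example UTF-8) so that readers decode the catalogue correctly.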