Mon, 27 Mar 2006

Ubuntu's “Langpacks” system, a solution for the OLPC project

Jim Gettys wrote about a localisation problem that the OLPC project will face in the future.

The One Laptop Per Child project aims to provide the famous $100 laptop to children in the developing world. The project is based on Free Software and, as with most GNU/Linux distributions, the bundled software will be available in a number of languages. For now it will only be ten or so, but as OLPC grows the number will skyrocket... just think of how many languages are spoken in Africa alone.

The $100 laptops don't have a hard drive. Instead, they have 1 gigabyte of compact flash memory, which is enough to run the software but leaves little room for extra data.

The most common way of internationalising and localising Free Software is GNU gettext, which provides an easy-to-handle text file format for translations (PO files): a list of sentences and labels that translators fill in with their equivalents in their own language.

Applications ship these translations as .mo files, which are the same PO files compiled into a binary format that gettext-enabled programs can read. Each application installs one .mo file per language it is translated into. For big applications these files can add up to several megabytes per application, which is a problem for embedded systems or projects like OLPC.
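As an illustration of what "compiled binary format" means, the sketch below serialises a dictionary of msgid/msgstr pairs into the GNU .mo layout: a small header, two tables of (length, offset) pairs, then the NUL-terminated strings. It follows the same layout as the `msgfmt.py` tool shipped in CPython's `Tools/i18n`, but it is simplified: no hash table and no plural forms. Real packaging would use GNU `msgfmt`. The result is loaded back with the standard `gettext.GNUTranslations` class.

```python
import array
import gettext
import io
import struct


def build_mo(messages: dict[str, str]) -> bytes:
    """Serialise msgid -> msgstr pairs into a minimal GNU .mo file (no hash table)."""
    # Original strings must be sorted so that lookups can binary-search them.
    keys = sorted(messages)
    ids = b""
    strs = b""
    entries = []  # (id offset, id length, str offset, str length), relative to each block
    for key in keys:
        kb = key.encode("utf-8")
        vb = messages[key].encode("utf-8")
        entries.append((len(ids), len(kb), len(strs), len(vb)))
        ids += kb + b"\0"
        strs += vb + b"\0"

    n = len(keys)
    header_size = 7 * 4                      # seven 32-bit header fields
    orig_table = header_size                 # table of (length, offset) for msgids
    trans_table = orig_table + n * 8         # table of (length, offset) for msgstrs
    ids_start = trans_table + n * 8          # msgid strings follow the tables
    strs_start = ids_start + len(ids)        # msgstr strings follow the msgids

    key_table, value_table = [], []
    for id_off, id_len, str_off, str_len in entries:
        key_table += [id_len, ids_start + id_off]
        value_table += [str_len, strs_start + str_off]

    # magic, revision 0, string count, table offsets, hash table size 0, hash offset 0
    out = struct.pack("Iiiiiii", 0x950412DE, 0, n, orig_table, trans_table, 0, 0)
    out += array.array("i", key_table).tobytes()
    out += array.array("i", value_table).tobytes()
    return out + ids + strs


# The empty msgid carries the catalog header, which tells readers the charset.
catalog = {"": "Content-Type: text/plain; charset=UTF-8\n", "Open file": "Abrir archivo"}
t = gettext.GNUTranslations(io.BytesIO(build_mo(catalog)))
print(t.gettext("Open file"))  # Abrir archivo
```

Every language adds another such file per application, which is where the megabytes come from.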

Ubuntu has been trying a different approach to distributing translations. Instead of shipping the translation files inside the applications' .deb packages, they are stripped out of the packages and provided by language packs.
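The stripping step can be pictured as a simple split of a package's file list. The sketch below is hypothetical and is not Ubuntu's actual build tooling: it assumes translations live under the conventional `.../locale/<lang>/LC_MESSAGES/<domain>.mo` path, keeps every other file in the application package, and groups the .mo files by language for the language packs.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def strip_translations(files):
    """Split a package file list into files kept in the .deb and .mo files per language.

    Hypothetical helper: assumes the .../locale/<lang>/LC_MESSAGES/<domain>.mo layout.
    """
    kept = []
    packs = defaultdict(list)
    for name in files:
        path = PurePosixPath(name)
        parts = path.parts
        is_translation = (
            path.suffix == ".mo"
            and len(parts) >= 4
            and parts[-2] == "LC_MESSAGES"
            and parts[-4] == "locale"
        )
        if is_translation:
            packs[parts[-3]].append(name)  # parts[-3] is the language code, e.g. "es"
        else:
            kept.append(name)
    return kept, dict(packs)


kept, packs = strip_translations([
    "usr/bin/gedit",
    "usr/share/locale/es/LC_MESSAGES/gedit.mo",
    "usr/share/locale/pt_BR/LC_MESSAGES/gedit.mo",
])
print(kept)   # ['usr/bin/gedit']
print(packs)  # {'es': [...es/LC_MESSAGES/gedit.mo], 'pt_BR': [...pt_BR/LC_MESSAGES/gedit.mo]}
```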

Language packs provide the translations for all the applications and libraries in Ubuntu's main component. This includes GNOME, KDE and many other popular applications. When you install Ubuntu, you select the main language of the interface, and the installer will download the appropriate language pack, plus a selection of useful localisation-related packages such as dictionaries, translated manuals, etc.
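Choosing "the appropriate language pack" means mapping the selected locale to a pack that actually exists. A minimal sketch, assuming packs are keyed by plain language codes: it mirrors gettext's own fallback from a regional variant (`ll_CC`) to the base language (`ll`). The helper names are hypothetical, and real gettext also tries variants that keep the codeset and `@modifier`, which this sketch skips.

```python
def langpack_candidates(locale: str) -> list[str]:
    """Expand a locale such as 'pt_BR.UTF-8' into language codes, most specific first."""
    # Drop the codeset (".UTF-8") and any "@modifier"; this sketch ignores both.
    base = locale.split("@", 1)[0].split(".", 1)[0]
    candidates = [base]
    if "_" in base:
        candidates.append(base.split("_", 1)[0])  # regional variant -> base language
    return candidates


def pick_langpack(locale: str, available: set[str]):
    """Return the first candidate for which a pack exists, or None."""
    for lang in langpack_candidates(locale):
        if lang in available:
            return lang
    return None


print(pick_langpack("pt_BR.UTF-8", {"pt", "es"}))  # pt
```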

These language packs are generated periodically by Rosetta, a web-based translation portal sponsored by Canonical, the parent company of Ubuntu and Launchpad.net. Rosetta offers a very easy-to-use translation infrastructure, and Ubuntu users can start translating the applications they are running with just two clicks in the application's interface.

With Rosetta lowering the barrier for people who want to translate Free Software, Ubuntu can have (and already has) lots of people improving the translations not only of the next version of the software, which is what translation groups have traditionally worked on, but also of the version you are running right now. There is no need to wait for the next version of Ubuntu to see your translations complete: help your team translate whatever is missing, and wait for the next language pack update. Voilà!

If the OLPC project adopted the language pack scheme and Rosetta, they could install a bare OLPC laptop without translations, and add only the language packs needed in the target country or area. The langpacks are currently split into GNOME, KDE and “the rest”, but any derivative could split them into finer-grained components. Furthermore, the system helps improve the localisation after the laptops have been deployed: just plug a USB drive into the laptop and use your usual package manager to install the updated language packs it contains. Or use the Internet, if it's available.
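For a deployment, the per-region choice could be as simple as building a package list from the languages spoken there and the components installed on the laptop. The sketch assumes Ubuntu-style names (`language-pack-<lang>` for "the rest", `language-pack-<component>-<lang>` for the GNOME/KDE split); an OLPC derivative might use a different scheme or finer-grained components.

```python
def packs_for_region(languages, components=("gnome",)):
    """List language-pack package names for a target region.

    The naming scheme is an assumption modelled on Ubuntu's GNOME/KDE/"rest" split.
    """
    names = []
    for lang in languages:
        names.append(f"language-pack-{lang}")  # translations for "the rest"
        for component in components:
            names.append(f"language-pack-{component}-{lang}")
    return names


# A region where Spanish and Quechua are both needed, GNOME-only laptops:
print(packs_for_region(["es", "qu"]))
```

The resulting list could then be handed to the package manager (for instance `apt-get install`), whether the packages come from a mirror on a USB drive or over the network.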

In environments where network access is completely impossible, which rules out both fetching updated packs from a remote server and translating online on the Rosetta server, other solutions could step in. Generating langpacks from a set of local PO files should be pretty easy.
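One way to see why this should be easy: given local PO files, building a pack is mostly a matter of grouping them by language and compiling each into the right install path. A sketch, assuming a hypothetical source layout of `<domain>/<lang>.po` (for example `gedit/es.po`); it only plans the work and returns the GNU `msgfmt -o <output> <input>` command for each file.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def plan_langpacks(po_files, prefix="usr/share/locale"):
    """Group local PO files by language and plan where each compiled .mo belongs.

    Assumes the hypothetical layout <domain>/<lang>.po.
    Returns {lang: [msgfmt command, ...]}.
    """
    packs = defaultdict(list)
    for po in sorted(po_files):
        path = PurePosixPath(po)
        domain, lang = path.parent.name, path.stem
        mo = f"{prefix}/{lang}/LC_MESSAGES/{domain}.mo"
        packs[lang].append(["msgfmt", "-o", mo, po])
    return dict(packs)


plan = plan_langpacks(["gedit/es.po", "nautilus/es.po", "gedit/fr.po"])
for cmd in plan["es"]:
    print(" ".join(cmd))
```

Running each command (for example with `subprocess.run(cmd, check=True)` after creating the output directories) and packaging the resulting tree per language would give one pack per language, built entirely offline.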

This is a great idea except for one fatal flaw... Rosetta is a closed-source Python application maintained by Canonical. Do you honestly think Red Hat will put something as important as translations for their OLPC project into a closed-source product owned by a rival Linux company? I don't see that happening anytime soon.

Rosetta and Launchpad are both closed-source software; if they were Open Source, things might be different.

Posted by Jeff Schroeder at Tue Mar 28 15:33:21 2006

Now, the bad thing about Rosetta is that few people know whether translations ever get merged upstream.

The FAQ says: "By using Rosetta, you give permission to Canonical Ltd. to publish those translations under the same licence as the software they belong to."

Where does it publish them?

Does it simply put them into language packs?

Do Canonical people commit them to appropriate CVS/SVN?

Is there any chance Rosetta stops being so cryptic for us translators?

Posted by Alexandre Prokoudine at Tue Mar 28 16:00:32 2006

Exactly, Jeff Schroeder.

Translations in Rosetta are sometimes poorer in quality than mainline translations, not to mention the duplication of work. And what is worth sending upstream does not get there.

Posted by Sven at Tue Mar 28 16:59:35 2006

Rosetta is/was a good idea, but the current implementation is problematic. E.g. it seriously lacks QA tools, search capabilities, finer-grained ownership/access rights, etc.

Launchpad's monolithic approach is also wrong IMHO, but that's another discussion...  ;-)


BTW, Alexandre: you can export all the translations as .po files from Rosetta...

Posted by JanC at Tue Mar 28 17:59:45 2006

Rosetta's and Launchpad's problems aside, I really like the concept of separating the translations and creating language-pack bundles... I have learned that people really dig the ability to translate the applications they use. I guess one of the major reasons why Ubuntu Linux has become the distro du jour is that people can experience the community factor first hand!

Cheers,

Og

Posted by Og Maciel at Tue Mar 28 18:55:05 2006

Yes, language-pack bundles are great.

But about "experiencing the community": a lot of "community translators" in Rosetta are actually making the Dutch translation worse, and we can't see who's doing this, nor can we restrict people to e.g. untranslated applications, so we'll probably have to lock it all down... :-(

Posted by JanC at Tue Mar 28 22:56:07 2006

Rosetta needs some focus group love.
The basis is there, as team leaders/translators have high karma, so they can pick the proper translations.

Posted by Simos at Tue Mar 28 23:13:41 2006

Jan, I know about both importing and exporting, been there, done that :)

I'd like to see more collaboration between the Ubuntu translation teams and the GNOME/KDE translation teams. And there is none.

Posted by Alexandre Prokoudine at Wed Mar 29 11:54:29 2006

The comments about Rosetta and coordination with upstream are completely off topic for this post, which is about how Rosetta and/or language packs could be used to coordinate OLPC translation. It's a shame that commenters have lost sight of that and seen it as an opportunity to make criticisms which frankly belong elsewhere.

To answer the off-topic criticisms: this can in part be assuaged by controlling the Ubuntu translator group: ensure that everyone in the team follows GNOME (or TP) translation guidelines, and that they are regularly giving back upstream. If you leave your translation team open, of course bad translations are going to creep in!

Back on topic: Rosetta + langpacks (particularly the latter) would be an excellent solution to the problems Jim describes, as far as I can see. I don't see any disadvantage: Rosetta even handles offline translation nicely, since you just download the PO file, translate it offline, and upload it again when you are online. In that sense it can act as a repository while also providing an online translation interface.

Matt

Posted by Matthew East at Thu Mar 30 15:21:54 2006