
Making Ponies Fly

Separate from the tremendous amount of feedback from the community, I quickly want to outline the pydotorg setup for later reuse.

We have the most recent versions of Sphinx (with Native Language Support), Pootle and Translate Toolkit running from checkout on our servers and a current checkout of the Sphinx project source in question. There is a directory for .pot files (message templates) built by Sphinx and one for .mo files (compiled catalogs) built from Pootle’s .po files (message catalogs). One last directory is used for translated HTML builds.
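The setup above boils down to a handful of directories wired together. The sketch below uses hypothetical names (`templates`, `locale`, `html`) and a made-up domain `python-docs`, not the actual pydotorg paths; the `<localedir>/<lang>/LC_MESSAGES/<domain>.mo` convention, however, is the standard gettext layout.

```python
from pathlib import Path
import tempfile

# Hypothetical layout (names are assumptions, not the real pydotorg paths).
root = Path(tempfile.mkdtemp())
(root / "templates").mkdir()          # .pot message templates built by Sphinx
(root / "html").mkdir()               # translated HTML builds
# Compiled catalogs follow the standard gettext layout that the stdlib
# `gettext` module expects: <localedir>/<lang>/LC_MESSAGES/<domain>.mo
mo_dir = root / "locale" / "de" / "LC_MESSAGES"
mo_dir.mkdir(parents=True)
mo = mo_dir / "python-docs.mo"        # "python-docs" is a made-up domain name
mo.touch()                            # a real setup runs msgfmt on Pootle's .po files
print(mo.relative_to(root).as_posix())
```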


The Way We Roll

Without further ado I would like to announce the beta launch of the Python translation services, available at

I am reprinting the full announcement made to the Python Documentation Special Interest Group here for posterity:

Dear Python Documentation community,

we are proud to announce the *BETA* launch of the translation services for the official Python documentation as part of Google’s Summer of Code. It is available from

and is open for registration now. We have added a few languages that we felt would generate enough feedback but are always happy to add more language teams.

Please note that the software toolchain used to create this service is still experimental and may well change substantially. We are trying our best to maintain stable services but cannot guarantee that every bit you submit will ultimately be usable in our final translation targets.

If you are experiencing any trouble, want to bring up any suggestions, or have other feedback, do not hesitate to contact me or file a ticket at

Robert Lehmann

Into the Wheel Shop

To rehash quickly, the Sphinx Native Language Support project actually spans two very different aspects:

  • a Sphinx extension to extract/incorporate translatable strings
  • an interface to maintain translations

It turns out the latter half is already partially solved by Pootle and I can build on that instead of rolling my own half-baked ad-hoc web interface.

Since I’d Gone This Far, I Might As Well Turn Around

Extracting messages from Sphinx is fairly easy. Apart from the occasional obstacle here and there when dealing with non-plain text such as directives, the machinery already in place makes it a straightforward task. But collecting messages from documents is only half the battle in implementing Native Language Support for Sphinx: they also need to go back in again. Sphinx already exposes mechanisms to configure language settings, commonly called a locale. Hidden deep down in its innards there is a procedure to load gettext-style message catalogs (sphinx/locale/), which I only needed to augment for domains other than Sphinx itself.
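Catalog loading of this kind can be sketched with the stdlib `gettext` module, which speaks the same file layout; the `"sphinx"` domain name matches Sphinx's own catalogs, but the localedir below is just an empty temporary directory for illustration. With `fallback=True`, a missing catalog degrades to an identity translation instead of raising an error:

```python
import gettext
import tempfile

# Sketch of gettext-style catalog loading; localedir is an empty temp
# directory here, standing in for a real locale tree.
localedir = tempfile.mkdtemp()
translator = gettext.translation(
    "sphinx",              # message domain: one catalog file per domain
    localedir=localedir,   # searched as <localedir>/<lang>/LC_MESSAGES/sphinx.mo
    languages=["de"],
    fallback=True,         # no catalog found -> NullTranslations, not an error
)
# With no compiled catalog on disk, lookups fall back to the original string.
print(translator.gettext("Table of Contents"))
```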

Drumming Up

During LinuxTag 2010 in Berlin I discussed internationalization issues with a bunch of people from major Linux distributions. I hereby express my gratitude to all of you and will summarize my impressions. Any errors are most likely mine, from mixing up the facts, and I would be pleased to be corrected!

Fine or Coarse?

I touched on message granularity in my proposal already and nailed down a pragmatic policy in my prototype: messages are basically split at the paragraph level. Inline markup is explicitly atomic and is never split out into a message of its own.
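The per-paragraph policy can be sketched as follows, assuming paragraphs separated by blank lines; this is a deliberate simplification of what a real doctree walk does, but it shows that inline markup stays embedded in its paragraph:

```python
import re

def extract_messages(source: str) -> list[str]:
    """Split reStructuredText-ish source into per-paragraph messages.

    A rough sketch of paragraph-level granularity: paragraphs are
    separated by blank lines, and inline markup such as *emphasis*
    stays inside its paragraph instead of becoming a message of its own.
    """
    paragraphs = re.split(r"\n\s*\n", source.strip())
    return [" ".join(p.split()) for p in paragraphs]

doc = """\
Sphinx is a *documentation* generator.
It renders reStructuredText.

It also supports internationalization.
"""
print(extract_messages(doc))
```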


It’s alive, it’s moving, IT’S ALIVE!

I have pushed an early prototype of a PO builder, boldly called MessageCatalogBuilder, to Sphinx. I previously announced this would be a Sphinx extension but changed my mind and incorporated it into Sphinx's core, because patching translation sets into doctrees is likely to be a tightly integrated task. It extends the build mechanism with a new gettext target and extracts raw messages into a collection of .pot files.
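What such a builder ultimately emits can be sketched as plain msgid/msgstr template entries. This simplified helper is not the actual MessageCatalogBuilder code; it omits the header entry and source-location comments that real .pot files carry:

```python
def format_pot(messages):
    """Serialize extracted messages into minimal .pot template entries.

    Each entry pairs a msgid (the source text) with an empty msgstr
    that translators fill in later.
    """
    def escape(s):
        # Escape the characters that would break the quoted PO syntax.
        return s.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

    entries = ['msgid "%s"\nmsgstr ""\n' % escape(msg) for msg in messages]
    return "\n".join(entries)

print(format_pot(["Documentation contents", "Indices and tables"]))
```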

“Raw messages?” you say. That is basically synonymous with “woo, a lot of output,” which is entirely useless. These messages contain no markup at all; they are handy for estimating a message catalog’s contents and size, but they have no practical application for documents (except if you want to lose all inline markup, that is).
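Why losing markup matters can be illustrated by flattening inline constructs by hand; the regexes below are a crude stand-in for real doctree-to-text conversion, and once roles, literals, and emphasis are flattened like this, the markup cannot be reconstructed from the catalog alone:

```python
import re

def strip_inline_markup(text: str) -> str:
    """Crudely flatten common reST inline markup to plain text."""
    text = re.sub(r":\w+:`([^`]*)`", r"\1", text)  # roles like :func:`open`
    text = re.sub(r"``([^`]*)``", r"\1", text)     # inline literals
    text = re.sub(r"\*([^*]*)\*", r"\1", text)     # emphasis
    return text

print(strip_inline_markup("Call :func:`open` with ``mode='r'`` for *reading*."))
```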

Casting South

I have set sails for the Community Bonding Period and am veering away from the Sphinx codebase to more research-related realms.

I abandoned the XLIFF format, as there really is no point in duplicate representation, and will focus my efforts on PO files. I am going to dive into the gettext family of translation toolsuites to get an impression not only of the PO File Format Specification but of the whole normative landscape of solutions.
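At its heart the PO format is a key-value store of msgid/msgstr pairs. This toy parser (not Translate Toolkit code) handles only the simplest single-line entries; real parsers also deal with comments, plural forms, fuzzy flags, and multi-line strings:

```python
def parse_po(text):
    """Parse single-line msgid/msgstr pairs from a minimal PO file."""
    catalog, msgid = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("msgid "):
            msgid = line[len('msgid "'):-1]          # strip quotes
        elif line.startswith("msgstr ") and msgid is not None:
            catalog[msgid] = line[len('msgstr "'):-1]
            msgid = None
    return catalog

sample = '''
msgid "Documentation"
msgstr "Dokumentation"

msgid "Index"
msgstr "Stichwortverzeichnis"
'''
print(parse_po(sample))
```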

Meet Your Mentors

This Saturday morning I had a meeting with my three(!) mentors, namely Jannis Leidel, Martin von Löwis and Georg Brandl on IRC (with Daniel Neuhäuser and Armin Ronacher chiming in occasionally).

I have been asked in advance why I chose gettext .PO files to store translations, and we briefly discussed that issue. gettext was written for translating short messages, not necessarily whole paragraphs of free text. Its key tool for updating translation sets, msgmerge, uses fuzzy matching with the intention of producing better results, but this tends to fail for prose. There is no notion of versioning, so VCSes (or patch queues) need to be integrated into the workflow to retain history; inline markup is still a whole different can of worms (which I need to meditate on). The two selling points for .PO files are their suitability as a key-value store and the fact that established tools exist after all. The other shortcomings have to be overcome by the new tool, which will monitor version control and display stale documentation segments (and export message catalogs along the way).
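The fuzzy-matching pitfall can be illustrated with `difflib` as a rough stand-in for msgmerge's matcher (msgmerge itself uses its own algorithm): when a paragraph changes, the closest old msgid is reused as a "fuzzy" starting point, and for short, similar paragraphs that pairing can easily be the wrong one.

```python
import difflib

# Two old messages that differ by a single word; after an edit, fuzzy
# matching must pick one of them as the basis for the changed paragraph.
old_msgids = [
    "Returns the length of the list.",
    "Returns the length of the string.",
]
changed = "Return the length of the tuple."
match = difflib.get_close_matches(changed, old_msgids, n=1, cutoff=0.6)
print(match)  # one close match is found, but it may not be the right one
```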

Warming up…

So for starters I wrote a sidebar extension inspired by Python Sidebar (credit to Edgewall).

I initially planned to file a simple patch but it grew into a full-fledged extension so the commit history is pretty meaningless now. Behind the scenes it still took about five iterations to get this Done Right.


Congratulations! Your proposal “Sphinx Native Language Support: Toolchain for Creating, Tracking, and Viewing Internationalized Versions of Sphinx Documents” as submitted to “Python Software Foundation” has been accepted for Google Summer of Code 2010.

Proposal submitted

After a lot of discussion (thanks to Georg Brandl, Martin von Löwis and Frederik Braun for providing valuable feedback to my proposal) I have sent in my application for Google’s Summer of Code 2010: Sphinx Native Language Support: Toolchain for Creating, Tracking, and Viewing Internationalized Versions of Sphinx Documents. Check it out!