Extracting messages from Sphinx is fairly easy. Apart from the occasional obstacle when dealing with non-plain text such as directives, the machinery already in place makes it a straightforward task. But collecting messages from documents is only half the battle in implementing Native Language Support for Sphinx: the translations also need to go back in again. Sphinx already exposes mechanisms to configure language settings, commonly called the locale. Hidden deep down in its innards there is a procedure to load gettext-style message catalogs (sphinx/locale/__init__.py), which I only needed to augment for domains other than Sphinx itself.
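To give a rough idea of what loading a catalog for an arbitrary domain involves, here is a minimal sketch using only Python's standard gettext module. It is not the code in sphinx/locale/__init__.py; the function name load_catalog, the domain "mydocs" and the directory "locale" are made up for illustration.

```python
import gettext


def load_catalog(domain, locale_dirs, language):
    """Look up `domain` in each directory and chain the catalogs found.

    Hypothetical helper: gettext expects the compiled catalog at
    <directory>/<language>/LC_MESSAGES/<domain>.mo
    """
    translator = None
    for directory in locale_dirs:
        try:
            found = gettext.translation(
                domain, localedir=directory, languages=[language]
            )
        except OSError:
            continue  # no catalog for this domain in this directory
        if translator is None:
            translator = found
        else:
            # Later catalogs are only consulted for messages the earlier ones lack.
            translator.add_fallback(found)
    if translator is None:
        # Nothing found: hand back a catalog that returns messages untranslated.
        translator = gettext.NullTranslations()
    return translator


catalog = load_catalog("mydocs", ["locale"], "de")
print(catalog.gettext("Extracting messages"))
```

Falling back to NullTranslations keeps the build going when a document has no catalog yet; untranslated messages simply come back as they went in.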
With that done we have the translated message texts for each document node at our fingertips. These are still flat strings rather than nested document trees and thus not directly usable for our purposes.
As luck would have it, Docutils readily exposes the reStructuredText parser, so we can just turn the messages into small doctrees and merge those into our existing document. This is now implemented in Sphinx.
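The parse-and-merge idea can be sketched with plain Docutils as follows. This is an illustration under my own assumptions, not the code that went into Sphinx: the helper names parse_message and merge_translation are hypothetical, and the sketch only handles messages that parse to a single paragraph.

```python
from docutils import nodes
from docutils.core import publish_doctree


def parse_message(text):
    """Parse a flat reStructuredText string; return its paragraph node or None."""
    # report_level 5 silences parser warnings for this throwaway document.
    doctree = publish_doctree(text, settings_overrides={"report_level": 5})
    if len(doctree) and isinstance(doctree[0], nodes.paragraph):
        return doctree[0]
    return None


def merge_translation(node, translated_text):
    """Replace the children of `node` with the parsed translation (hypothetical helper)."""
    patch = parse_message(translated_text)
    if patch is None:
        return False  # leave the original node untouched
    children = list(patch.children)
    node.clear()
    node.extend(children)  # extend() re-parents each child onto `node`
    return True


# Usage: translate the first paragraph of a tiny document.
document = publish_doctree("Hello *world*")
paragraph = document[0]
merge_translation(paragraph, "Hallo *Welt*")
print(paragraph.pformat())
```

Swapping only the children keeps the original node, and with it its position in the document and its attributes, while the inline markup comes from the translated string.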
There is one last remaining caveat: the translation step as it stands turns the build process into a dog-slow mess. Sphinx decides to update the whole environment (that is, re-read all files) whenever the language value changes, which right now is the case every. single. build. For our current use I don’t care too much, though this needs to be fixed when sphinx-i18n is merged upstream. It’s likely to require more in-depth architectural changes to Sphinx’s core to make this as quick and resource-friendly as possible.