State of translation for the Julia site and manual


#1

Hello all!

As v1.0 rolls around, I wanted to ask what the preferred toolchain for translating Julia materials into other languages looks like. Yes, the docs are still a bit in flux, but some parts of the manual have settled down enough that it makes sense to start the conversation sooner rather than later.

I understand if this has not been a priority at all, but I think shipping with multilingual manuals would be a definite plus for worldwide adoption, even if English is the lingua franca of programming.

Things that already exist:

I believe this is how the julialang.org site gets the banner at the top right for switching between English and French.

… I say, ¡No, señor! Julia deserves a proper toolchain for translation, from Base down to the package ecosystem, and I am all ears about how to make it happen.

Bear in mind what this means for any given language. By some handy line counting with wc -l, each language has to translate:

  • 3105 strings for the julialang.org site on Transifex
  • 21k lines from the .md files in the manual
  • 4.7k lines from the devdocs
  • 1.8k lines from the stdlib
  • 5.5k docstrings for functions

So about 36k lines in total per language, all while keeping track of updates and modifications.
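For anyone who wants to redo the tally for another set of files, here is a rough Python equivalent of the wc -l count. The directory name in the example is a hypothetical checkout path, not necessarily where the manual sources live.

```python
from pathlib import Path


def count_lines(root, pattern="*.md"):
    """Sum newline characters over all matching files under root, like `wc -l`."""
    total = 0
    per_file = {}
    for path in sorted(Path(root).rglob(pattern)):
        if path.is_file():
            # wc -l counts newline bytes, so a final line without "\n" is not counted
            n = path.read_bytes().count(b"\n")
            per_file[str(path)] = n
            total += n
    return total, per_file


if __name__ == "__main__":
    # Hypothetical location of the manual sources in a local checkout.
    total, _ = count_lines("doc/src/manual")
    print(f"{total} lines")
```

Counting newline bytes (rather than splitting into lines) keeps the numbers comparable with the wc -l figures above.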


Efforts already underway:


#2

What are your thoughts on using something like Google Translate (if it's at all possible) to automatically translate everything into all the available languages, flag it as auto-translated, and let humans correct it as they go, flagging each string as translated once it has been checked?
The advantage would be twofold: everything would be somewhat readable in all languages really fast, and it should be easier for translators to correct Google's effort than to start from scratch.
Just a thought.
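A minimal Python sketch of that workflow, assuming a hypothetical machine_translate callable that stands in for Google Translate or any other service (no real API is called here): every missing string is pre-filled and flagged as auto-translated, and a human correction flips the flag to translated.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

AUTO = "auto-translated"  # machine output, still needs human review
HUMAN = "translated"      # written or confirmed by a person


@dataclass
class Entry:
    source: str
    text: str = ""
    status: str = "untranslated"


@dataclass
class Catalog:
    lang: str
    entries: Dict[str, Entry] = field(default_factory=dict)

    def prefill(self, sources: Dict[str, str],
                machine_translate: Callable[[str, str], str]) -> None:
        """Fill every untranslated string with machine output and flag it AUTO.
        Strings that are already AUTO or HUMAN are left alone."""
        for key, src in sources.items():
            entry = self.entries.setdefault(key, Entry(src))
            if entry.status == "untranslated":
                entry.text = machine_translate(src, self.lang)
                entry.status = AUTO

    def correct(self, key: str, text: str) -> None:
        """A human fixes (or confirms) a string, which flags it as translated."""
        entry = self.entries[key]
        entry.text = text
        entry.status = HUMAN

    def progress(self):
        """Return (human-checked strings, total strings)."""
        done = sum(e.status == HUMAN for e in self.entries.values())
        return done, len(self.entries)
```

For example, with a fake translator lambda s, lang: f"[{lang}] {s}", calling prefill on {"hello": "Hello", "bye": "Goodbye"} makes both strings readable immediately, and progress() reports (1, 2) after one of them is corrected by hand.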


#3

To do this we would need these tools integrated with Documenter.jl. If that is done, there would also be the nice side effect that package documentation could be translated too. However, the infrastructure doesn't exist yet.


#4

These are the Korean translation tools, and here is the result:


#5

Most mature translation platforms (Crowdin, Translatewiki, Transifex, etc.) already support machine translation as a way to pre-fill translations, along with additional helpers: glossaries to ensure consistent nomenclature, per-string discussions, language fallback chains, translation memory to reuse existing translations of similar content, contextual metadata to resolve ambiguities, and so on. So there's no need to reinvent the wheel; just integrate one of these platforms into the documentation workflow. Of course, I'm not claiming that's an easy task…
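To make two of those helpers concrete, here is a toy Python sketch of a fuzzy translation-memory lookup and a language fallback chain. It uses difflib's similarity ratio, which is an assumption for illustration (real platforms use their own matching), and the plain-dict data layout and 0.75 threshold are likewise made up.

```python
from difflib import SequenceMatcher


def tm_suggest(source, memory, threshold=0.75):
    """Return (score, previous_source, previous_translation) for the closest
    entry in the translation memory, or None if nothing is similar enough."""
    best = None
    for old_src, old_tr in memory.items():
        score = SequenceMatcher(None, source, old_src).ratio()
        if score >= threshold and (best is None or score > best[0]):
            best = (score, old_src, old_tr)
    return best


def lookup(key, catalogs, chain):
    """Walk a language fallback chain such as ["pt-BR", "pt", "en"] and
    return (language, text) from the first catalogue that has the key."""
    for lang in chain:
        text = catalogs.get(lang, {}).get(key)
        if text:
            return lang, text
    return None
```

The memory suggestion is only a starting point for the translator, while the fallback chain decides what a reader sees when their language has no entry yet.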


#6

What is the roadmap for adoption?


#7

In addition, regarding the tool used for the Korean translation:
juliakorea/translate-doc includes a script that finds recent changes to the documents in the Julia repository.

If you have any questions about the code,
please open an issue in translate-doc.

Thanks!
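The translate-doc script mentioned above is not reproduced here. As a rough illustration of the idea only, here is a Python sketch that detects changed documents by comparing content hashes between two snapshots of a docs tree; a git-based approach (diffing against the last translated commit) would be a natural alternative.

```python
import hashlib
from pathlib import Path


def snapshot(root, pattern="*.md"):
    """Map each matching file (path relative to root) to a SHA-256 of its content."""
    root = Path(root)
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob(pattern))
        if p.is_file()
    }


def diff_snapshots(old, new):
    """Classify files as added, removed, or modified between two snapshots."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    modified = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return added, removed, modified
```

Storing the snapshot taken at the time of the last translation pass would tell translators exactly which pages need another look.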


#8

There was also this effort to translate the docs:

In the beginning, the docs used Python's Sphinx:

[Image: translation workflow with Sphinx]

I think that in order to <humour>take i18n seriously</humour>, we first need to develop a pure Julia implementation of gettext. It should be something that can be included in the standard library rather than in a package (i.e. Julia's own strings should be i18n-able too, not just packages'), like in Python; anything else could live in an I18n package.

This would re-enable the workflow depicted in the image above and would be the base for everything else, including support in Documenter, and from there i18n for everyone! 🙂

Basically, gettext provides a function used to wrap strings intended for translation, marking them as translatable. This in turn allows compiling catalogues of these strings in well-established formats (such as PO files) that other i18n software understands.
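To make the catalogue-compilation step concrete, here is a toy Python sketch of what an extraction tool like xgettext does: scan source code for calls to _() with a string literal and emit PO-template-style entries. It only handles the simplest case (one literal argument, no plural forms, no header entry) and is not a substitute for the real tooling.

```python
import ast


def extract_messages(source, filename="<string>"):
    """Collect string literals passed to _() along with their line numbers."""
    messages = {}
    for node in ast.walk(ast.parse(source, filename)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "_"
            and node.args
            and isinstance(node.args[0], ast.Constant)
            and isinstance(node.args[0].value, str)
        ):
            messages.setdefault(node.args[0].value, []).append(node.lineno)
    # ast.walk is breadth-first, so sort the line numbers for stable output
    return {msgid: sorted(lines) for msgid, lines in messages.items()}


def to_pot(messages, filename="<string>"):
    """Render a minimal PO-template-style catalogue with empty msgstr fields."""
    out = []
    for msgid, lines in messages.items():
        refs = " ".join(f"{filename}:{n}" for n in lines)
        escaped = msgid.replace("\\", "\\\\").replace('"', '\\"')
        out.append(f"#: {refs}\nmsgid \"{escaped}\"\nmsgstr \"\"\n")
    return "\n".join(out)
```

Translators then fill in the msgstr fields per language, and the same msgid keys are used to look the translations up at run time.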

Here is a good explanation of what this looks like in real code (Python), but I'm sure you can envision how it could be adapted to Julia:

The main difference is that in Julia, gettext (AKA _) would be a non-standard string literal macro, i.e. _"Translate me!", instead of a function as in every other language I've seen it used in, e.g.:

```python
from django.utils.translation import gettext as _  # 'universal' convention
_("Translate me!")
```

– IMHO


#9

There’s also another related Documenter issue.