Localization package - I18n - Resource Bundles

package
proposal

#1

I want to propose the development of a new package, which covers the I18n topics.
Recently I made a working prototype that handles locales, resource bundles (these are the Java terms), and gettext-style string translations: https://github.com/KlausC/ResourceBundles.jl.

It takes a slightly different approach than Gettext.jl (https://github.com/Julia-i18n/Gettext.jl) and does not depend on a Python implementation.

The main use of resource bundles is translating natural-language texts that appear in the source code. The programmer writes each text as a string in the language of the software maintainer. These strings serve as keys into an external database, which stores translations into various natural languages. The variant selected is determined by the Locale, which is set at program startup or taken from the environment.
The text strings in the program support interpolation in the Julian style. The translated strings may permute the positions of the interpolated spots. Additionally, multiple plural forms are supported in the same style.
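
To illustrate how a translation might reorder interpolated values, here is a minimal sketch. It assumes numbered placeholders of the form `$(1)`, `$(2)`; the helper `fill_template` and the sample texts are hypothetical and are not taken from the prototype.

```julia
# Hypothetical helper: replace numbered placeholders such as $(1), $(2)
# with the positional argument of the same number.
function fill_template(template::AbstractString, args...)
    # m is the matched text, e.g. "$(2)"; drop "$(" and ")" to get the index
    substitute(m) = string(args[parse(Int, m[3:end-1])])
    return replace(template, r"\$\((\d+)\)" => substitute)
end

# raw"..." keeps the dollar signs literal instead of interpolating them
source      = raw"$(1) owes $(2) euros"
translation = raw"$(2) Euro schuldet $(1)"   # invented German text, arguments swapped

println(fill_template(source, "Anna", 5))
println(fill_template(translation, "Anna", 5))
```

The same arguments are passed in both calls; only the template decides where each one ends up.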


#2

Interesting! I’m curious what kind of format you plan to support for translations? Gettext’s PO format?

May I suggest finding a more explicit name? I think it should contain “internationalization”, “localization”/“locale” or “translation”.


#3

You are asking about the format of the database files containing the translations. I looked at the gettext PO format, and it would have the advantage of the existing GNU tools. Nevertheless, I preferred the simpler approach, which is now implemented. I could imagine providing a reader that transforms PO input into the internal form.
The current implementation expects the data in the form of a Julia expression of type Vector{Pair{String,Union{String,Vector{String}}}} .

The singular form is "original text with interpolation indicators" => "translated text with permuted interpolation indicators".
The plural form "original text with one interpolation indicator" => [ "first plural form for amount $(1)", "second form for amount $(2)", "third form for amount $(Any)" ] allows for multiple plural forms of the translation depending on a numeric indicator. (example in English: for 0, 1, and more than 1.)
Please have a look at the README for examples.
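
As a rough illustration of that data shape (not the prototype's actual files), the sketch below builds such a vector of pairs and looks entries up. The sample texts, the `lookup` helper, and the rule for picking a plural form are all assumptions made for this example.

```julia
# Value type from the description above: one string, or a vector of plural forms
const Translation = Union{String,Vector{String}}

# Invented sample entries; raw"..." keeps $(1) literal
entries = Pair{String,Translation}[
    raw"Hello $(1)" => raw"Hallo $(1)",
    raw"$(1) files" => [raw"keine Dateien", raw"eine Datei", raw"$(1) Dateien"],
]
table = Dict(entries)

# Hypothetical lookup: n selects the plural form when the entry has several
function lookup(table::AbstractDict, key::AbstractString, n::Integer=1)
    entry = get(table, key, key)        # unknown keys fall back to the source text
    entry isa AbstractString && return entry
    idx = n == 0 ? 1 : n == 1 ? 2 : 3   # assumed rule: forms for 0, 1, and more than 1
    return entry[min(idx, length(entry))]
end

println(lookup(table, raw"$(1) files", 0))
println(lookup(table, raw"$(1) files", 7))
```

Falling back to the key itself means an untranslated text still shows up in the maintainer's language rather than as an error.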

Actually the current name is a working name for a bundle of features, which should maybe reside in separate related packages.

  • (name Locales ???) defines the type Locale, which offers some extras compared to String.
  • (name I18nResources ???) handles the retrieval of information, depending on a given locale and a key. This information is not restricted to text, but may be of any (user-defined) type. The look-up mechanism includes a fall-back strategy to less specific locales (e.g. if no text for Locale("en-US") is available, take the text for Locale("en")).
  • (name I18nTranslations ???) handles translations of texts that are embedded in Julia source files. These texts are easily identified by a string macro (currently called @tr_str, e.g. tr"programmers message").
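
The fall-back strategy from the second item can be sketched as follows. This is a simplified illustration that uses plain strings as locale tags; `fallback_chain`, `find_resource`, and the sample bundles are hypothetical names and data, not the prototype's API.

```julia
# Candidate tags from most to least specific, e.g. "en-US" -> ["en-US", "en", ""]
function fallback_chain(tag::AbstractString)
    isempty(tag) && return [""]
    parts = split(tag, '-')
    chain = [join(parts[1:i], '-') for i in length(parts):-1:1]
    push!(chain, "")            # "" stands for the root (default) bundle
    return chain
end

# Return the first value found along the chain, or nothing if no bundle has the key
function find_resource(bundles::AbstractDict, tag::AbstractString, key)
    for candidate in fallback_chain(tag)
        bundle = get(bundles, candidate, nothing)
        bundle === nothing && continue
        haskey(bundle, key) && return bundle[key]
    end
    return nothing
end

# Invented sample data: "en-US" overrides only one key
bundles = Dict(
    "en"    => Dict("greeting" => "Hello", "colour" => "colour"),
    "en-US" => Dict("colour" => "color"),
)

println(find_resource(bundles, "en-US", "colour"))    # found in the specific bundle
println(find_resource(bundles, "en-US", "greeting"))  # found after falling back to "en"
```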

I appreciate any kind of constructive criticism, encouragement, guidance, and interest from the community.


#4

Yes, the (only?) advantage of supporting Gettext PO files as input is that they are supported by a lot of GUIs, which are essential for translators, who are not always developers and who may not be willing to learn a new workflow or syntax. Platforms like Transifex also allow doing this online.

About splitting the package into three pieces, I’m not sure it’s really needed. Maybe locales could be defined in their separate package if it turns out other packages need them, but I don’t see the point of separating resources from translations.


#5

There are quite a few design decisions still to be made.

  • Locales: a concept of a “current locale” has to be supported, because it is uncomfortable or impossible to provide the locale as an explicit argument (in most use cases).
    Question: Is it necessary to keep it in a task-specific variable or is a global variable for the whole program sufficient? Currently, default locales are stored task-specifically.
    Question: Do we support one current locale for all purposes, or a specific one for each usage (one for messages, one for data formats, etc., as the POSIX environment variables suggest)? Currently, each of the POSIX categories has its own default locale.

  • ResourceBundles:
    Question: Is it a good or bad idea to bind the resources to packages/modules? Currently a module-global variable exists for each module, which provides resource files at a package- and module-specific location.

  • Translations:
    Question: Do we require more API besides the access via a string macro? Currently only @tr_str exists. The gettext features of plural forms and context names are fully supported.
    Question: Do we want to support printf format strings? Permutation of interpolation arguments as in C++ Boost? Currently, only the Julia string interpolation is implemented.
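
To make the task-specific versus global question for the current locale concrete, here is a small sketch built on Julia's `task_local_storage`. The names `GLOBAL_LOCALE`, `current_locale`, `with_locale`, and the key `:i18n_locale` are assumptions for this example and do not describe the prototype.

```julia
const GLOBAL_LOCALE = Ref("en")   # hypothetical process-wide default

# Task-specific value if one is set, otherwise the global default
current_locale() = get(task_local_storage(), :i18n_locale, GLOBAL_LOCALE[])

# Run f with a locale that is visible only to the current task;
# the previous state is restored when f returns
with_locale(f, tag::AbstractString) = task_local_storage(f, :i18n_locale, tag)

with_locale("de") do
    println(current_locale())                # override seen by this task
    println(fetch(@async current_locale()))  # a newly spawned task does not inherit it
end
println(current_locale())                    # back to the global default
```

One consequence of the task-specific variant shows up in the second `println`: a spawned task starts without the override, so the design has to decide whether child tasks should copy the parent's locale.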

The following subjects are currently not supported:

  • Number formats / Date-Time formats
    Question: These require amendments to the Base source (printf, show methods). How should this be supported?
    Question: How should the rudimentary support hook in base/i18n.jl be used/modified/handled?

  • Sorting of strings using collation sequences. (LC_COLLATE)
    Proposal: do not apply to standard strings, but to a new string type NLSString.

  • Character classes (LC_CTYPE)
    Question: Are there locale specific variants beyond the Unicode classes to be supported?


#6

I see: The PO-format reader should be included! Not too hard.


#7

I18n is actually a really standard abbreviation for this. The word Internationalization would be ok but it’s a fingerful to type!


#8

The gettext PO reader has been implemented. That inspired me to change the treatment of plural forms to bring it in line with gettext. Nevertheless, the gettext API has not been adopted; the tr-macro approach looks much more appealing to me.

I have not renamed the project yet. Following Stefan’s hint, I am leaning toward renaming it I18n.