DocstringTranslation.jl: Translate docstrings in Julia using your preferred language

TL;DR

I’m happy to announce the release of the following packages. Give them a try!

Introduction (You can skip)

The Julia Programming Language solves the so-called “two-language problem.” 2 - 1 = 1, which is fantastic news for those in the scientific domain.

What about natural languages? I believe there is still an (N>1)-language problem.

I mean, non-native English speakers have to learn a natural language called English.

Let me explain what the issue is.

I’m Japanese. I’m not a native English speaker, but I write my thoughts in English. Why? Many people in the world do not or cannot read my native language (Japanese, 日本語) directly (i.e., without translation). People here on Julia Discourse use English, so I use English. Yes, I know, of course, that English is an official language in many countries and territories. This is a natural consequence.

When it comes to writing something in English, Japanese people struggle with translation because Japanese and English have different word orders. For instance, consider the following example:

# This is Japanese
私はホテルの向かいにあるお店で見たスーツを着てみたいです.

Can you translate it into English IMMEDIATELY without machine translation? You can see the answer here:

Did you see the answer? O.K. One more time. May I ask a question?

Can you translate it into English IMMEDIATELY without machine translation again?

In Julia terms, Japanese Julia programmers run something like the code below in their brains in order to communicate with you.

en_str = process(reverse(collect("WhatIwantToSayIterator{Japanese}()")), dict=JP2EN)

If you are a good Julia programmer, this implementation looks slow. That’s true. How much time do I spend writing my messy English to explain my thoughts?

The process function is complicated. Let me show you the messy Julia implementation that runs in my brain.

# Vocabulary list
# Well, I'll do my best.
const JP2EN = Dict(
    "私" => "I",
    "スーツ" => "suit",
    "お店" => "store",
    ...
)

abstract type Tense end
abstract type JapanesePastTense <: Tense end

"""
    _dispatch(tense, ctx)

Determines whether Japanese "... した" in ctx is past tense or perfect tense.
In Japanese, there is no distinction between them.
"""
function _dispatch(tense, ctx)
    # Actually, I don't know how to implement this.
    # I use DeepL/Write or Grammarly. Don't tell me I'm cheating.
end

function dispatchtense(tense::Tense, ctx::Context)
    if tense isa JapanesePastTense
        if _dispatch(tense, ctx) isa PastTense
            return convert(PastTense, tense)
        else
            return convert(PerfectTense, tense)
        end
    else
        return PresentTense(tense)
    end
end

# Keyword arguments do not take part in dispatch, so `tense` is positional here.
function processverb(v, tense::PerfectTense)
    return "have " * v * "ed"
end

function processverb(v, tense::PastTense)
    return v * "ed"
end

"""
    process(w::Word, ctx::Context)

Do something complicated, which is a nightmare for Japanese speakers.
"""
function process(w::Word, ctx::Context)
    if w isa Verb
        tense = determinetense(ctx)
        if tense isa PresentTense
            # "use" becomes "uses"
            # but "study" becomes "studys"... oh no, it should be "studies"
            return w * "s"
        else
            return processverb(w, tense)
        end
    else
        # This implementation is slow and inaccurate.
        if isindefinite(w)
            return "a " * w
        elseif isdefinite(w)
            return "the " * w
        end
        return w
    end
end

function process(sentence::String)
    context = Context()
    for w in sentence
         process(w::Word, context)
    end
end

en_str = process(
    reverse(collect("<What I want to say>")), dict=JP2EN
)

Geez… The joke is over. Let’s get back to more serious topics.

In short, for me at least, reading, writing, and speaking English is a NON-trivial task. Can technology ease the problem?

Introduction (It would be nice to read)

Have you ever wanted to read the docstrings of a Julia API in your preferred language? A machine-learning API can make that dream come true, and you can use it right away if you have an OpenAI API key. Try out my Julia package, DocstringTranslation.jl.

Demo

Related packages

If you are not willing to send anything to OpenAI’s API, you can use a local LLM. Ollama will help you.

Using the Google translation API is another solution.

(Warning, this package is unstable.)

If you want OpenAI to generate a docstring for, or explain, a user-defined function, try DocstringChef.jl.
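Since the same idea is served by several backends (OpenAI, Ollama, Google), here is a minimal sketch of how interchangeable backends plus a small cache could be modeled with multiple dispatch. Every name in it (`TranslationBackend`, `DictBackend`, `CachedBackend`, `translate`) is hypothetical and chosen for illustration; it is not the API of the packages above, and `DictBackend` is only a lookup table standing in for a real API call.

```julia
# Hypothetical types for illustration only; the real packages may be organized differently.
abstract type TranslationBackend end

# Stand-in for a real service: looks the text up in a fixed table.
struct DictBackend <: TranslationBackend
    table::Dict{String,String}
end

# Wraps any backend and remembers results, so the same docstring is translated only once.
struct CachedBackend{B<:TranslationBackend} <: TranslationBackend
    inner::B
    cache::Dict{Tuple{String,Symbol},String}
end
CachedBackend(inner) = CachedBackend(inner, Dict{Tuple{String,Symbol},String}())

# Unknown text is returned untranslated.
translate(b::DictBackend, text::AbstractString, lang::Symbol) = get(b.table, text, text)

function translate(b::CachedBackend, text::AbstractString, lang::Symbol)
    return get!(b.cache, (String(text), lang)) do
        translate(b.inner, text, lang)
    end
end

backend = CachedBackend(DictBackend(Dict("Compute the sum." => "合計を計算します。")))
println(translate(backend, "Compute the sum.", :Japanese))
```

Swapping the service then means adding one more `translate` method for a new backend type, while the cache wrapper stays the same.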

I hope my packages reduce the N-language problem. Let’s read docstrings in our preferred languages.

Feedback is welcome. Leave your comment in your preferred language :D.


This is super exciting @terasakisatoshi ! Do you know if this could somehow be set up as a Documenter.jl GitHub Action to help with creating multilingual documentation website pages? Or plug into Documenter.jl somehow?

Hello, @TheCedarPrince . I really appreciate your feedback. I’m not an expert on Documenter.jl, but I was able to generate documentation with Documenter.jl that contains docstrings translated into Japanese:

# docs/make.jl
using DocstringTranslationOllamaBackend
@switchlang! :Japanese

using Documenter, Example4DocstringTranslation

makedocs(modules = [Example4DocstringTranslation],
         sitename = "Example4DocstringTranslation.jl",
         format = Documenter.HTML()
         )

I think you expect the sentence

“Example4DocstringTranslation Julia package repo.”

to be translated into Japanese as well. However, my DocstringTranslation[BlahBlahBackend].jl packages do not reach this sentence because they only see docstrings, not Markdown pages.


What about this one? With the following hook, I was able to translate the Markdown pages into Japanese as well.

using Documenter, Example4DocstringTranslation

# Begin hook
using DocstringTranslationOllamaBackend
using DocstringTranslationOllamaBackend: translate_with_ollama, default_lang, default_model
@switchlang! :Japanese

using Documenter: Markdown

function promptfn(
    m::Union{Markdown.MD, AbstractString},
    language::String = default_lang(),
)
    prompt = """
You are an expert in the Julia programming language.
Please provide a faithful translation of the following Documenter.jl flavor Markdown in $(language).

\"\"\"
$(m)
\"\"\"

Keep the following points in mind:

- The translation should retain the formatting of the original Markdown. 
- Do not translate quoted words. 
- Do not translate headings, where headings means text beginning with #.
- Do not add or remove unnecessary text. 
- Return only a faithful translation.
- Do not stop until the translation is complete.

Please start. Only return the result.
"""
    return prompt
end

# Overrides the Page constructor to hack Documenter into translating Markdown pages
function Documenter.Page(source::AbstractString, build::AbstractString, workdir::AbstractString)
    # The Markdown standard library parser is sensitive to line endings:
    #   https://github.com/JuliaLang/julia/issues/29344
    # This can lead to different AST and therefore differently rendered docs, depending on
    # what platform the docs are being built (e.g. when Git checks out LF files with
    # CRLF line endings on Windows). To make sure that the docs are always built consistently,
    # we'll normalize the line endings when parsing Markdown files by removing all CR characters.
    mdsrc = replace(read(source, String), '\r' => "")
    mdpage = Markdown.parse(mdsrc)

    # begin DocstringTranslationOllamaBackend
    mdpage = translate_with_ollama(mdpage, default_lang(), default_model(), promptfn)
    @info "mdpage" mdpage
    # end DocstringTranslationOllamaBackend
    mdast = try
        convert(Documenter.MarkdownAST.Node, mdpage)
    catch err
        @error """
        MarkdownAST conversion error on $(source).
        This is a bug — please report this on the Documenter issue tracker
        """
        rethrow(err)
    end
    return Documenter.Page(source, build, workdir, mdpage.content, Documenter.Globals(), mdast)
end
# end hook

makedocs(modules = [Example4DocstringTranslation],
         sitename = "Example4DocstringTranslation.jl",
         format = Documenter.HTML()
         )

deploydocs(
    repo = "github.com/AtelierArith/Example4DocstringTranslation.jl.git",
    target = "build",
    deps   = nothing,
    make   = nothing,
    push_preview = true,
)
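The hook above hands the whole page to the LLM and relies on the prompt to leave headings and code alone. A complementary idea, sketched below under my own assumptions, is to walk the parsed Markdown and translate only the plain-text pieces of paragraphs, so headings and code cannot be altered by the model. `fake_translate`, `translate_block`, and `translate_page` are hypothetical helpers (not part of DocstringTranslationOllamaBackend or Documenter), and `uppercase` merely stands in for a real translation call. Lists, tables, and other nested blocks are passed through untranslated in this sketch.

```julia
using Markdown

# Hypothetical stand-in for a real backend call (LLM, web API, ...).
fake_translate(s::AbstractString) = uppercase(s)

# Fallback: headers, code blocks, and anything else are returned unchanged.
translate_block(block, translate) = block

# Paragraphs: translate only the plain-string pieces and keep inline code etc. as-is.
function translate_block(p::Markdown.Paragraph, translate)
    pieces = Any[c isa AbstractString ? translate(c) : c for c in p.content]
    return Markdown.Paragraph(pieces)
end

# Rebuild the page from the (possibly translated) top-level blocks.
function translate_page(md::Markdown.MD, translate)
    blocks = Any[translate_block(b, translate) for b in md.content]
    return Markdown.MD(blocks, md.meta)
end

# A tiny page: a heading, a paragraph with inline code, and an indented code block.
src = join(["# Title", "", "Hello `world` docs.", "", "    f(x) = x", ""], "\n")
page = translate_page(Markdown.parse(src), fake_translate)
print(Markdown.plain(page))
```

The trade-off is that the model sees each text fragment without its surrounding context, which may hurt translation quality compared with sending the full page.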


Are you planning to register these packages any time soon?

Actually, I’m not sure. The reason is that

If you need my package, add the following to your Project.toml:


[deps]
DocstringTranslationOllamaBackend = "0a42a1af-e0c0-4cdb-822d-6b43e0751063"

[sources]
DocstringTranslationOllamaBackend = {url = "https://github.com/AtelierArith/DocstringTranslationOllamaBackend.jl"}
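As far as I know, the `[sources]` section is only understood by Julia 1.11 and later. On older versions, a rough equivalent is to add the package by URL with Pkg. The sketch below only wraps that call in a function; `BACKEND_URL` and `install_backend` are names made up for this example.

```julia
using Pkg

# URL taken from the [sources] entry above.
const BACKEND_URL = "https://github.com/AtelierArith/DocstringTranslationOllamaBackend.jl"

# Wrapped in a function so that nothing is downloaded until you call it.
install_backend() = Pkg.add(url = BACKEND_URL)
```

Calling `install_backend()` in the environment of your docs (for example after `Pkg.activate("docs")`) records the package in that environment's Project.toml and Manifest.toml.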

Oh that is fascinating! This was something I was thinking about but even your prior message with translated docstrings is awesome.

Thanks for sharing – I wonder if this could be used in some Julia specific plugin to Documenter to better help with i18n (i.e. internationalization and localization)? I am still a bit conservative on LLM usage, but for making major/minor tagged releases, this could be a nice GitHub action to run to make certain languages available quickly.

Wouldn’t be perfect, but might be an interesting start. :thinking:
