How to get "the tab sequence" of a unicode as we enter in the REPL in Julia code?

liuyxpp · August 30, 2020, 1:09am

I want to write a function (or use one, if it already exists)

function unicodesymbol2string(us::Symbol)
    # do something to find s
    return s
end

which behaves like

julia> unicodesymbol2string(:β)
beta
julia> unicodesymbol2string(:Σ)
Sigma
julia> unicodesymbol2string(:÷)
div

The returned string should correspond to the tab sequence listed at https://docs.julialang.org/en/v1/manual/unicode-input/

stevengj · August 30, 2020, 1:34am

Just paste the symbol at the help> prompt (type ? in the REPL) and it will tell you the tab completion.

liuyxpp · August 30, 2020, 1:40am

I want to know dynamically in my Julia code

Use case: my software accepts a Unicode symbol entered by the user, and I need to convert it to plain ASCII so it can be used in file names, directory names, etc.

jling · August 30, 2020, 2:26am

many filesystems support unicode names for file/directories.

you can do a reverse search from here:
https://github.com/JuliaLang/julia/blob/0336f672db739c784e2ebfc4d6c3dab8ba713611/stdlib/REPL/src/latex_symbols.jl#L95
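A minimal sketch of that reverse search, assuming the table in the linked file is reachable as `REPL.REPLCompletions.latex_symbols`. That is an internal, undocumented dictionary from tab sequence to character, so it may change between Julia versions. `reverse_latex_table` is a hypothetical helper name, not part of REPL:

```julia
using REPL

# Build character => tab sequence by inverting the internal completion table.
function reverse_latex_table()
    table = Dict{String,String}()
    for (seq, char) in REPL.REPLCompletions.latex_symbols
        # If several sequences produce the same character, the last one visited wins.
        table[char] = seq
    end
    return table
end

table = reverse_latex_table()
get(table, "β", "")   # an empty string would mean no tab sequence is known
```

Because some characters have more than one tab sequence, the sequence picked for those characters depends on iteration order in this sketch.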

liuyxpp · August 30, 2020, 3:24am

Thank you for pointing me in the right direction. I found exactly the function I was looking for:

julia> using REPL
julia> REPL.symbol_latex("β")
"\\beta"

With this function, my function becomes

function unicodesymbol2string(us::Symbol)
    # Look up the tab sequence (e.g. "\\beta") and drop the leading backslash.
    return REPL.symbol_latex(String(us))[2:end]
end

stevengj · August 31, 2020, 7:37pm

(Which common filesystems these days, not including FAT, don’t support Unicode filenames?)

Palli · August 31, 2020, 8:43pm

Julia has better support for math symbols than I believe any other language, also supporting a lot of emojis e.g. Beer Mug and \:baby_bottle: (do not mix together).

That said, what’s the use case? I would be a bit annoyed seeing e.g. my name Páll shown as Pll, dropping a letter (Palli is a nickname; I also use it to help foreigners, as it is easier to say), or Icelandic Þ and Ð as \\TH and \\DH. That would work only partially for some names (and fully for only a few, apart from ASCII-only ones): Þórður, for example, would become \\THr\\dhur (missing out on the ó).

liuyxpp · September 1, 2020, 1:38am

I’ve never encountered a Unicode filename in my work, and my colleagues never use them. They may cause many problems, who knows? For example, what if someone doesn’t know how to type a filename containing Unicode? Using plain ASCII avoids such headaches.

YongHee-Kim · September 1, 2020, 5:43am

It is not uncommon to use a Unicode filename in Korea. (programmers naturally try to avoid it. But sometimes it happens.)

And it causes headaches when the OS or a third-party app uses the Unicode name.

For example, we use Google Drive File Stream in our work environment. And the default path for the Google Drive File Stream drive is…

Unicode and whitespaces

Palli · September 1, 2020, 10:06am

I think you mean, you never use fullwidth Unicode (e.g. Chinese characters) vs halfwidth (e.g. ASCII but not only), understandable, still interesting.

Unicode, i.e. UTF-8, is ubiquitous on Linux filesystems (and Unicode, there UTF-16, on Windows). I’ve probably used UTF-8 only for well over a decade, and filenames in Icelandic or e.g. German work fine. I often use the ASCII subset for work reasons (and personally often out of habit), since I work for a British (actually Hong Kong) company, and English is the default language. I use Julia at work, and I guess I could use Unicode (math symbols) in the Julia code, though so far I haven’t. It should work in filesystems; I’m more worried about Linux and Windows servers cooperating well.

Tamas_Papp · September 1, 2020, 11:09am

Should work fine on any recent (< 10 yo) OS. If not, the problem can easily be remedied.

They can copy-paste or open the file using an interactive file dialog.

There are exceptions (ancient mainframes etc), but generally a custom ad-hoc workaround (like the transcription proposed above) is likely to take more effort and be less robust than setting up a filesystem that can handle Unicode.

liuyxpp · September 1, 2020, 1:07pm

We have many legacy C++ libraries, and I actually don’t know whether Unicode filenames work out of the box with C++ iostream. Probably none of us wants to update that C++ code. Until we port all of that C++ code to Julia, we had better stick to filenames with pure ASCII characters. That is another reason I don’t like to save a data file with a name such as “ϕ0.1_α0.5.dat”.

stevengj · September 1, 2020, 1:14pm

It will on Mac and Linux, where filenames are all UTF-8 encoded — all of the C/C++ libraries treat filenames as an opaque collection of bytes and don’t care about the encoding, and so they handle arbitrary Unicode filenames automatically.

On Windows, unfortunately, you need to use special UTF-16 or wchar_t filesystem APIs (wiostream) to access Unicode filenames from C/C++. (The Win32 API took a wrong turn early in the development of Unicode and never recovered. However, there is hope that this will change soon and UTF-8 usage will become widespread on Windows — it’s apparently become possible to use the ordinary C/C++ APIs with UTF-8 encoded filenames, and this may yet become the default.)
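As a quick way to check the Linux/macOS claim on a given machine, here is a minimal Julia sketch that is not from the thread. It uses the file name quoted in the earlier post and a throwaway directory from `mktempdir`, and it says nothing about how a particular C++ library will behave:

```julia
# Create a file with a non-ASCII name in a scratch directory and read it back.
dir = mktempdir()
name = "ϕ0.1_α0.5.dat"
path = joinpath(dir, name)
write(path, "1.0 2.0\n")

readdir(dir)          # lists the file under its Unicode name
read(path, String)    # round-trips the contents
codeunits(name)       # the UTF-8 bytes that make up the name in Julia
```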

anon37204545 · September 1, 2020, 1:49pm

I use it very often since Serbian language has letters š, đ, ž, č, ć which aren’t ASCII. Naming my files and folders in such way is way more readable and intuitive. I still haven’t encountered any problems with that, but maybe it’s because I always name my software development files in English.

Palli · September 1, 2020, 2:40pm

People even use UTF-8 on (recent) mainframes (UTF-EBCDIC never got popular), while probably not for most programs, and not sure about for actual filenames: IBM Documentation

and on OS/2 (I see UTF-8 and UCS-2 at Alex Taylor: OS/2 Universal Language Support maybe only for REXX).

thorek1 · August 18, 2023, 12:44pm

REPL.symbol_latex("ϵ") works but REPL.symbol_latex("ϵ̂") doesn’t.

Any hint as to how to deal with the latter case?

Is it going to get any better with Julia 1.10 and a parser written in Julia?

My use-case is translating equations to Matlab/dynare (no unicode).

stevengj · August 18, 2023, 1:02pm

ϵ̂ doesn’t have a single tab completion (it’s \epsilon<tab>\hat<tab>) so symbol_latex won’t work.
You have to break it up into characters similar to this logic.
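A minimal sketch of that break-up, assuming `Unicode.normalize` with `:NFD` is an acceptable way to separate the base character from the combining mark. The string is written with escapes so that the two code points are explicit:

```julia
using REPL, Unicode

# ϵ (U+03F5) followed by a combining circumflex (U+0302)
s = Unicode.normalize("\u03f5\u0302", :NFD)
chars = collect(s)                               # two separate Chars
[REPL.symbol_latex(string(c)) for c in chars]    # one lookup per character
```

Each element of the result is the tab sequence for one character, or `""` if that character has none.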

thorek1 · August 18, 2023, 1:48pm

so in my case this would be a solution:

using REPL, Unicode

function translate_to_ascii(x::Symbol)
    # Decompose into base characters and combining marks.
    s = Unicode.normalize(string(x), :NFD)
    latex = [REPL.symbol_latex(string(i))[2:end] for i in s]
    join(latex,"_")
end

translate_to_ascii(:α̂ₗ)
# "alpha_hat__l"

thanks for the prompt reply

stevengj · August 18, 2023, 1:52pm

Not quite, because REPL.symbol_latex returns "" for things that don’t have tab completions (e.g. ASCII characters).
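To make the problem concrete, here is a small sketch (an illustration, not code from the thread) of what the per-character lookup does with ASCII input. The test string is `l1` followed by α, a combining circumflex, and a subscript l, written with escapes:

```julia
using REPL, Unicode

REPL.symbol_latex("a")           # "" because "a" has no tab completion
REPL.symbol_latex("a")[2:end]    # still "", so the character is silently lost

s = Unicode.normalize("l1\u03b1\u0302\u2097", :NFD)
parts = [REPL.symbol_latex(string(c))[2:end] for c in s]
join(parts, "_")                 # the entries for "l" and "1" are empty strings
```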

thorek1 · August 18, 2023, 2:41pm

something like this then:

using REPL, Unicode

function translate_to_ascii(x::Symbol)
    s = Unicode.normalize(string(x), :NFD)

    latex = String[]

    for i in s
        out = REPL.symbol_latex(string(i))[2:end]
        # No tab completion (e.g. an ASCII character): keep the character itself.
        if out == ""
            out = string(i)
        end
        push!(latex,out)
    end

    join(latex,"_")
end

translate_to_ascii(:l1α̂ₗ)
