Why so complex representation of a char?

peters · November 1, 2018, 1:08pm

I think the representation of a char in a terminal is overly complex in Julia. For instance,

julia> 'a'
'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)

Is there any reason underlying the complex representation of characters?

sdanisch · November 1, 2018, 2:00pm

I guess that’s mainly because Chars representing Unicode are highly ambiguous in their graphical representation, so people doing more involved work with Chars will appreciate this.
It should also help non-expert users find tricky Unicode bugs.
The latter is a pretty good reason to make it the default, since quite a lot of people may not even know that they could get char1 != char2 even if the two look (almost) exactly the same.
Maybe one could print the visible chars in the range 0:255 with less information as a compromise?

stevengj · November 1, 2018, 3:05pm

Why? Realize that the verbose format is only used for “multiline” text/plain display. In other contexts it uses terse output:

julia> ['a', 'b']
2-element Array{Char,1}:
 'a'
 'b'

julia> println('a')
a

julia> repr('a')
"'a'"

julia> show('a')
'a'
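
The same distinction can be captured as strings with `sprint`. This is only a minimal sketch; the variable names are illustrative, and the exact wording of the verbose line may differ between Julia versions:

```julia
c = 'a'

# Verbose form: three-argument `show` with the text/plain MIME type,
# which is what the REPL uses for a standalone value.
verbose = sprint(show, MIME("text/plain"), c)

# Terse forms used in other contexts.
terse = sprint(show, c)    # same string as repr(c)
plain = sprint(print, c)   # just the character itself

println(verbose)
println(terse)
println(plain)
```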

What is the downside of showing more information about the character when you display a single character in the REPL or similar contexts?

StefanKarpinski · November 1, 2018, 3:52pm

Question: Are these the same letters or different ones: A, Α, А?

Answer: They are all different.

In the Julia REPL it’s clear:

julia> 'A'
'A': ASCII/Unicode U+0041 (category Lu: Letter, uppercase)

julia> 'Α'
'Α': Unicode U+0391 (category Lu: Letter, uppercase)

julia> 'А'
'А': Unicode U+0410 (category Lu: Letter, uppercase)

If you don’t care about the extra information, you can just ignore it.
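
As a small sketch of the same point in code, the three characters can be written as escapes so the source itself is unambiguous. The name `lookalikes` is just for illustration:

```julia
# Latin A, Greek Alpha, Cyrillic A, written as escapes on purpose.
lookalikes = ['\u0041', '\u0391', '\u0410']

for c in lookalikes
    # codepoint(c) returns the Unicode code point as a UInt32.
    hex = uppercase(string(codepoint(c), base = 16, pad = 4))
    println(repr(c), " => U+", hex)
end

# The glyphs look alike, but no two of the Chars compare equal.
all_distinct = length(unique(lookalikes)) == 3
println(all_distinct)
```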

StefanKarpinski · November 1, 2018, 3:54pm

To what end? What harm is the additional information doing?

sdanisch · November 1, 2018, 4:05pm

@stevengj @StefanKarpinski, why do you ask me? I don’t really mind.

Just wanted to be productive and search for a simple solution that may make @peters happy as well.

StefanKarpinski · November 1, 2018, 4:13pm

If everyone in the world got to change one thing about the language it would be a total mess.

Ronis_BR · November 1, 2018, 5:12pm

If you are really annoyed for some reason, you can always do:

julia> import Base.show

julia> show(io::IO, mime::MIME"text/plain", a::Char) = print(io, a)
show (generic function with 296 methods)

julia> 'a'
a

Liso · November 1, 2018, 5:12pm

If it is a problem for you, then you could change this behavior in your .julia/config/startup.jl initialization file.

This is what you could use:

julia> import Base.show

julia> Base.show(io::IO, ::MIME{Symbol("text/plain")}, c::AbstractChar) = show(io, c)

julia> 'a'
'a'

Ronis_BR · November 1, 2018, 5:13pm

LOL

Liso · November 1, 2018, 5:14pm

You won!

ExpandingMan · November 1, 2018, 5:34pm

Isn’t there some showcompact or something? I think it used to be showcompact but I think it changed and I don’t remember how to do it now.

By the way, I really like the default printing behavior; explicit is good.
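
For reference, `showcompact` was deprecated in Julia 0.7 in favor of the `:compact` property of an `IOContext`. A minimal sketch follows, using `sprint`'s `context` keyword. As far as I can tell, the property shortens things like floating-point output but does not change how a `Char` is shown:

```julia
x = 1.23456789

full    = sprint(show, x)
compact = sprint(show, x; context = :compact => true)

println(full)
println(compact)   # fewer digits than `full`

# For a Char, two-argument `show` is already terse, so the
# :compact property makes no difference here.
c_full    = sprint(show, 'a')
c_compact = sprint(show, 'a'; context = :compact => true)
println(c_full == c_compact)
```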

Roger-luo · November 1, 2018, 5:38pm

Why are 0391 and 0410 different? Just curious.

Liso · November 1, 2018, 6:23pm

It seems it would be useful to enhance the info!
0391 is GREEK CAPITAL LETTER ALPHA
0410 is CYRILLIC CAPITAL LETTER A

Roger-luo · November 1, 2018, 6:29pm

lol…that’s funny

Liso · November 1, 2018, 6:55pm

This is funny too: Latin ɛ and Greek ε are different but the same! ( ping @Tero_Frondelius )

julia> Meta.parse("\u025B") === Meta.parse("\u03B5")
true

kristoffer.carlsson · November 1, 2018, 6:56pm

https://github.com/JuliaLang/julia/pull/19464

Liso · November 1, 2018, 7:10pm

Or Julia identifiers are NFC-normalized (source: Variables · The Julia Language)
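
A small sketch of what NFC normalization means for identifiers, using the `Unicode` standard library. The strings are illustrative. Note that NFC by itself leaves U+025B unchanged, so the ɛ/ε result above presumably comes from an extra parser rule (see the linked pull request) rather than from NFC alone:

```julia
using Unicode

decomposed  = "e\u0301"   # 'e' followed by a combining acute accent
precomposed = "\u00e9"    # the single precomposed code point

# As plain strings the two differ...
println(decomposed == precomposed)

# ...but they agree after NFC normalization,
nfc_equal = Unicode.normalize(decomposed, :NFC) == precomposed
println(nfc_equal)

# and the parser yields the same identifier (a Symbol) for both.
same_identifier = Meta.parse(decomposed) === Meta.parse(precomposed)
println(same_identifier)

# NFC alone does not turn Latin open e (U+025B) into Greek epsilon (U+03B5).
nfc_keeps_open_e = Unicode.normalize("\u025B", :NFC) == "\u025B"
println(nfc_keeps_open_e)
```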

stevengj · November 1, 2018, 9:07pm

Yes, it would have been nice to include the official name of each Unicode character, but that would involve shipping about 1 MB of data from the Unicode character database.

peters · November 2, 2018, 10:09am

Thanks @sdanisch for trying to make me happy
