Why so complex representation of a char?

I think the representation of a char in a terminal is overly complex in Julia. For instance,

julia> 'a'
'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)
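For reference, the codepoint part of that line can be rebuilt with public functions. The helper name `codepoint_label` below is hypothetical and not part of Base. This is a rough sketch that leaves out the category part of the display.

```julia
# Hypothetical helper (not part of Base): rebuild the codepoint part of
# the verbose REPL display using only public functions.
function codepoint_label(c::Char)
    u = codepoint(c)                                   # 0x00000061 for 'a'
    hex = uppercase(string(u, base = 16, pad = 4))     # "0061"
    prefix = isascii(c) ? "ASCII/" : ""                # ASCII chars get an extra tag
    return string(prefix, "Unicode U+", hex)
end

println(codepoint_label('a'))        # ASCII/Unicode U+0061
println(codepoint_label('\u03B1'))   # Unicode U+03B1 (Greek small alpha)
```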

Is there any reason underlying the complex representation of characters?

I guess that's mainly because Chars representing Unicode are highly ambiguous in their graphical representation, so people doing more involved work with Chars will appreciate this.
It should also help non-expert users find tricky Unicode bugs.
The latter is a pretty good reason to make it the default, since quite a lot of people may not even know that they could get char1 != char2 even if the two look (almost) exactly the same.
Maybe one could print the visible chars in the range 0:255 with less information as a compromise?
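The proposed compromise could be sketched like this, without overriding anything in Base. `brief_show` is a hypothetical name. It prints the terse form for printable characters with codepoints 0 to 255 and falls back to the regular verbose `text/plain` display for everything else.

```julia
# Hypothetical compromise display (not Base behavior): terse for printable
# characters in the range 0:255, verbose text/plain display otherwise.
function brief_show(io::IO, c::Char)
    if codepoint(c) <= 0xff && isprint(c)
        show(io, c)                       # terse form, e.g. 'a'
    else
        show(io, MIME("text/plain"), c)   # full form with codepoint and category
    end
end

brief_show(stdout, 'a'); println()        # terse
brief_show(stdout, '\u0391'); println()   # Greek capital alpha: verbose
```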

Why? Realize that the verbose format is only used for “multiline” text/plain display. In other contexts it uses terse output:

julia> ['a', 'b']
2-element Array{Char,1}:
 'a'
 'b'

julia> println('a')
a

julia> repr('a')
"'a'"

julia> show('a')
'a'
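The same distinction can be seen by capturing each form as a string. `repr(MIME("text/plain"), c)` asks for the 3-argument `text/plain` display, which is what the REPL shows for a single value. Plain `repr` uses the terse 2-argument `show`.

```julia
c = 'a'
terse   = repr(c)                       # 2-arg show: "'a'"
verbose = repr(MIME("text/plain"), c)   # 3-arg text/plain show, as in the REPL
plain   = string(c)                     # what print/println output: "a"

println(terse)
println(verbose)
println(plain)
```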

What is the downside of showing more information about the character when you display a single character in the REPL or similar contexts?

Question: Are these the same letters or different ones: A, Α, А?

Answer: They are all different.

In the Julia REPL it's clear:

julia> 'A'
'A': ASCII/Unicode U+0041 (category Lu: Letter, uppercase)

julia> 'Α'
'Α': Unicode U+0391 (category Lu: Letter, uppercase)

julia> 'А'
'А': Unicode U+0410 (category Lu: Letter, uppercase)

If you don't care about the extra information, you can just ignore it.
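The same check can be done programmatically. The three letters below are written with `\u` escapes so the source itself is unambiguous: they compare unequal and have different codepoints, even though all three are uppercase letters.

```julia
# Three visually identical capital letters from different scripts.
latin, greek, cyrillic = 'A', '\u0391', '\u0410'

@show latin == greek                            # false: different codepoints
@show codepoint.((latin, greek, cyrillic))      # one codepoint per script
@show all(isuppercase, (latin, greek, cyrillic))
```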

To what end? What harm is the additional information doing?

@stevengj @StefanKarpinski, why are you asking me? I don't really mind :smiley:

I just wanted to be productive and look for a simple solution that might make @peters happy as well :slight_smile:

If everyone in the world got to change one thing about the language it would be a total mess.

If you are really annoyed for some reason, you can always do:

julia> import Base.show

julia> show(io::IO, mime::MIME"text/plain", a::Char) = print(io, a)
show (generic function with 296 methods)

julia> 'a'
a

If it's a problem for you, you could change this behavior in your .julia/config/startup.jl initialization file.

This is what you could use:

julia> import Base.show

julia> Base.show(io::IO, ::MIME{Symbol("text/plain")}, c::AbstractChar) = show(io, c)

julia> 'a'
'a'

LOL :smiley:

You won! :smiley:

Isn't there some showcompact or something? I think it used to be showcompact, but I think it changed and I don't remember how to do it now.

By the way, I really like the default printing behavior, explicit is good :+1:.
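On the `showcompact` question: it was deprecated in Julia 0.7 and removed in 1.0. The general replacement is to call `show` with an `IOContext` carrying `:compact => true`. For a `Char`, the 2-argument `show` already gives the short `'a'` form. A small comparison using a `Float64`, where the compact flag visibly changes the output:

```julia
x = 1/3

full    = sprint(show, x)                              # all shortest round-trip digits
compact = sprint(show, x; context = :compact => true)  # fewer digits
println(full)
println(compact)

# Printing directly to the terminal with the compact flag set:
show(IOContext(stdout, :compact => true), x); println()

# For a Char, 2-arg show is already the short form:
println(sprint(show, 'a'))
```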

Why are 0391 and 0410 different? Just curious.

It seems it would be useful to enhance info! :smiley:
0391 is GREEK CAPITAL LETTER ALPHA
0410 is CYRILLIC CAPITAL LETTER A
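The codepoints alone already hint at the script, since U+0370–U+03FF is the Greek and Coptic block and U+0400–U+04FF is the Cyrillic block. The helper `unicode_block` and the table `BLOCKS` below are hypothetical and cover only three blocks. They are a toy, not a substitute for the Unicode character database.

```julia
# Hypothetical lookup table: only a few Unicode block ranges are listed.
const BLOCKS = [
    (0x0000:0x007F, "Basic Latin"),
    (0x0370:0x03FF, "Greek and Coptic"),
    (0x0400:0x04FF, "Cyrillic"),
]

# Hypothetical helper: name the block a character's codepoint falls in.
function unicode_block(c::Char)
    u = codepoint(c)
    for (range, name) in BLOCKS
        u in range && return name
    end
    return "other"
end

for c in ('A', '\u0391', '\u0410')
    println(repr(c), " => ", unicode_block(c))
end
```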

lol…that's funny

This is funny too: Latin ɛ and Greek ε are different but the same! (ping @Tero_Frondelius :slight_smile:)

julia> Meta.parse("\u025B") === Meta.parse("\u03B5")
true
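Spelled out, the two characters are distinct `Char` values, yet the parser maps them to the same identifier symbol. A small check:

```julia
latin_e = '\u025B'   # LATIN SMALL LETTER OPEN E
greek_e = '\u03B5'   # GREEK SMALL LETTER EPSILON

@show latin_e == greek_e                                            # distinct characters
@show Meta.parse(string(latin_e)) === Meta.parse(string(greek_e))   # same identifier
```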

https://github.com/JuliaLang/julia/pull/19464

Also, Julia identifiers are NFC-normalized (source: Variables · The Julia Language).
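To illustrate NFC normalization, "e" followed by a combining acute accent and the precomposed "é" are different strings but parse to the same identifier. The last check suggests that NFC by itself leaves the Latin open e unchanged, so the ɛ/ε equivalence above appears to be an additional Julia-specific rule layered on top of NFC.

```julia
using Unicode

decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT
composed   = "\u00E9"    # precomposed 'é'

@show decomposed == composed                            # different code points
@show Unicode.normalize(decomposed, :NFC) == composed   # NFC composes them
@show Meta.parse(decomposed) === Meta.parse(composed)   # same identifier

# NFC does not change the Latin open e.
@show Unicode.normalize("\u025B", :NFC) == "\u025B"
```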

Yes, it would have been nice to include the official name of each Unicode character, but that would involve shipping about 1 MB of data from the Unicode character database.

Thanks @sdanisch for trying to make me happy :slight_smile: