Why such a complex representation of a Char?


#1

I think the representation of a char in a terminal is overly complex in Julia. For instance,

julia> 'a'
'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)

Is there a reason behind this complex representation of characters?


#2

I guess that’s mainly because Unicode Chars are often ambiguous in their graphical representation, so people doing more involved work with Chars will appreciate this.
It should also help non-expert users track down tricky Unicode bugs.
The latter is a pretty good reason to make it the default, since quite a lot of people may not even know that they can get char1 != char2 even when the two look (almost) exactly the same.
Maybe one could print the visible chars in the range 0:255 with less information as a compromise?


#3

Why? Realize that the verbose format is only used for “multiline” text/plain display. In other contexts it uses terse output:

julia> ['a', 'b']
2-element Array{Char,1}:
 'a'
 'b'

julia> println('a')
a

julia> repr('a')
"'a'"

julia> show('a')
'a'

What is the downside of showing more information about the character when you display a single character in the REPL or similar contexts?
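
To see both forms side by side without relying on what the REPL decides to print, here is a minimal sketch. It assumes `repr` with a MIME argument goes through the same three-argument `show` method the REPL uses for display; the variable names are just for illustration, and the exact wording of the verbose string may differ between Julia versions.

```julia
c = 'a'

# Terse form: two-argument `show`, also used for elements inside containers.
terse = repr(c)

# Verbose form: three-argument `show` with the text/plain MIME type,
# which is what the REPL calls when it displays a single value.
verbose = repr(MIME("text/plain"), c)

println(terse)
println(verbose)
```

So the extra information only shows up when a lone Char is displayed; anything that calls `print`, `string` or two-argument `show` gets the short form.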


#4

Question: Are these the same letters or different ones: A, Α, А?

Answer: They are all different.

In the Julia REPL it’s clear:

julia> 'A'
'A': ASCII/Unicode U+0041 (category Lu: Letter, uppercase)

julia> 'Α'
'Α': Unicode U+0391 (category Lu: Letter, uppercase)

julia> 'А'
'А': Unicode U+0410 (category Lu: Letter, uppercase)

If you don’t care about the extra information, you can just ignore it.
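
The same check can be done in code. This is only a sketch: the three characters are written as escapes so that nothing depends on how your font renders them, and the variable names are made up for the example.

```julia
latin    = '\u0041'  # Latin capital A
greek    = '\u0391'  # Greek capital Alpha
cyrillic = '\u0410'  # Cyrillic capital A

# Print each character with its code point and whether it is ASCII.
for c in (latin, greek, cyrillic)
    println(c, " => U+", string(codepoint(c), base = 16, pad = 4),
            ", ASCII: ", isascii(c))
end

# The glyphs look alike, but no two of the characters compare equal.
all_distinct = latin != greek && greek != cyrillic && latin != cyrillic
println(all_distinct)
```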


#5

To what end? What harm is the additional information doing?


#6

@stevengj @StefanKarpinski, why are you asking me? I don’t really mind :smiley:

I just wanted to be productive and look for a simple solution that might make @peters happy as well :slight_smile:


#7

If everyone in the world got to change one thing about the language it would be a total mess.


#8

If you are really annoyed for some reason, you can always do:

julia> import Base.show

julia> show(io::IO, mime::MIME"text/plain", a::Char) = print(io, a)
show (generic function with 296 methods)

julia> 'a'
a

#9

If it is a problem for you, then you could change this behavior in your .julia/config/startup.jl initialization file.

This is what you could use:

julia> import Base.show

julia> Base.show(io::IO, ::MIME{Symbol("text/plain")}, c::AbstractChar) = show(io, c)

julia> 'a'
'a'

#10

LOL :smiley:


#11

You won! :smiley:


#12

Isn’t there some showcompact or something? I think it used to be showcompact but I think it changed and I don’t remember how to do it now.

By the way, I really like the default printing behavior, explicit is good :+1:.


#13

Why are 0391 and 0410 different? Just curious.


#14

It seems it would be useful to enhance the info! :smiley:
0391 is GREEK CAPITAL LETTER ALPHA
0410 is CYRILLIC CAPITAL LETTER A


#15

lol…that’s funny


#16

This is funny too: Latin ɛ and Greek ε are different but the same! ( ping @Tero_Frondelius :slight_smile: )

julia> Meta.parse("\u025B") === Meta.parse("\u03B5")
true

#17

#18

Or Julia identifiers are NFC-normalized (source: https://docs.julialang.org/en/v1/manual/variables/#Allowed-Variable-Names-1)
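
A small sketch of what NFC normalization means for identifiers, using the `Unicode` standard library. The example strings are mine, not from the manual. The last line is there because, as far as I can tell, NFC alone does not map U+025B to U+03B5; the manual lists that pair as an extra equivalence that the parser applies on top of NFC.

```julia
using Unicode

decomposed = "e\u0301"  # 'e' followed by a combining acute accent
composed   = "\u00e9"   # the precomposed character

# As plain strings the two spellings are different...
same_string = decomposed == composed

# ...but NFC normalization composes the first into the second,
same_after_nfc = Unicode.normalize(decomposed, :NFC) == composed

# and the parser turns both spellings into the same identifier.
same_identifier = Meta.parse(decomposed) === Meta.parse(composed)

# NFC by itself leaves Latin open e (U+025B) untouched.
open_e_unchanged = Unicode.normalize("\u025B", :NFC) == "\u025B"

println((same_string, same_after_nfc, same_identifier, open_e_unchanged))
```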


#19

Yes, it would have been nice to include the official name of each Unicode character, but that would involve shipping about 1M of data from the Unicode character database.


#20

Thanks @sdanisch for trying to make me happy :slight_smile: