Why so complex representation of a char?

You are referring to the following question, right? If so, I did not propose losing any information about the UTF-8 details. What I proposed is just to drop some superfluous words, e.g. “category”, from:
‘a’: ASCII/Unicode U+0061 (category Ll: Letter, lowercase)

I think I can ask a question in the same way you did. What is the upside of printing the superfluous word over and over, whenever users type characters directly in the REPL? No need for any immediate change. Let’s see how many users support the idea of a more concise representation for a character.

Whether to leave out the word category or not seems like a trivial issue.

It really doesn’t matter whether we have that word or not, and I’m not sure it’s really worth debating either.

Looking at this from the perspective of someone not already familiar with Unicode, if you don’t know how ASCII works and that Unicode has letter categories, how would you ever find out what those mysterious U+xxxx and Ll mean? It’s not at all clear to me that U+xxxx in your proposed representation is not a part of ASCII: ‘a’: ASCII U+0061 (Ll: Letter, lowercase). Sure it’s more information for someone familiar with Unicode, but I think the increased clarity is better when someone’s not familiar with the topic.

When characters are used in isolation, it is reasonable to assume that detailed information is desired. Note that when they are used in collections or to form strings, only the character is printed. E.g.:

julia> 'f'
'f': ASCII/Unicode U+0066 (category Ll: Letter, lowercase)

julia> collect("foo")
3-element Array{Char,1}:
 'f'
 'o'
 'o'

julia> "foo"
"foo"
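
As far as I understand it, the two forms above correspond to the plain `show` (used inside collections) and the `text/plain` display that the REPL uses for a lone value. A minimal sketch to see both without relying on the REPL; the variable names are just for illustration:

```julia
# Compact form, the one used for elements inside a collection.
short = repr('f')

# Verbose form, the one the REPL shows for a Char on its own (MIME text/plain).
long = repr("text/plain", 'f')

println(short)
println(long)
```

So the short form is already available; the discussion is only about which one the REPL picks by default.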

I did not see any problem with the complex representation before! :wink:

It is not just when a user types a character into the REPL:

julia> a_or_b() = rand(Bool) ? 'a' : 'b';

julia> a_or_b()
'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)

If you are not familiar with Unicode, why would you need information about the category?

And if you need additional information about a character, why not use some special function(s)?

One could for example use @tkf’s nice hack:

julia> import PyCall

julia> unicodename(a::Char) = PyCall.pyimport("unicodedata")[:name](string(a));
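
For comparison, here is a rough sketch of what such an on-demand function could look like without Python. The name `chardetails` is made up for this post and is not part of Base; it only uses `codepoint`, `isascii`, `repr` and `string`, and it leaves out the category on purpose:

```julia
# Hypothetical helper (not in Base): return the details only when asked for.
function chardetails(c::Char)
    cp = codepoint(c)                                  # code point as UInt32
    hex = uppercase(string(cp, base = 16, pad = 4))    # e.g. "0061"
    kind = isascii(c) ? "ASCII/Unicode" : "Unicode"
    return "$(repr(c)): $kind U+$hex"
end

println(chardetails('a'))   # 'a': ASCII/Unicode U+0061
```

With something like this, the default display could stay short and the user would call the function explicitly when the details matter.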

Yeah, there’s always quite a bit of meta information in these outputs, so it’s not a big deal to me.

But it does seem like something that could be configurable. I’ve seen some requests to mimic Matlab’s format command when printing numbers:

>> format short
>> a
a =
    0.0853

>> format long
>> a
a =
   0.085307484729089

which seems neat. Having an option like that, also for Chars, could be quite handy.
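
Just to make the idea concrete, a minimal sketch of such a switch done in user code. Everything here (`CHAR_FORMAT`, `charformat!`, `charstring`) is hypothetical and does not exist in Julia; it also deliberately avoids overriding `Base.show` for `Char`:

```julia
# Hypothetical setting, loosely modelled on Matlab's `format short` / `format long`.
const CHAR_FORMAT = Ref(:long)

# Switch between :short and :long output.
function charformat!(mode::Symbol)
    mode in (:short, :long) || throw(ArgumentError("mode must be :short or :long"))
    CHAR_FORMAT[] = mode
    return mode
end

# Render a Char according to the current setting.
charstring(c::Char) = CHAR_FORMAT[] === :short ? repr(c) : repr("text/plain", c)

charformat!(:short)
println(charstring('a'))

charformat!(:long)
println(charstring('a'))
```

A real option would presumably have to live in the REPL or display machinery rather than in a global like this, so take it only as an illustration of the behavior.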

Yes, nothing serious, but I was surprised by all the input I received on my small suggestion.

@Liso. Thanks for being with me. Yes, I like the Python approach.
Julia doesn’t have to bother its users (particularly beginners like me) with all the details. In most cases, users may not need them, and when they really do, they can just call a function to print them. Explicit is better than implicit, but concise is also better than verbose.

Explicit is better than implicit: I read that as saying it is better to ask for some behavior (for example additional info about a char :wink: ) explicitly than to get it implicitly.

But as I wrote, the complex representation is not something that bothers me much… (and I also showed a simple (?) way to avoid the complex info if somebody needs to)