Why so complex representation of a char?

peters · November 14, 2018, 1:34am

You are referring to the following question, right? If so, I did not propose losing any information about the UTF-8 details. What I proposed is just to drop some superfluous words, e.g. “category”, from:
‘a’: ASCII/Unicode U+0061 (category Ll: Letter, lowercase)

I think I can raise a question in the same way you asked. What is the upside of printing the superfluous word over and over, whenever users type characters directly in the REPL? No need for any immediate change. Let’s see how many users support the idea of a more concise representation for a character.

chakravala · November 14, 2018, 5:36am

Whether to leave out the word category or not seems like a trivial issue.

It really doesn’t matter whether we have that word or not, and I’m not sure it’s really worth debating either.

Sukera · November 14, 2018, 5:44am

Looking at this from the perspective of someone not already familiar with Unicode, if you don’t know how ASCII works and that Unicode has letter categories, how would you ever find out what those mysterious U+xxxx and Ll mean? It’s not at all clear to me that U+xxxx in your proposed representation is not a part of ASCII: ‘a’: ASCII U+0061 (Ll: Letter, lowercase). Sure it’s more information for someone familiar with Unicode, but I think the increased clarity is better when someone’s not familiar with the topic.

Tamas_Papp · November 14, 2018, 6:33am

When characters are used in isolation, it is reasonable to assume that detailed information is desired. Note that when they are used in collections or to form strings, only the character is printed. E.g.:

julia> 'f'
'f': ASCII/Unicode U+0066 (category Ll: Letter, lowercase)

julia> collect("foo")
3-element Array{Char,1}:
 'f'
 'o'
 'o'

julia> "foo"
"foo"

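The difference above comes from which `show` method is used. A minimal sketch, assuming Julia 1.x, that requests each form explicitly through `repr` (the variable names are only for illustration):

```julia
c = 'f'

# Two-argument show (what collections and repr use): just the literal.
short = repr(c)

# Three-argument show with the text/plain MIME type (what the REPL uses for a
# standalone value): the literal plus code point and category details.
long = repr("text/plain", c)

println(short)
println(long)
```

So code that only wants the short form can call `repr(c)` or `print(c)` without changing what the REPL displays.
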
Liso · November 14, 2018, 6:47am

I did not see any problem with complex representation before!

It is not just when the user types a character into the REPL:

julia> a_or_b() = rand(Bool) ? 'a' : 'b';

julia> a_or_b()
'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)

If you are not familiar with Unicode, why would you need info about the category?

And if you need additional information about a character, why not use some special function(s)?

One could for example use @tkf’s nice hack:

julia> import PyCall

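julia> # dot-call syntax for current PyCall; older PyCall releases used pyimport("unicodedata")[:name]

julia> unicodename(a::Char) = PyCall.pyimport("unicodedata").name(string(a));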
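
The code point and category can also be queried on demand without Python. A minimal sketch, assuming the unexported helpers `Base.Unicode.category_abbrev` and `Base.Unicode.category_string` are available (they are internal and may change between Julia versions); `charinfo` is a hypothetical name, not an existing function:

```julia
# Hypothetical helper: collects the details the REPL prints for a Char.
# The Base.Unicode.category_* functions are internal (unexported) API.
charinfo(c::Char) = (
    codepoint = "U+" * uppercase(string(codepoint(c), base = 16, pad = 4)),
    ascii = isascii(c),
    category = Base.Unicode.category_abbrev(c),
    description = Base.Unicode.category_string(c),
)

info = charinfo('a')
println(info)
```

Returning a named tuple keeps each detail individually accessible, e.g. `info.category`.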

DNF · November 14, 2018, 7:31am

Yeah, there’s always quite a bit of meta information in these outputs, so it’s not a big deal to me.

But it does seem like something that could be configurable. I’ve seen some requests to mimic Matlab’s format command when printing numbers:

>> format short
>> a
a =
    0.0853

>> format long
>> a
a =
   0.085307484729089

which seems neat. Having an option like that, also for Chars, could be quite handy.

peters · November 15, 2018, 4:27am

Yes, nothing serious, but I was surprised at all the input I received on my small suggestion.

peters · November 15, 2018, 4:43am

@Liso. Thanks for being with me. Yes, I like the Python approach.
Julia doesn’t have to bother its users (particularly beginners like me) with all the details. In most cases users may not need them, and when they really do, they can just call a function to spit them out. Explicit is better than implicit, but concise is also better than verbose.

Liso · November 15, 2018, 7:30am

“Explicit is better than implicit”: I read that as saying it is better to ask for some behavior (for example, additional info about a char) explicitly than to get it implicitly.

But as I wrote, the complex representation is not something that bothers me much… (and I also described a simple (?) way to avoid the complex info if somebody needs that)
