Why doesn’t islowercase work on String? It only works on characters.
Hmm, that is a little unexpected. I guess it’s because a String could have a mix of upper- and lowercase characters, so you have to check all the characters, like this:
julia> all(x->islowercase(x), "hello")
true
Conceptually, it’s like asking why iseven only works on a number and not on an array.
You can simply write:
all(islowercase, "hello")
Oops, I’m so used to writing more complicated conditionals that I default to anonymous functions. Good catch.
Let’s say the string is "Hello". Most of that string is in lowercase, but one letter isn’t. So there’s no binary answer to the question islowercase("Hello").
Intuitively, what we’re thinking of when we ask islowercase("Hello") is is_all_lowercase("Hello"). Since all already exists, it makes sense to do all(islowercase, "Hello") instead, since it composes existing functionality and isn’t any more complicated or longer than a special function would be.
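As a rough sketch of that composition, the helper below wraps all(islowercase, s) under the name is_all_lowercase. The name is hypothetical (it is not part of Base), and the edge cases show that "every character is a lowercase letter" is stricter than it might first appear:

```julia
# Hypothetical helper (not part of Base): true when every character is a lowercase letter.
is_all_lowercase(s::AbstractString) = all(islowercase, s)

is_all_lowercase("hello")        # true
is_all_lowercase("Hello")        # false: 'H' is uppercase
is_all_lowercase("hello world")  # false: ' ' is not a lowercase letter
is_all_lowercase("")             # true: `all` over an empty collection is vacuously true
```

Whether the last two results are what a caller wants is a matter of definition, which is part of why a one-line composition at the call site may be preferable to a built-in.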
I think islowercase working on String would be reasonable, and that its meaning is clear, not least because the function lowercase does work on strings:
julia> lowercase("HelLo")
"hello"
It is a bit odd that the above works, but islowercase(lowercase("HelLo")) errors.
lowercase("HelLo") is not ambiguous, while islowercase("HelLo") is a little. It would be reasonable to expect it to return something like (false,true,true,false,true).
Once you’ve seen it, the following syntax is quite natural:
julia> all(islowercase, "Hello")
false
julia> any(islowercase, "Hello")
true
I disagree, that would be very surprising, and something one might expect from islowercase.(). The issomething functions always return a scalar Bool.
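For anyone who does want the per-character answers, here is a minimal sketch. It assumes the characters are collected first, since strings behave as scalars under broadcasting and islowercase.("HelLo") would just call islowercase on the whole string:

```julia
s = "HelLo"

# Collect the characters, then broadcast the scalar predicate over them.
flags = islowercase.(collect(s))       # one Bool per character

# An equivalent comprehension that iterates the string directly.
flags2 = [islowercase(c) for c in s]

# Reducing the per-character answers recovers the scalar questions.
all(flags) == all(islowercase, s)      # true
any(flags) == any(islowercase, s)      # true
```

This keeps islowercase itself a scalar predicate and makes the "one answer per character" reading explicit at the call site.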
IMO, lowercase(str) and islowercase(str) seem like a natural pair, while all(islowercase, str) seems to be more naturally paired with map(lowercase, str), in that both functions would need mapping over a string.
It also corresponds to how we talk about it in natural language: there is no ambiguity about the concepts ‘a lowercase string’, ‘an uppercase string’, and ‘a mixed-case string’.
Issue here
Would such a definition have any contraindications?
julia> islowercase("Hello")
ERROR: MethodError: no method matching islowercase(::String)
Closest candidates are:
islowercase(::AbstractChar) at strings/unicode.jl:324
Stacktrace:
[1] top-level scope
@ c:\Users\sprmn\.julia\v1.8\string2.jl:19
julia> import Unicode.islowercase
julia> islowercase(s::String) = s==lowercase(s)
islowercase (generic function with 2 methods)
julia> islowercase("Hello")
false
That issue is now closed.
I disagree with what feels natural, so I am gonna fork Julia, rebrand it Julie, and implement lowercase(::String).
What definition do you want? Consider:
julia> all(islowercase, "élan")
false
julia> all(islowercase, "élan")
true
(Hint: run collect on these two strings.)
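To make the hint reproducible even where the two strings render identically, here is a sketch that spells them out with explicit escapes. It assumes the difference is the usual one between a decomposed and a precomposed "é"; the variable names are only for illustration:

```julia
using Unicode  # standard library; provides Unicode.normalize

decomposed  = "e\u0301lan"  # 'e' followed by U+0301 COMBINING ACUTE ACCENT
precomposed = "\u00e9lan"   # the single code point U+00E9

length(decomposed)   # 5 code points
length(precomposed)  # 4 code points

all(islowercase, decomposed)   # false: the combining accent is not a lowercase letter
all(islowercase, precomposed)  # true

# Normalizing to NFC composes the accent, after which the two strings agree.
Unicode.normalize(decomposed, :NFC) == precomposed     # true
all(islowercase, Unicode.normalize(decomposed, :NFC))  # true
```

So any string-level definition based on all(islowercase, s) would give different answers for two strings that most readers consider identical, unless it normalizes first.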
Consider:
julia> Base.islowercase(s::AbstractString) = s==lowercase(s)
julia> islowercase("1")
true
julia> islowercase('1')
false
which seems inconsistent.
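To see how far the plausible definitions diverge, here is a small comparison sketch. All three function names are hypothetical and none is defined in Base; the third is only a rough approximation of the rule Python's str.islower follows (at least one cased character, and no uppercase ones) and ignores titlecase characters:

```julia
# Three hypothetical candidates for a string-level predicate.
lower_roundtrip(s) = s == lowercase(s)                            # unchanged by lowercase
lower_every(s)     = all(islowercase, s)                          # every character is a lowercase letter
lower_cased(s)     = any(islowercase, s) && !any(isuppercase, s)  # only cased characters count

# Print one row per input: roundtrip, every, cased.
for s in ("hello", "Hello", "hello world", "1", "")
    println(rpad(repr(s), 15), lower_roundtrip(s), "  ", lower_every(s), "  ", lower_cased(s))
end
```

The three agree on "hello" and "Hello" but disagree on "hello world", "1", and the empty string, which suggests that a single built-in method would have to pick one convention and surprise users who expected another.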
I see.
But this inconsistency seems to derive (also) from the fact that the following functions, which are used in the definitions of lowercase() and islowercase(), give these results:
julia> c2l=Char(ccall(:utf8proc_tolower, UInt32, (UInt32,), '1'))
'1': ASCII/Unicode U+0031 (category Nd: Number, decimal digit)
julia> Bool(ccall(:utf8proc_islower, Cint, (UInt32,), UInt32(c2l)))
false
If I’m not making a logical mistake: we have a function that transforms a character into its lowercase form, and a function that, checking this result, says it is not lowercase.
I gather, from reading here and there, that these topics depend on many “variables” that are not easy to keep together in a simple way.
As far as I can tell, every widely available string library in every mainstream language does this: there is a lowercase-like function which converts characters to lowercase if possible (and otherwise leaves them alone), and an islower-like function that checks specifically whether a character is a lowercase letter.
(The islower predicate stems originally from the function of the same name in the C standard library.)
For example, in Python 3 (which doesn’t have a distinction between string and character types):
>>> "1".lower()
'1'
>>> "1".islower()
False
See also the Ruby downcase method (Ruby doesn’t provide an islower predicate, and instead the standard recommendation seems to be to write a regex). Or Swift’s string.lowercased() method and char.isLowercase property. Or the C# String.ToLower() and Char.IsLower methods. Or the Go ToLower(str) and IsLower(char) functions. Or …
Thanks for taking the time to clear up all of these things.
I had no doubt that the choice was well founded, even without knowing that all other languages have made the same choice for these two functions.
I am still curious about why, though.
To simplify, let’s imagine that at some point, in defining the lowercase() function, it was decided, for many good reasons, to leave characters that have no upper/lower counterpart as they are, rather than raise an error or take some other alternative.
But leaving them as they are is compatible with two opposite interpretations: one that treats such characters as both upper and lower (this would avoid the inconsistency I pointed out, but who knows how many other problems it would bring with it); the other (the one actually taken) that considers these characters neither upper nor lower.
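A small sketch of the convention actually taken, using only Base functions: for caseless characters, lowercase is the identity and both predicates return false.

```julia
# Show, for a few characters, what lowercase returns and what the two predicates say.
for c in ('a', 'A', '1', ' ', '!')
    println(repr(c), " => lowercase: ", repr(lowercase(c)),
            ", islowercase: ", islowercase(c),
            ", isuppercase: ", isuppercase(c))
end

# With the "neither" convention, a string of digits is neither all-lowercase
# nor all-uppercase.
all(islowercase, "123")  # false
all(isuppercase, "123")  # false
```

Under the opposite "both upper and lower" interpretation, the last two calls would each return true, so a string of digits would count as lowercase and uppercase at the same time.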