Yes. For example, even ANSI Latin-1 is not fully compatible with Unicode when it comes to uppercasing, even though the characters in ANSI Latin-1 are a pure subset of Unicode. That is because there are two characters, µ (U+00B5) and ÿ (U+00FF), whose uppercase versions are not present in Latin-1.
julia> uppercase("µ")[1]
ERROR: Base.uppercase has been moved to the standard library package Unicode.
Restart Julia and then run `using Unicode` to load it.
Stacktrace:
[1] error(::Function, ::String, ::String, ::String, ::String, ::String, ::String) at ./error.jl:42
[2] #uppercase#954(::NamedTuple{(),Tuple{}}, ::Function, ::String, ::Vararg{String,N} where N) at ./deprecated.jl:139
[3] uppercase(::String, ::Vararg{String,N} where N) at ./deprecated.jl:139
[4] top-level scope
^^^ That’s what I think should be changed: `uppercase` (and really all of the other functions moved to `Unicode`) existed in C for decades before Unicode came along.
It just seems more fitting that Base should have a generic fallback that gives an error if the function has not been extended for a particular string type, and tells you to run `using Unicode` to get those extensions for the Base `String` type.
That would give a lot of flexibility for adding optimized versions as other string types are added in packages, such as the ones in LegacyStrings.jl.
For reference, these are the two Latin-1 characters in question and their Unicode uppercase forms:
julia> '\ub5'
'µ': Unicode U+00b5 (category Ll: Letter, lowercase)
julia> '\uff'
'ÿ': Unicode U+00ff (category Ll: Letter, lowercase)
julia> Base.Unicode.uppercase("ÿ")[1]
'Ÿ': Unicode U+0178 (category Lu: Letter, uppercase)
julia> Base.Unicode.uppercase("µ")[1]
'Μ': Unicode U+039c (category Lu: Letter, uppercase)
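To make the generic-fallback idea concrete, here is a minimal sketch of how it could look. Everything in it is hypothetical and for illustration only: the module `CaseSketch` stands in for Base, `upper` stands in for `uppercase`, and `Latin1Str` is a made-up type, not the API of LegacyStrings.jl or any other package. The fallback errors until a package adds a method for its own string type, and the Latin-1 method leaves µ and ÿ unchanged because their uppercase forms are outside Latin-1.

```julia
module CaseSketch   # hypothetical stand-in for Base

# Generic fallback: errors until some package adds a method for the string type.
upper(s) = error("upper is not defined for $(typeof(s)); load a package that extends it")

end # module

# Hypothetical Latin-1 string type, as a package might define it.
struct Latin1Str
    data::Vector{UInt8}
end

# Uppercase a single Latin-1 byte while staying inside Latin-1.
function upper_latin1(b::UInt8)
    if 0x61 <= b <= 0x7a                      # a-z
        return b - 0x20
    elseif 0xe0 <= b <= 0xfe && b != 0xf7     # à-þ, skipping the division sign ÷
        return b - 0x20
    else
        # Includes µ (0xb5) and ÿ (0xff): their uppercase forms are not in Latin-1.
        return b
    end
end

# The package extends the generic function for its own type.
CaseSketch.upper(s::Latin1Str) = Latin1Str(map(upper_latin1, s.data))
```

With this in place, `CaseSketch.upper(Latin1Str([0x61, 0xe9]))` goes through the byte-wise method, while `CaseSketch.upper("abc")` hits the fallback and raises an error, since nothing has extended `upper` for `String` in this sketch.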