AbstractChar (and #26286)


@stevengj you said that you didn’t understand the need for CodePoint.

Take a look at the README and the code where I define CodePoint; it is used throughout the package:

CodePoint allows one to have separate code point types for ASCII, Latin1, UCS2 (BMP-only Unicode), and UTF32 (full valid Unicode), as well as other types to represent “raw” 1-, 2-, or 4-byte text values, or binary values.
That way optimized code can be generated for things like isvalid or isascii.
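
A rough sketch of the idea, assuming an AbstractChar supertype as proposed in #26286 is available. The name `ASCIIChr` is hypothetical and this is not the actual definition from the package; it only shows how a narrow character type lets predicates be answered from the type alone:

```julia
# Hypothetical one-byte character type restricted to ASCII.
# Illustrative only; not the definition used in any real package.
struct ASCIIChr <: AbstractChar
    v::UInt8
    # Construct from a code point, rejecting anything outside 0x00-0x7f.
    function ASCIIChr(v::UInt32)
        v <= 0x7f || throw(ArgumentError("code point is not ASCII"))
        new(v % UInt8)
    end
end

# Lets generic character code recover the code point.
Base.codepoint(c::ASCIIChr) = UInt32(c.v)

# The type already guarantees these, so no per-character check is needed.
Base.isascii(::ASCIIChr) = true
Base.isvalid(::ASCIIChr) = true

c = ASCIIChr(UInt32('a'))
isascii(c)   # true
```

Because the range check happens once at construction, code that later calls `isascii` or `isvalid` on such a value has nothing left to compute.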

I’d rather, though, that CodePoint be left out of Base for now at least, with only AbstractChar added, which would be the supertype of both Char from Base and my own CodePoint.


Alternatively, we could define print to always output UTF-8, and write to output a raw encoded value.

@stevengj That would fit with what I’ve been implementing. Not being able to directly read/write strings with other encodings would be horrible.
Having print defined as outputting UTF-8 encoded strings seems like a good way of having both low-level and high-level handling of string I/O.
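
To make the print/write split concrete, here is a minimal sketch under the same assumption that AbstractChar exists. `Latin1Chr` is a hypothetical name chosen for illustration, not a type from Base or from the package:

```julia
# Hypothetical one-byte character type covering Latin-1 (U+0000 to U+00FF).
struct Latin1Chr <: AbstractChar
    v::UInt8
    function Latin1Chr(v::UInt32)
        v <= 0xff || throw(ArgumentError("code point outside Latin-1"))
        new(v % UInt8)
    end
end

Base.codepoint(c::Latin1Chr) = UInt32(c.v)

# write: emit the raw encoded value, here a single Latin-1 byte.
Base.write(io::IO, c::Latin1Chr) = write(io, c.v)

# print: always emit UTF-8, by going through Char.
Base.print(io::IO, c::Latin1Chr) = print(io, Char(codepoint(c)))

e = Latin1Chr(UInt32(0xe9))   # U+00E9, "é"

buf = IOBuffer()
write(buf, e)
raw = take!(buf)    # one byte: 0xe9

buf = IOBuffer()
print(buf, e)
utf8 = take!(buf)   # two bytes: 0xc3 0xa9
```

The same character produces different bytes depending on which function is called: `write` keeps the native encoding for low-level I/O, while `print` gives text that any UTF-8 consumer can read.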


To do this efficiently while still supporting streams in different encodings, you’d ideally want an ASCIIChar type.
(from @stevengj)

That was one of the first things I did when designing the Strs.jl package.
It does make things much more efficient.