AbstractChar (and #26286)

ScottPJones · March 2, 2018, 12:01am

https://github.com/JuliaLang/julia/pull/26286#issuecomment-369771199
@stevengj you said that you didn’t understand the need for CodePoint.

Take a look at the README and the code where I define CodePoint; it is used throughout the package:
https://github.com/JuliaString/Strs.jl

CodePoint allows one to have separate code point types for ASCII, Latin1, UCS2 (BMP-only Unicode), and UTF32 (full valid Unicode), as well as other types to represent “raw” 1-, 2-, or 4-byte text values, or binary values.
That way, optimized code can be generated for things like isvalid or isascii, since the type itself already bounds the range of possible values.

For now at least, though, I’d rather CodePoint were left out of Base, with only AbstractChar added, which would be the supertype of both Char from Base and my own CodePoint.
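
A minimal sketch of how that could look, assuming an AbstractChar in Base that asks each subtype for a `codepoint` method and a constructor from `UInt32` (which is what Julia 1.x ended up with). `DemoASCIIChr` and `DemoLatin1Chr` are hypothetical names made up for this illustration; they are not the types defined in Strs.jl:

```julia
# Hypothetical character types, for illustration only.
struct DemoASCIIChr <: AbstractChar
    v::UInt8
    function DemoASCIIChr(u::UInt32)
        u < 0x80 || throw(ArgumentError("not an ASCII code point"))
        new(u % UInt8)
    end
end

struct DemoLatin1Chr <: AbstractChar
    v::UInt8
    function DemoLatin1Chr(u::UInt32)
        u <= 0xff || throw(ArgumentError("not a Latin-1 code point"))
        new(u % UInt8)
    end
end

# Both types store the code point directly in a single byte.
Base.codepoint(c::Union{DemoASCIIChr,DemoLatin1Chr}) = UInt32(c.v)

# The type alone answers the question, so no run-time check is needed.
Base.isascii(::DemoASCIIChr) = true
# A single byte comparison, without going through the generic fallback.
Base.isascii(c::DemoLatin1Chr) = c.v < 0x80

a = DemoASCIIChr('a')    # built from a Char via Base's generic constructors
e = DemoLatin1Chr('é')

isascii(a)               # true, decided at compile time
isascii(e)               # false
```

Generic code written against AbstractChar works with either type, as well as with Char from Base, while dispatch picks the specialized method where one exists.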

ScottPJones · March 2, 2018, 3:32pm

Alternatively, we could define print to always output UTF-8, and write to output a raw encoded value.

@stevengj That would fit with what I’ve been implementing. Not being able to directly read/write strings with other encodings would be horrible.
Having print defined as outputting UTF-8 encoded strings seems like a good way of having both low-level and high level handling of string I/O.
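
A sketch of that split, under the assumption that `print` on an AbstractChar falls back to converting to `Char` (and so emits UTF-8 on the built-in IO types), while `write` and `read` are defined per character type. `RawLatin1Chr` is a hypothetical type for illustration, not part of Base or Strs.jl:

```julia
# Hypothetical one-byte Latin-1 character type, for illustration only.
struct RawLatin1Chr <: AbstractChar
    v::UInt8
    function RawLatin1Chr(u::UInt32)
        u <= 0xff || throw(ArgumentError("not a Latin-1 code point"))
        new(u % UInt8)
    end
end

Base.codepoint(c::RawLatin1Chr) = UInt32(c.v)

# Low level: write and read move the raw one-byte Latin-1 encoding.
Base.write(io::IO, c::RawLatin1Chr) = write(io, c.v)
Base.read(io::IO, ::Type{RawLatin1Chr}) = RawLatin1Chr(UInt32(read(io, UInt8)))

c = RawLatin1Chr('é')        # U+00E9

# High level: print converts to Char, so the buffer receives UTF-8.
utf8 = IOBuffer()
print(utf8, c)
utf8_bytes = take!(utf8)     # two bytes: 0xc3 0xa9

# write emits the type's own encoding: a single Latin-1 byte.
raw = IOBuffer()
write(raw, c)
raw_bytes = take!(raw)       # one byte: 0xe9

# read mirrors write, so the raw byte round-trips to the same character.
roundtrip = read(IOBuffer(raw_bytes), RawLatin1Chr)
```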

ScottPJones · March 3, 2018, 12:06am

To do this efficiently while still supporting streams in different encodings, you’d ideally want an ASCIIChar type.
(from @stevengj)

That was one of the first things I did when designing the Strs.jl package.
It does make things much more efficient.
