To make myself clearer: I feel that the bugs in `strftime` and `strptime` come from a lack of awareness of how character set encodings are used around the world, stemming from this UTF-8-only or UTF-8-centric view.
Also, many other bugs that I’ve seen over the last 3 years in Julia itself, or in many of the packages that deal with strings (such as JSON, CSV, all of the database wrappers), come from either 1) a lack of identification of the character set / encoding (such as in the `strftime`/`strptime` case), assuming everything would be UTF-8, or 2) issues caused by the complexity of dealing with multi-codeunit encodings such as UTF-8, such as indexing into the middle of UTF-8 sequences, incorrectly specifying the end of a range of characters (`lastindex` vs. `sizeof`, for example), etc.
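As a small illustration of the second kind of bug, here is a sketch assuming a Julia 1.x UTF-8 `String`. The sample word and the variable names are only my example, not taken from any particular package:

```julia
# "naïveté", written with escapes so the source file's own encoding does not matter.
# 'ï' and 'é' each occupy two UTF-8 code units.
s = "na\u00efvet\u00e9"

n_chars = length(s)      # number of characters
last_ix = lastindex(s)   # byte index where the last character starts
n_bytes = sizeof(s)      # number of code units (bytes)

# Index 4 is the second byte of 'ï', so it is not a valid character index.
mid_ok = isvalid(s, 4)

# Using sizeof as the end of a range points into the middle of 'é'.
bad_range_fails = try
    s[1:sizeof(s)]
    false
catch err
    err isa StringIndexError
end

# lastindex is the correct end of the range.
whole = s[1:lastindex(s)]
```

Here the three "lengths" all differ (7 characters, last index 8, 9 bytes), which is exactly where off-by-one range bugs tend to creep in.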
I agree that having a single recommended string type (but not necessarily a single internal string representation!) for most uses is a good thing, especially given the high number of Julians who are researchers, professors, mathematicians, and scientists of all types rather than CS types. But it should be something that is easy for them to use, which is why I’ve been working on a `UniStr` type that does not have the issues that complex encodings such as UTF-8 have.
It also needs to make it easy to convert back and forth to other encodings such as UTF-8, UTF-16, and `Cwstring` (i.e. either UTF-16 or UTF-32, depending on platform), and to take care of any system conversions (such as with `strftime`) to/from the system’s character set / encoding for that function / locale (i.e. `LC_TIME`).
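For reference, this is roughly what the Unicode-to-Unicode part of those conversions looks like today with `transcode` from Base. It is a sketch only: it does not show `UniStr`, and it does not cover converting to or from a locale's legacy character set, which is the part `strftime` needs. The sample string is my own.

```julia
# "Grüße 😀": the emoji lies outside the BMP, so it needs a surrogate pair in UTF-16.
s = "Gr\u00fc\u00dfe \U1F600"

utf16 = transcode(UInt16, s)    # UTF-16 code units
utf32 = transcode(UInt32, s)    # UTF-32 code units, one per character
wide  = transcode(Cwchar_t, s)  # the platform's wchar_t encoding, as used by Cwstring

# Converting back to a UTF-8 String
from16 = transcode(String, utf16)
from32 = transcode(String, utf32)

# wchar_t is 2 bytes (UTF-16) on Windows and 4 bytes (UTF-32) on most other platforms,
# so the number of wide code units depends on where this runs.
expected_wide_len = sizeof(Cwchar_t) == 2 ? length(utf16) : length(utf32)
```

The same 7-character string is 12 code units in UTF-8, 8 in UTF-16, and 7 in UTF-32, so any code passing lengths across these boundaries has to be explicit about which unit it is counting.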