UTF-8 space/difficulty tradeoffs

This is something that has always concerned me about the “UTF-8 is always best” mentality: text in the languages of the vast majority of the world’s population generally takes 50% to 200% more space in UTF-8 than in the older national character sets. UTF-16, by comparison, is either the same size as those character sets or 100% larger, never 200% larger.
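
To put rough numbers on that, here is a small Python sketch that encodes the same short greeting in a legacy national character set, in UTF-8, and in UTF-16. The sample words and the choice of legacy encodings (KOI8-R, TIS-620, Shift JIS) are my own picks for illustration, not anything from the original discussion, and three short strings are obviously not a benchmark.

```python
# Compare the encoded size (in bytes) of the same text under different encodings.
# The sample strings and legacy codecs are illustrative assumptions.
samples = {
    "Russian": ("Привет", "koi8_r"),
    "Thai": ("สวัสดี", "tis_620"),
    "Japanese": ("こんにちは", "shift_jis"),
}


def encoded_sizes(text, legacy):
    """Return the byte length of `text` in the legacy codec, UTF-8, and UTF-16."""
    return {
        "legacy": len(text.encode(legacy)),
        "utf-8": len(text.encode("utf-8")),
        # "utf-16-le" avoids counting the 2-byte byte-order mark.
        "utf-16": len(text.encode("utf-16-le")),
    }


for name, (text, legacy) in samples.items():
    print(name, encoded_sizes(text, legacy))
```

For these samples, UTF-8 comes out at 2x the legacy size for Russian, 3x for Thai, and 1.5x for Japanese, while UTF-16 is 2x, 2x, and 1x respectively, which matches the 50% to 200% range quoted above.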

This is correct, but another perspective is that space is usually cheap compared to programmer time and the cost of bugs, so saving on programmer time and bugs by standardizing on a single representation is reasonable in many circumstances.

Actually, that’s one big reason I think that what I’m working on would be much better for the Julia community.
There’s a constant stream of bugs related to indexing into UTF-8 code units.
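
For anyone who hasn't run into this class of bug: Julia's `String` is indexed by byte offset into the UTF-8 data, so an index that lands in the middle of a multi-byte character is invalid. The sketch below is only a rough Python analogue of that failure mode, using `bytes` slicing, not Julia code; the helper name `prefix_by_bytes` is made up for the example.

```python
def prefix_by_bytes(text, n):
    """Take the first n bytes of the UTF-8 encoding of `text` and decode them strictly."""
    return text.encode("utf-8")[:n].decode("utf-8")


word = "na\u00efve"  # "naïve": 5 characters, 6 bytes in UTF-8 (the "ï" takes 2 bytes)

print(prefix_by_bytes(word, 2))  # cut falls on a character boundary

try:
    prefix_by_bytes(word, 3)  # cut falls inside the two-byte "ï"
except UnicodeDecodeError as err:
    print("invalid cut:", err.reason)
```

Code that treats byte offsets as character positions works on ASCII-only test data and then breaks on the first non-ASCII input, which is why these bugs keep turning up.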