Substring function?

StefanKarpinski · February 19, 2022, 3:26pm

A significant difference from UTF-8 is that there is no such thing as structurally malformed UTF-16. The only way a UTF-16 string can be invalid is by containing unpaired surrogate code units, and those still have code points, just not ones that correspond to valid Unicode characters. So every UTF-16 string can be iterated as code points; if the string is invalid, you just get the code point of an unpaired surrogate. Compare that with UTF-8, where some byte sequences don’t follow the right structure at all.
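To make that concrete, here is a minimal sketch of a decoder over 16-bit code units that never has to reject its input. It is written in Python purely for illustration and is not how Julia implements anything; the function name `utf16_code_points` and the sample inputs are made up for this example.

```python
def utf16_code_points(units):
    """Turn a list of 16-bit code units into code points without ever failing."""
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        is_high = 0xD800 <= u <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            # A high surrogate followed by a low surrogate is one supplementary code point.
            out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            # A BMP code unit, or an unpaired surrogate passed through as its own code point.
            out.append(u)
            i += 1
    return out


valid = [0x0061, 0xD83D, 0xDE00]    # "a" followed by U+1F600 as a surrogate pair
invalid = [0x0061, 0xD83D, 0x0062]  # high surrogate with no low surrogate after it

print([hex(c) for c in utf16_code_points(valid)])
print([hex(c) for c in utf16_code_points(invalid)])

# UTF-8, by contrast, has byte sequences with no code-point reading at all:
# a lone continuation byte does not start any well-formed sequence.
try:
    b"\x80".decode("utf-8")
except UnicodeDecodeError as err:
    print("malformed UTF-8:", err.reason)
```

The `else` branch is the whole point: there is no third case where the decoder has to give up, whereas a UTF-8 decoder needs an explicit error path for bytes that fit no pattern.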

(Another way to put this is that every UTF-16 sequence, valid or invalid, can be represented as a sequence of code points using WTF-8, which is an extension of UTF-8 allowing unpaired surrogate code points.)
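As a rough illustration of that mapping, the sketch below uses Python's `surrogatepass` error handler, which lets the UTF-16 and UTF-8 codecs carry lone surrogates through. For input that starts out as UTF-16 code units, the bytes it produces should match what WTF-8 prescribes, but treat that as an assumption of this example, not a full WTF-8 implementation. The helper name `utf16_units_to_wtf8` is hypothetical.

```python
import struct


def utf16_units_to_wtf8(units):
    """Re-encode 16-bit code units as UTF-8-style bytes, keeping lone surrogates."""
    # Pack the code units as little-endian UTF-16 bytes.
    raw = struct.pack("<%dH" % len(units), *units)
    # Proper surrogate pairs combine into one code point; lone surrogates survive as-is.
    text = raw.decode("utf-16-le", "surrogatepass")
    # Lone surrogates are written with the ordinary 3-byte UTF-8 pattern.
    return text.encode("utf-8", "surrogatepass")


paired = utf16_units_to_wtf8([0x0061, 0xD83D, 0xDE00])
unpaired = utf16_units_to_wtf8([0x0061, 0xD83D, 0x0062])

print(paired.hex())    # ordinary UTF-8: the pair becomes a single 4-byte sequence
print(unpaired.hex())  # the lone surrogate becomes a 3-byte sequence

# A strict UTF-8 decoder rejects the second result, which is why WTF-8 is an extension.
try:
    unpaired.decode("utf-8")
except UnicodeDecodeError as err:
    print("not valid UTF-8:", err.reason)
```

Decoding the result with `surrogatepass` and re-encoding as UTF-16 gives back the original code units, so nothing is lost in either direction.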
