Thank you for your comment.
They could be valid EUC-KR characters, but that does not mean they are valid UTF-8. EUC-KR is not compatible with UTF-8 except in the ASCII range. It bothers people in this culture because EUC-KR differs from Unicode in both its code page (the numbering of characters) and its byte encoding. For example, 한 is 0xc7d1 in EUC-KR, but U+D55C in Unicode, which is encoded as 0xed959c in UTF-8.
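To make the mismatch concrete, here is a small Python 3 sketch (Python only because its standard library ships an EUC-KR codec; the variable names are just for illustration). It prints the code point and both byte sequences for 한, then checks that the EUC-KR bytes are rejected when read as UTF-8.

```python
# The syllable 한 under EUC-KR and UTF-8, using only the standard library.
ch = "한"

code_point = ord(ch)                 # Unicode code point
euc_kr_bytes = ch.encode("euc-kr")   # legacy Korean encoding
utf8_bytes = ch.encode("utf-8")      # Unicode encoding

print(f"U+{code_point:04X}")         # U+D55C
print(euc_kr_bytes.hex())            # c7d1
print(utf8_bytes.hex())              # ed959c

# The EUC-KR bytes are not a valid UTF-8 sequence.
try:
    euc_kr_bytes.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False
print(decodes_as_utf8)               # False

# Only the ASCII range is byte-for-byte identical in both encodings.
ascii_matches = "A".encode("euc-kr") == "A".encode("utf-8")
print(ascii_matches)                 # True
```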
As far as I know, Julia is currently not aware of any encoding other than Unicode (at least in base and stdlib). In Python 3, str is always Unicode, yet other encodings can be used when encoding to or decoding from bytes. The strftime() function of the time module in Python 3 returns a correct str. I also want to use mainly Unicode (especially UTF-8) in Julia. That requires that all bytes that are not compatible with Unicode be transcoded. I think it would be great if a package such as StringEncodings.jl became part of base or stdlib.
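For reference, the transcoding step I mean looks roughly like this in Python 3. This is only a sketch: transcode_to_utf8 is a hypothetical helper name, and the EUC-KR input is a stand-in for whatever bytes a locale-dependent C function hands back. It is not how Julia or StringEncodings.jl implements it.

```python
def transcode_to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Hypothetical helper: re-encode bytes from source_encoding as UTF-8."""
    text = raw.decode(source_encoding)   # legacy bytes -> Unicode string
    return text.encode("utf-8")          # Unicode string -> UTF-8 bytes


# Stand-in for bytes returned by a C library call under a Korean locale.
raw_from_c_library = "한글".encode("euc-kr")

utf8 = transcode_to_utf8(raw_from_c_library, "euc-kr")
print(utf8.decode("utf-8"))  # 한글
```

The source encoding has to be known up front (for example from the current locale); it cannot be reliably guessed from the bytes alone.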
The fix needs to be applied for all platforms (which makes it simpler anyway), and a similar fix needs to be made for strptime as well.
Thank you! I should look for a wchar_t version of strptime(). If I understand correctly, do you mean the problem can also happen on other platforms, not only Windows?