A Python rant about types

According to PEP 383, it is based on UTF-8b.

It seems that this type of solution was analyzed but abandoned in Python. See the explanation from that PEP:

“… , the approach of escaping each byte XX with the sequence U+0000 U+00XX has the disadvantage that encoding to UTF-8 will introduce a NUL byte in the UTF-8 sequence. As a consequence, C libraries may interpret this as a string termination, even though the string continues. In particular, the gtk libraries will truncate text in this case; other libraries may show similar problems.”

(The PEP also describes some security concerns about supporting everything.)
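
To make the NUL-byte problem from the quote concrete, here is a minimal sketch. The helper `nul_escape` is hypothetical (it is not part of Python or of any library) and it simplifies matters by treating every byte >= 0x80 as undecodable, which is good enough for this one file name. The last part uses Python's built-in `surrogateescape` error handler for comparison.

```python
def nul_escape(raw: bytes) -> str:
    """Hypothetical scheme from the quote: byte XX becomes U+0000 U+00XX.

    Simplifying assumption: every byte >= 0x80 is treated as undecodable.
    """
    out = []
    for b in raw:
        if b < 0x80:
            out.append(chr(b))
        else:
            out.append("\x00" + chr(b))
    return "".join(out)


raw = b"caf\xe9.txt"  # Latin-1 file name, not valid UTF-8

# Rejected approach: the escape marker becomes a real NUL byte in UTF-8.
escaped = nul_escape(raw)
as_utf8 = escaped.encode("utf-8")

# Roughly what a C library sees if it treats the data as NUL-terminated.
seen_by_c = as_utf8.split(b"\x00", 1)[0]

# Built-in approach: the bad byte maps to a lone surrogate (U+DCE9 here),
# so no NUL appears and the original bytes can be recovered.
text = raw.decode("utf-8", errors="surrogateescape")
round_trip = text.encode("utf-8", errors="surrogateescape")

print(as_utf8)
print(seen_by_c)
print(ascii(text))
print(round_trip == raw)
```

With the NUL-based escape, everything after "caf" would be lost to such a C library, while the surrogate-based string survives the round trip as long as it is encoded with the same error handler.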