Write - accented characters take an extra byte


When calling write with a string containing accented characters, I noticed that it returns 2, meaning two bytes were written for a single character.


julia> write(stdout, "ñ")
ñ2

julia> write(stdout, "n")
n1

This is an issue when the receiving side expects a given length in bytes, for example when parsing HTTP over Sockets.

How can this be overcome?

Checking the Julia code base, I see that ccall with chars is being used.

This is just standard UTF-8. Not all characters are one byte.
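To make that concrete, here is a small sketch that prints the raw UTF-8 bytes behind a few one-character strings. The sample strings are only illustrative; codeunits and ncodeunits come from Base.

```julia
# Inspect the UTF-8 encoding of a few one-character strings.
samples = ("n", "\u00f1", "\u20ac")      # "n", "ñ", "€"

for s in samples
    bytes = collect(codeunits(s))        # Vector{UInt8} holding the raw bytes
    println(repr(s), " => ", ncodeunits(s), " byte(s): ", bytes)
end
# "n" is 1 byte (0x6e), "ñ" is 2 bytes (0xc3 0xb1), "€" is 3 bytes (0xe2 0x82 0xac)
```

ASCII characters stay at one byte each, which is why the difference only shows up once an accent or symbol is involved.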


Ah yes, makes sense indeed!

You can use sizeof to ask for the number of bytes required to write the string, which is also what write returns. length returns the number of characters.
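A quick sketch of the difference, using an IOBuffer in place of stdout or a socket so the example is self-contained. The sample string and variable names are just for illustration.

```julia
s = "mañana"             # 6 characters, one of them non-ASCII

nchars = length(s)       # number of characters
nbytes = sizeof(s)       # number of bytes in the UTF-8 encoding

io = IOBuffer()
written = write(io, s)   # write returns the number of bytes written

println("characters: ", nchars, ", bytes: ", nbytes, ", written: ", written)
```

Here written should match nbytes rather than nchars, so sizeof is the value to use wherever a byte length is expected.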

Ah perfect, I was using transcode(UInt8, string).
No unnecessary conversion, with your suggestion!
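For the HTTP case mentioned earlier, this could look roughly like the sketch below. plain_response is a hypothetical helper made up for illustration, not something from Sockets or HTTP.jl; the only point is that Content-Length is computed with sizeof, and that it agrees with the transcode approach.

```julia
# Hypothetical helper: build a minimal HTTP/1.1 response as a String.
function plain_response(body::String)
    return string(
        "HTTP/1.1 200 OK\r\n",
        "Content-Type: text/plain; charset=utf-8\r\n",
        "Content-Length: ", sizeof(body), "\r\n",   # bytes, not characters
        "\r\n",
        body,
    )
end

body = "señor"                      # 5 characters, 6 bytes
resp = plain_response(body)

# Same byte count as the transcode route, without building a byte vector first.
same = sizeof(body) == length(transcode(UInt8, body))
println(same)
```

Using length(body) in the header instead would under-report the size by one byte for this body, and a client reading exactly Content-Length bytes would cut off the end of the message.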

Many years ago my phone company charged per SMS, where each SMS had a fixed size. I discovered that while my phone indicated the number of characters used, the phone company charged by “byte”. I was careful not to omit accents back then, and I typically used up the full limit indicated by my phone. As a result I exceeded the number of allowed bytes and ended up being charged for 2 SMS for every one I had sent. Those were international rates too. Ah, the noughties. :cry:


And then “boomers” complained about kids writing “hru” and similar atrocities, haha
