write: accented characters take an extra byte

Hello,

When calling write with a string containing accented characters, I noticed that 2 bytes are reported for each accented character.

E.g.:

julia> write(stdout, "ñ")
ñ2

julia> write(stdout, "n")
n1

This is a problem whenever the receiving side expects a given length in bytes, for example in HTTP parsing over Sockets.

How can this be overcome?

Looking at the Julia code base, I see that a ccall with chars is being used.

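This is just standard UTF-8. Not all characters are one byte: ASCII characters take one byte, while others such as ñ take more (UTF-8 uses 1 to 4 bytes per character).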
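A small sketch of what that means in practice, using only Base functions: ncodeunits counts the UTF-8 bytes of a string, codeunits exposes the raw bytes, and write returns the number of bytes written rather than the number of characters. The sample strings are just illustrative choices.

```julia
# UTF-8 encodes each character as 1 to 4 bytes (code units).
for s in ("n", "ñ", "€", "😀")
    # ncodeunits counts UTF-8 bytes; codeunits exposes the raw bytes
    println(repr(s), " => ", ncodeunits(s), " byte(s): ", collect(codeunits(s)))
end

# write returns the number of bytes written, not the number of characters
nbytes = write(IOBuffer(), "ñ")
println("write returned ", nbytes)  # 2
```

So the 2 printed after ñ in the REPL is the byte count of the UTF-8 encoding (0xc3 0xb1), not a bug.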

Ah yes, makes sense indeed!

You can use sizeof to get the number of bytes required to write the string, whereas length returns the number of characters.
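A quick illustration of the difference between the two, with an arbitrary example string containing one accented character:

```julia
s = "España"

println(length(s))   # 6 characters
println(sizeof(s))   # 7 bytes: ñ is encoded as 2 bytes

# sizeof matches what write actually emits
io = IOBuffer()
@assert write(io, s) == sizeof(s)
```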

Ah perfect, I was using transcode(UInt8, string).
With your suggestion there's no unnecessary conversion!
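For the HTTP case from the original question, here is a minimal sketch of setting a Content-Length header with sizeof. write_response is a hypothetical helper, not part of Sockets or any HTTP package, and an IOBuffer stands in for a TCPSocket (both are IO, so the same write calls apply). The last line checks that sizeof agrees with the length of the transcode(UInt8, …) result used above.

```julia
# Hypothetical helper, not from any library: writes a minimal HTTP/1.1 response.
function write_response(io::IO, body::String)
    # Content-Length must be in bytes, so use sizeof, not length
    header = "HTTP/1.1 200 OK\r\n" *
             "Content-Type: text/plain; charset=utf-8\r\n" *
             "Content-Length: $(sizeof(body))\r\n\r\n"
    return write(io, header, body)  # total bytes written
end

buf = IOBuffer()  # stand-in for a TCPSocket
body = "ñandú"
total = write_response(buf, body)
response = String(take!(buf))

println(length(body), " characters, ", sizeof(body), " bytes")  # 5 characters, 7 bytes
@assert sizeof(body) == length(transcode(UInt8, body))
```

Using length(body) here would announce 5 bytes while 7 are sent, which is exactly the kind of mismatch that breaks a parser expecting a given length.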

Many years ago my phone company charged per SMS, where each SMS had a fixed size. I discovered that while my phone indicated the number of characters used, the phone company charged by “byte”. I was careful not to omit accents back then and typically maxed out the limit indicated by my phone. As a result I exceeded the number of allowed bytes and ended up being charged for 2 SMS for every one I had sent. Those were international rates too. Ah, the noughties. :cry:

Then “boomers” complained about kids writing “hru” and similar atrocities, haha.