Understanding `sizeof` return values on `Char` / `String`

Can someone explain this behavior of sizeof vs. Base.summarysize?

sizeof("z")            # 1
sizeof('z')            # 4
Base.summarysize('z')  # 4
Base.summarysize("z")  # 9

The documentation for sizeof(str) says:

Size, in bytes, of the string str. Equal to the number of code units in str multiplied by the size, in bytes, of one code unit in str.
From this, I would expect sizeof and summarysize to return the same value in this case. What am I missing?

Some context: I want to convert a Vector of Strings into a Vector of some struct by splitting each string at a separator, then converting the resulting substrings to more appropriate types (Char, Int, …) where possible.

I am on Julia 1.7.0-rc1

sizeof('z') == 4 because a Char is stored as a 32-bit value (see ?Char). This is required so any Unicode codepoint can fit in a Char.

sizeof("z") == 1 because encoding “z” in UTF-8 takes only one byte.
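To make the Char vs. String difference concrete, here is a small sketch using a few sample characters of my own choosing (they are not from the original question). The helper name `formula` is hypothetical; it just spells out the rule quoted from the docstring.

```julia
# A Char is always a 32-bit value, whatever character it holds.
char_sizes = [sizeof(c) for c in ('z', '\u00e9', '\u20ac')]      # 'z', 'é', '€'

# A String's size is the number of bytes its UTF-8 encoding takes.
string_sizes = [sizeof(s) for s in ("z", "\u00e9", "\u20ac")]    # "z", "é", "€"

# The docstring's rule: number of code units times the size of one code unit.
# For String, codeunit(s) is UInt8, so this is just the byte count.
formula(s) = ncodeunits(s) * sizeof(codeunit(s))

println(char_sizes)      # every entry is 4
println(string_sizes)    # 1, 2 and 3 bytes respectively
println(formula("\u20ac") == sizeof("\u20ac"))
```

So sizeof on a String grows with the encoded content, while sizeof on a Char stays fixed.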

Base.summarysize('z') == 4 because a Char is a simple value type.

Base.summarysize("z") == 9 because… hmm, I’m not sure. I think this counts 8 bytes for the pointer to the region of memory that holds the string, and 1 byte for the string data itself. But shouldn’t it also count some bytes for storing the length of the string?


The reason that this is 9 is that, internally, a String consists of both an array of bytes (the UTF-8 code units of the encoded string) and an internal length::Int field, and summarysize includes the size of that Int. sizeof(Int) == 8 on a 64-bit machine, and 1+8 == 9. (Technically, a String object may have an even bigger footprint in memory: not only may it implicitly include a 1-byte NUL terminator for ease of passing to C, but a heap-allocated Julia value can also have a preamble with a type tag and some other info.) In contrast, sizeof only gives you the size of the underlying String data, not that of the Julia wrapper around it.
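The arithmetic above can be checked with a quick sketch. The sample strings and the helper name `overhead` are my own, and the constant difference is what I would expect from the explanation above rather than a documented guarantee, since Base.summarysize is not part of the exported API.

```julia
# Difference between what summarysize reports and the raw string data.
overhead(s::String) = Base.summarysize(s) - sizeof(s)

samples = ["", "z", "\u00e9", "hello"]

for s in samples
    println(repr(s), ": sizeof = ", sizeof(s),
            ", summarysize = ", Base.summarysize(s),
            ", overhead = ", overhead(s))
end

# The overhead should match the size of the length field.
println(all(overhead(s) == sizeof(Int) for s in samples))
```

On a 64-bit machine the overhead should come out as 8 for every string, which is where the 9 for "z" comes from.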