Structs with fields of an isbits Union type use the isbits Union struct optimization. I was surprised to learn that the type tag byte for a type’s Union field is stored directly after the field’s computed Union memory.
Take the following example (a simplification of my real problem, with different types but the same memory layout):
struct MyStruct
    a::Union{Int32, UInt32}
    b::Union{Int32, UInt32}
    c::Union{Int32, UInt32}
    d::Union{Int32, UInt32}
end
where the Unions effectively double the required space, because the struct is laid out roughly as if it were
struct MyStruct
    a::NTuple{4, UInt8}
    a_type_tag::UInt8
    b::NTuple{4, UInt8}
    b_type_tag::UInt8
    c::NTuple{4, UInt8}
    c_type_tag::UInt8
    d::NTuple{4, UInt8}
    d_type_tag::UInt8
end
which adds three bytes of padding after each type tag, for a total of 32 bytes.
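The layout can be checked from the REPL with `sizeof` and `fieldoffset`. This is only a minimal sketch, assuming a recent Julia 1.x; the struct definition is repeated so the snippet runs on its own:

```julia
# Same definition as in the example above, repeated so the snippet is self-contained.
struct MyStruct
    a::Union{Int32, UInt32}
    b::Union{Int32, UInt32}
    c::Union{Int32, UInt32}
    d::Union{Int32, UInt32}
end

# Total size in bytes, including the per-field type tags and padding.
total_size = sizeof(MyStruct)

# Byte offset of each field. Offsets that are 8 apart (rather than 4) show
# the tag byte plus padding sitting between the 4-byte payloads.
offsets = [Int(fieldoffset(MyStruct, i)) for i in 1:fieldcount(MyStruct)]

println("sizeof = ", total_size, ", offsets = ", offsets)
```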
Instead, I would have expected all type tags to be appended at the end of the struct, resulting in something like
struct MyStruct
    a::NTuple{4, UInt8}
    b::NTuple{4, UInt8}
    c::NTuple{4, UInt8}
    d::NTuple{4, UInt8}
    a_type_tag::UInt8
    b_type_tag::UInt8
    c_type_tag::UInt8
    d_type_tag::UInt8
end
with 20 bytes. And needing 8/5 of the memory isn’t even the worst case, since the Union fields could have stricter alignment requirements, e.g. larger primitive types or even SIMD types. (I think a memory efficiency of 65/128 is the limit with AVX-512, i.e. a 64-byte payload plus one tag byte padded out to 128 bytes. Either way, it can get close to 50%.)
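To make the arithmetic concrete, here is a rough sketch that computes the struct size under both tag placements. It assumes C-like alignment rules and exactly one tag byte per Union field; the helper names (`round_up`, `size_interleaved`, `size_trailing_tags`) are made up for illustration and are not Julia internals:

```julia
# Round `n` up to the next multiple of `align`.
round_up(n, align) = cld(n, align) * align

# Each field is described by a (payload size, alignment) pair in bytes.

# Tag byte directly after each payload (the layout described above).
function size_interleaved(fields)
    offset, max_align = 0, 1
    for (sz, al) in fields
        offset = round_up(offset, al) + sz + 1   # payload plus one tag byte
        max_align = max(max_align, al)
    end
    return round_up(offset, max_align)
end

# All payloads first, then one tag byte per field at the end (the proposal).
function size_trailing_tags(fields)
    offset, max_align = 0, 1
    for (sz, al) in fields
        offset = round_up(offset, al) + sz
        max_align = max(max_align, al)
    end
    return round_up(offset + length(fields), max_align)
end

four_ints = fill((4, 4), 4)     # four 4-byte payloads with 4-byte alignment
ints_sizes = (size_interleaved(four_ints), size_trailing_tags(four_ints))   # (32, 20)

four_simd = fill((64, 64), 4)   # four 64-byte payloads with 64-byte alignment
simd_sizes = (size_interleaved(four_simd), size_trailing_tags(four_simd))   # (512, 320)
```

With 64-byte payloads, every interleaved field occupies 128 bytes for 65 useful bytes, which is where the 65/128 figure comes from.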
For optimal packing, fields can normally be ordered by their alignment requirements. That doesn’t help when the type tag immediately follows each Union field, as can be seen above.
I think the “always at the end of the struct” proposal would lose no generality. To get the current behavior, you could move each Union field into its own struct and use that struct as the field’s type, so
struct EitherInt32
    val::Union{Int32, UInt32}
end
struct MyStruct
    a::EitherInt32
    b::EitherInt32
end
would effectively result in
struct MyStruct
    a::NTuple{4, UInt8}
    a_type_tag::UInt8
    b::NTuple{4, UInt8}
    b_type_tag::UInt8
end
In general we can’t reorder fields or influence padding, because we want to be compatible with C, but in this case I think we could, as C doesn’t have this kind of union.
So why do the type tags immediately follow the union fields? Is it to avoid having to look up an offset? Is it for better data locality and therefore higher cache efficiency? If it is the latter, I would assume (though measurements would be needed) that the smaller memory footprint is, on average, better for cache efficiency than the slightly higher locality.
Is there a way to change the DataType’s layout at runtime, with some API to define the proposed ordering manually from Julia? This would probably be internal and would need tweaking on updates, which would be fine in my case.
Suppose I did everything manually, e.g. used a non-Union placeholder type instead of the Union and did the type cast on every access (the data structure will be read-only when performance matters). Would this have any performance disadvantages compared to the built-in union optimization, e.g. any optimizations it would miss out on?
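To make the manual approach concrete, this is roughly what I have in mind. It is only a sketch: `PackedStruct`, `encode`, `pack` and `unpack` are hypothetical names, and the tag encoding (0x00 for Int32, 0x01 for UInt32) is my own choice, not what Julia uses internally:

```julia
# Hand-packed equivalent of MyStruct: raw 32-bit payloads first,
# then one tag byte per field (0x00 => Int32, 0x01 => UInt32).
struct PackedStruct
    a::UInt32
    b::UInt32
    c::UInt32
    d::UInt32
    tags::NTuple{4, UInt8}
end

# Encode a value as a (raw payload, tag) pair.
encode(x::Int32)  = (reinterpret(UInt32, x), 0x00)
encode(x::UInt32) = (x, 0x01)

# Build a PackedStruct from four Int32/UInt32 values.
function pack(vals::Vararg{Union{Int32, UInt32}, 4})
    enc = map(encode, vals)
    return PackedStruct(map(first, enc)..., map(last, enc))
end

# Read payload field `i` (1 to 4) back as Int32 or UInt32, depending on its tag.
@inline function unpack(s::PackedStruct, i::Int)
    raw = getfield(s, i)::UInt32
    return s.tags[i] == 0x00 ? reinterpret(Int32, raw) : raw
end

s = pack(Int32(-1), UInt32(7), Int32(3), 0xffffffff)
packed_size = sizeof(PackedStruct)   # 20
```

`unpack` still returns a small `Union{Int32, UInt32}`, so I would hope the compiler can union-split at the call site as it does for the built-in representation, but I have not benchmarked this.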
