I have to allocate many of these structs. In C, I would allocate them all at once in an array, instead of using malloc to allocate each one individually. In Julia, if I create a vector of them, the vector contains pointers to the structs instead of the structs themselves:
julia> b = Vector{Foo}(undef, 1000)
julia> Base.summarysize(b)
8040
If I use a non-mutable struct instead, the size of b is 16040, indicating that the vector contains the structs themselves, not pointers to them.
I understand this is happening because of the mutable struct, but can it be changed? Can I allocate all of them at once?
You could in theory Libc.malloc the structs yourself & unsafe_load the instances, at the cost of having to juggle lifetimes manually, but I wouldn’t recommend it. What’s the point of using a language that models mutability if we’re falling back to manually tracking everything ourselves again?
For mutable structs themselves, you’ll (generally) have to trust julia to do well, though the question is if you really need mutable structs in the first place. If they’re immutable and isbits (i.e. they don’t contain pointers/mutable data), the struct will be allocated inline in the array.
Immutable structs may offer better performance even if you need to update a subset of data fields. See a benchmark here:
using BenchmarkTools
mutable struct MutFoo
a::Int
b::Int
end
function double_b!(v::Vector{MutFoo})
for w in v
w.b *= 2
end
v
end
struct ImmutFoo
a::Int
b::Int
end
function double_b!(v::Vector{ImmutFoo})
for i in eachindex(v)
v[i] = ImmutFoo(v[i].a, v[i].b * 2)
end
v
end
println("Benchmarking vector of mutable objects")
const mut_vec = [MutFoo(0, 0) for i in 1:10^6]
@btime double_b!(mut_vec)
println("Benchmarking vector of immutable objects")
const immut_vec = [ImmutFoo(0, 0) for i in 1:10^6]
@btime double_b!(immut_vec)
Result on my laptop:
Benchmarking vector of mutable objects
2.907 ms (0 allocations: 0 bytes)
Benchmarking vector of immutable objects
1.164 ms (0 allocations: 0 bytes)
Their approach basically consists in storing the struct’s fields in different arrays. I also tried that. The problem here is that this breaks cache locality, and in my application it really slows the program down. Nice to know though.
It often improves cache locality and cache efficiency, i.e. whenever you’re only accessing a subset of the fields, no need to memory bandwidth and cache capacity on those fields you aren’t using.
Also, no need to waste space on padding bytes (ever) with StructArrays.jl.
It also is much more SIMD friendly. E.g., here:
julia> function double_b!(v::AbstractVector{ImmutFoo})
@inbounds for i in eachindex(v)
v[i] = ImmutFoo(v[i].a, v[i].b * 2)
end
v
end
double_b! (generic function with 3 methods)
julia> sa_vec = StructArray(immut_vec);
julia> @btime double_b!($sa_vec);
264.960 ÎĽs (0 allocations: 0 bytes)
julia> @btime $sa_vec.b .*= 2;
266.485 ÎĽs (0 allocations: 0 bytes)
It is as fast as working with the b vector directly, as that is essentially what happens (the compiler optimizes away the loads and stores to sa_vec.a).
So whether it helps or hurts is workload dependent.
Mostly with big structs where you access all the fields and do not iterate through them quickly, or do iterate through them in random orders so that cache lines can’t get reused, are StructArrays likely to be bad for cache.