How is memory allocated for a vector of structs?

Say we have a mutable struct like this:

mutable struct Foo
    a::Int
    b::Int
end

I have to allocate many of these structs. In C, I would allocate them all at once in an array, instead of using malloc to allocate each one individually. In Julia, if I create a vector of them, the vector contains pointers to the structs instead of the structs themselves:

julia> b = Vector{Foo}(undef, 1000)
julia> Base.summarysize(b)
8040

If I use a non-mutable struct instead, the size of b is 16040, indicating that the vector contains the structs themselves, not pointers to them.

I understand this is happening because of the mutable struct, but can it be changed? Can I allocate all of them at once?

I don’t think the behavior can be changed. For immutable objects, you can emulate mutation with Accessors.jl.
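For example, a minimal sketch with Accessors.jl (the struct name here is made up; @set returns a modified copy instead of mutating):

julia> using Accessors

julia> struct Bar   # immutable, just for illustration
           a::Int
           b::Int
       end

julia> x = Bar(1, 2);

julia> y = @set x.b = 10   # new Bar with b replaced; x itself is unchanged
Bar(1, 10)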

Thanks. Is there a way to efficiently allocate space for many such structs, or should I just trust Julia will do it well?

You could in theory Libc.malloc the structs yourself & unsafe_load the instances, at the cost of having to juggle lifetimes manually, but I wouldn’t recommend it. What’s the point of using a language that models mutability if we’re falling back to manually tracking everything ourselves again?
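For reference, a rough sketch of what that could look like (using an isbits struct, where unsafe_store!/unsafe_load are well defined; the names are made up, and again: not recommended):

julia> struct RawFoo   # isbits stand-in, for illustration only
           a::Int
           b::Int
       end

julia> p = convert(Ptr{RawFoo}, Libc.malloc(1000 * sizeof(RawFoo)));   # uninitialized memory

julia> unsafe_store!(p, RawFoo(1, 2), 1);   # write element 1

julia> unsafe_load(p, 1)   # read it back (copies the bits out)
RawFoo(1, 2)

julia> Libc.free(p)   # freeing it is your job; the GC won’t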

For mutable structs themselves, you’ll (generally) have to trust Julia to do this well, though the real question is whether you need mutable structs in the first place. If they’re immutable and isbits (i.e. they don’t contain pointers or other mutable data), the structs will be allocated inline in the array.
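For example (a quick check, defining an immutable variant of Foo just for illustration):

julia> struct FooImm
           a::Int
           b::Int
       end

julia> isbitstype(FooImm)   # true => elements of a Vector{FooImm} are stored inline
true

julia> isbitstype(Foo)      # the mutable struct above is not isbits
false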


Immutable structs may offer better performance even if you need to update only a subset of the fields. See the benchmark below:

using BenchmarkTools

mutable struct MutFoo
    a::Int
    b::Int
end

# Mutable case: each element is a separate heap-allocated object that is mutated in place
function double_b!(v::Vector{MutFoo})
    for w in v
        w.b *= 2
    end
    v
end

struct ImmutFoo
    a::Int
    b::Int
end

# Immutable case: construct a new value and store it back into the (inline) array slot
function double_b!(v::Vector{ImmutFoo})
    for i in eachindex(v)
        v[i] = ImmutFoo(v[i].a, v[i].b * 2)
    end
    v
end

println("Benchmarking vector of mutable objects")
const mut_vec = [MutFoo(0, 0) for i in 1:10^6]
@btime double_b!(mut_vec)

println("Benchmarking vector of immutable objects")
const immut_vec = [ImmutFoo(0, 0) for i in 1:10^6]
@btime double_b!(immut_vec)

Result on my laptop:

Benchmarking vector of mutable objects
  2.907 ms (0 allocations: 0 bytes)
Benchmarking vector of immutable objects
  1.164 ms (0 allocations: 0 bytes)

Another approach is StructArrays.jl.
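For context, a minimal sketch of what that looks like (reusing ImmutFoo from the benchmark above):

julia> using StructArrays

julia> sa = StructArray([ImmutFoo(i, 2i) for i in 1:3]);

julia> sa[2]   # indexing still gives back an ImmutFoo
ImmutFoo(2, 4)

julia> sa.b    # but each field lives in its own contiguous Vector
3-element Vector{Int64}:
 2
 4
 6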

Their approach basically consists of storing the struct’s fields in separate arrays. I also tried that. The problem is that it breaks cache locality, and in my application it really slows the program down. Nice to know, though.


I get

julia> @btime double_b!($mut_vec);
  2.071 ms (0 allocations: 0 bytes)

julia> @btime double_b!($immut_vec);
  634.991 μs (0 allocations: 0 bytes)

julia> versioninfo()
Julia Version 1.10.3
Commit 0b4590a550 (2024-04-30 10:59 UTC)
Platform Info:
  OS: Linux (x86_64-redhat-linux)
  CPU: 28 × Intel(R) Core(TM) i9-9940X CPU @ 3.30GHz

It often improves cache locality and cache efficiency: whenever you’re only accessing a subset of the fields, there’s no need to waste memory bandwidth and cache capacity on the fields you aren’t using.
Also, with StructArrays.jl you never waste space on padding bytes.
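A small sketch of the padding point (struct and field names made up for illustration):

julia> using StructArrays

julia> struct Padded   # 9 bytes of data, padded to 16 bytes per element in a plain Array
           x::Int64
           y::Int8
       end

julia> sizeof(Padded)   # includes 7 padding bytes
16

julia> v = [Padded(i, Int8(0)) for i in 1:1000];

julia> sizeof(v)   # the padding is stored in a plain Vector{Padded} too
16000

julia> sa = StructArray(v);

julia> sizeof(sa.x) + sizeof(sa.y)   # 8000 + 1000 bytes of data, no padding
9000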

It is also much more SIMD-friendly. E.g., here:

julia> function double_b!(v::AbstractVector{ImmutFoo})
           @inbounds for i in eachindex(v)
               v[i] = ImmutFoo(v[i].a, v[i].b * 2)
           end
           v
       end
double_b! (generic function with 3 methods)

julia> using StructArrays

julia> sa_vec = StructArray(immut_vec);

julia> @btime double_b!($sa_vec);
  264.960 μs (0 allocations: 0 bytes)

julia> @btime $sa_vec.b .*= 2;
  266.485 μs (0 allocations: 0 bytes)

It is as fast as working with the b vector directly, as that is essentially what happens (the compiler optimizes away the loads and stores to sa_vec.a).

So whether it helps or hurts is workload-dependent.
StructArrays are mostly likely to be bad for cache with big structs where you access all the fields and either don’t iterate through them quickly, or iterate through them in a random order so that cache lines can’t get reused.
