Why is Julia allocating when accessing these structs?

I've been struggling to figure out why accessing these struct fields causes memory allocations. Here is my test code:

module AllocationsTest

struct SubStruct
    a::Float64
    b::Float64
end

struct AllocationsStruct
    arr::Array{SubStruct}
end

function make()
    subStructArr = Array{SubStruct}(undef, 1000)

    for i = eachindex(subStructArr)
        subStructArr[i] = SubStruct(1.0, 2.0)
    end

    return AllocationsStruct(subStructArr)
end

function access(allocStruct::AllocationsStruct)
    return allocStruct.arr[1].a
end

function accessInLoop(allocStruct::AllocationsStruct)
    s = 0
    for i = eachindex(allocStruct.arr)
        s += allocStruct.arr[i].a
    end
    return s
end

end

I thought this might need more type annotations to inform the compiler. But even after annotating in as many places as I could think of, following the Julia performance tips docs (and making the code far more verbose in the process), the allocations are still there:

module AllocationsTest

struct SubStruct{T<:Float64}
    a::T
    b::T
end

struct AllocationsStruct{T<:Array{SubStruct{Float64}}}
    arr::T
end

function make()
    subStructArr = Array{SubStruct{Float64}}(undef, 1000)

    for i = eachindex(subStructArr)
        subStructArr[i] = SubStruct{Float64}(1.0, 2.0)
    end

    return AllocationsStruct{Array{SubStruct{Float64}}}(subStructArr)
end

function access(allocStruct::AllocationsStruct{Array{SubStruct{Float64}}})
    return allocStruct.arr[1].a
end

function accessInLoop(allocStruct::AllocationsStruct{Array{SubStruct{Float64}}})
    s = 0
    for i = eachindex(allocStruct.arr)
        s += allocStruct.arr[i].a
    end
    return s
end

end

Creating the struct with make and then running accessInLoop on it results in around 1.5k separate allocations totaling about 39 KB. Even the single access function causes one 32-byte allocation. Why?

Running on Julia 1.8.2

I don’t think your function is type stable, which causes a box allocation. Here’s a fix:

Also, for now, try the following:

Array{T} is not a concrete type. Array{T,1} (or equivalently Vector{T}) is. You are missing this type parameter, so the field is abstractly typed.
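A quick way to check this point is to ask Julia directly whether a type is concrete. The sketch below uses Float64 as the element type, and the struct names LooseField and TightField are hypothetical, introduced only for illustration:

```julia
# Array{T} leaves the dimension parameter N free, so it is not a concrete type.
isconcretetype(Array{Float64})        # false
isconcretetype(Array{Float64,1})      # true
Vector{Float64} === Array{Float64,1}  # true: Vector{T} is an alias

# Hypothetical structs showing the effect on a field's declared type.
struct LooseField
    arr::Array{Float64}    # abstractly typed field
end

struct TightField
    arr::Vector{Float64}   # concretely typed field
end

isconcretetype(fieldtype(LooseField, :arr))  # false
isconcretetype(fieldtype(TightField, :arr))  # true
```
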

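In your second, parametric version you almost get there, except that in make you write return AllocationsStruct{Array{SubStruct{Float64}}}(subStructArr), which explicitly sets the type parameter to the non-concrete Array{SubStruct{Float64}}.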

If you’d instead written return AllocationsStruct(subStructArr), the type would have been correctly inferred.
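As a rough illustration of the difference, the sketch below reuses the struct definitions from the second version of the question and compares the two ways of calling the constructor. The variable names explicit and inferred are made up for this example:

```julia
struct SubStruct{T<:Float64}
    a::T
    b::T
end

struct AllocationsStruct{T<:Array{SubStruct{Float64}}}
    arr::T
end

arr = [SubStruct(1.0, 2.0) for _ in 1:1000]   # a Vector{SubStruct{Float64}}

# Spelling out the parameter pins T to the non-concrete Array{SubStruct{Float64}}.
explicit = AllocationsStruct{Array{SubStruct{Float64}}}(arr)

# Letting the constructor pick T from the argument gives the concrete Vector type.
inferred = AllocationsStruct(arr)

isconcretetype(fieldtype(typeof(explicit), :arr))  # false
isconcretetype(fieldtype(typeof(inferred), :arr))  # true
```
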

But the preceding comment about type stability also holds: s starts as an Int (s = 0) and becomes a Float64 after the first addition.
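The accumulator issue can be sketched in isolation. The function names sum_unstable and sum_stable are hypothetical and operate on a plain Vector{Float64} rather than the structs from the question. Whether this alone causes allocations may depend on the surrounding code and Julia version, but the inferred return type shows the instability:

```julia
# Hypothetical helper: the accumulator changes type inside the loop.
function sum_unstable(xs::Vector{Float64})
    s = 0                  # Int at this point
    for x in xs
        s += x             # Float64 after the first iteration
    end
    return s
end

# Hypothetical helper: the accumulator has the element type from the start.
function sum_stable(xs::Vector{Float64})
    s = zero(eltype(xs))   # 0.0, a Float64
    for x in xs
        s += x
    end
    return s
end

# Inferred return types for each version.
unstable_rt = first(Base.return_types(sum_unstable, (Vector{Float64},)))
stable_rt = first(Base.return_types(sum_stable, (Vector{Float64},)))

# With an empty input, the unstable version returns an Int instead of a Float64.
sum_unstable(Float64[])
sum_stable(Float64[])
```

Writing s = 0.0 would work just as well here; zero(eltype(xs)) simply avoids hard-coding the element type.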

I was finally able to get this example to stop allocating on every iteration of accessInLoop:

module AllocationsTest

struct SubStruct{T<:Float64}
    a::T
    b::T
end

struct AllocationsStruct{Q<:SubStruct, T<:Vector{Q}}
    arr::T
end

function make()
    subStructArr = Vector{SubStruct}(undef, 1000)

    for i = eachindex(subStructArr)
        subStructArr[i] = SubStruct(1.0, 2.0)
    end

    return AllocationsStruct{SubStruct{Float64}, Vector{SubStruct{Float64}}}(subStructArr)
end

function access(allocStruct::AllocationsStruct)
    return allocStruct.arr[1].a
end

function accessInLoop(allocStruct::AllocationsStruct)
    s::Float64 = 0
    for i = eachindex(allocStruct.arr)
        s += allocStruct.arr[i].a
    end
    return s
end

end

However, I found that writing out return AllocationsStruct{SubStruct{Float64}, Vector{SubStruct{Float64}}}(subStructArr) in make was necessary for this to work.

In my actual code, I have a large struct with many fields of various types. I have already told Julia what the field types will be in the struct block, so is there any way for it to infer the correct types without my duplicating them all when I initialize the struct?

There isn’t a benefit to subtyping concrete types (like Float64) in type parameters. This would have been much cleaner (and equally performant) as

struct SubStruct
    a::Float64
    b::Float64
end

struct AllocationsStruct
    arr::Vector{SubStruct}
end

and now you don’t need to specify the computed parameters everywhere. Recall that the reason your very first attempt didn’t work well was that you used Array{SubStruct} (which is incompletely specified, since the dimension parameter is missing) rather than Vector{SubStruct} (which is completely specified) as a struct field type.
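Putting the pieces together, a minimal end-to-end sketch might look like the following. It combines the non-parametric structs above with a Float64 accumulator. The measure helper is hypothetical, and wrapping @allocated in a function is an assumption made here to keep global-scope effects out of the measurement:

```julia
struct SubStruct
    a::Float64
    b::Float64
end

struct AllocationsStruct
    arr::Vector{SubStruct}   # concrete field type
end

make() = AllocationsStruct([SubStruct(1.0, 2.0) for _ in 1:1000])

function accessInLoop(allocStruct::AllocationsStruct)
    s = 0.0                  # Float64 from the start
    for i in eachindex(allocStruct.arr)
        s += allocStruct.arr[i].a
    end
    return s
end

# Hypothetical helper: warm up first so compilation is not counted.
function measure(allocStruct::AllocationsStruct)
    accessInLoop(allocStruct)
    return @allocated accessInLoop(allocStruct)
end

st = make()
total = accessInLoop(st)     # sum of the a fields
bytes = measure(st)          # bytes allocated by one call
```

With these definitions, the plain AllocationsStruct(arr) constructor needs no type parameters at the call site.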

But I suspect this may not completely answer the questions for your actual use case…

Sorry - I guess I conflated that with the second half of your original response. Making that one change does indeed remove all allocations for the accessInLoop call. Thank you!