Why does using structs increase allocations (and calculation time)?

Thomas_Van_Giel · March 30, 2023, 12:37pm

In an effort to make my code more easily extensible, I started using more structs containing the important arrays/data as parameters for my simulations, instead of the individual arrays.

Initially my old code looked like this:

using LoopVectorization
using Statistics  # provides mean, used in the default value of dt

function simulate(h::HOI_ecosystem, dt::Real=1/mean(h.R)/10000)
   # do some stuff before
   for timestep in 1:10^7
      update_n!(h, dt)
   end
   # do some stuff after
end

function update_n!(A::Matrix{Float64}, n::Vector{<:Real},
    m::Matrix{<:Real}, R::Vector{<:Real}, d::Vector{<:Real}, dt::Float64)
    N = length(n)
    @avx for i = 1:N
        dni = R[i] + d[i]*n[i]
        for j = 1:N
            dni += A[i,j]*m[i,j]*n[j]
        end
        n[i] = n[i]*(1+dni*dt)
    end
    return n
end

Running this code many times during my simulation takes around 0.7 seconds, with 361 k allocations (17 MiB).

Then I grouped all the vectors and matrices into a struct. Now my code looks as follows:

First I have the struct single_HOI_ecosystem

struct single_HOI_ecosystem <: one_HOI_ecosystem
    N::Int
    n::Vector{<:Real}
    m::Matrix{Float64}
    A::Matrix{Float64}
    R::Vector{<:Real}
    d::Vector{<:Real}
end

And a function that does calculations:

using LoopVectorization
using Statistics  # provides mean, used in the default value of dt

function simulate(h::HOI_ecosystem, dt::Real=1/mean(h.R)/10000)
   # do some stuff before
   for timestep in 1:10^7
      update_n!(h, dt)
   end
   # do some stuff after
end

function update_n!(h::HOI_ecosystem, dt::Real=1/mean(h.R)/10000)
    N = length(h.n)
    n = h.n
    m = h.m
    A = h.A
    R = h.R
    d = h.d
    @avx for i = 1:N
        dni = R[i] + d[i]*n[i]
        for j = 1:N
            dni += A[i,j]*m[i,j]*n[j]
        end
        n[i] = n[i]*(1+dni*dt)
    end
    return n
end

This is running around 6 times slower than my old code. Using it in exactly the same way as before gives a runtime of around 5 seconds, 11.35 M allocations and 252.93 MiB. The stuff before and after hasn’t changed (and doesn’t take up much time).

The reason I want to put it in structs is that I have a few very similar functions with slightly different patterns that can easily be combined using structs.

jmair · March 30, 2023, 12:52pm

Thomas_Van_Giel:

struct single_HOI_ecosystem <: one_HOI_ecosystem
    N::Int
    n::Vector{<:Real}
    m::Matrix{Float64}
    A::Matrix{Float64}
    R::Vector{<:Real}
    d::Vector{<:Real}
end

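Unfortunately, this declares fields with abstract types: Vector{<:Real} is not a concrete type, so the compiler can’t tell from the struct alone whether h.n is a Vector{Float64}, a Vector{Int}, or something else. Dealing with abstractly typed fields requires dynamic dispatch and heap allocations (as the compiler doesn’t know ahead of time how much memory is needed). You could do: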

struct single_HOI_ecosystem{T<:Real} <: one_HOI_ecosystem
    N::Int
    n::Vector{T}
    m::Matrix{Float64}
    A::Matrix{Float64}
    R::Vector{T}
    d::Vector{T}
end

Add more type parameters if the fields need different element types. The general rule of thumb is that every field type in the struct should either be concrete or be given by a type parameter, which should itself be concrete for any instance you create. You should check what T is when creating the struct to make sure it is concrete.
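To make the rule of thumb concrete, here is a minimal sketch that asks Julia whether the field types are concrete. LooseEco and ParamEco are hypothetical, cut-down stand-ins for the two struct definitions above (only two fields each), not the actual simulation types:

```julia
# Hypothetical stand-ins for the two struct definitions discussed above.
struct LooseEco
    n::Vector{<:Real}      # abstract field type, as in the original struct
    A::Matrix{Float64}
end

struct ParamEco{T<:Real}
    n::Vector{T}           # concrete once T is fixed
    A::Matrix{Float64}
end

# Field types as the compiler sees them
println(isconcretetype(fieldtype(LooseEco, :n)))           # false
println(isconcretetype(fieldtype(ParamEco{Float64}, :n)))  # true

# T is inferred from the constructor arguments, so check it after construction
good = ParamEco(rand(3), rand(3, 3))
println(isconcretetype(eltype(good.n)))                    # true  (T == Float64)

# A loosely typed input vector makes T abstract again
bad = ParamEco(Real[1, 2.0, 3], rand(3, 3))
println(isconcretetype(eltype(bad.n)))                     # false (T == Real)
```

The last case is why it is worth checking T: the parametric struct only helps if the arrays passed in have a concrete element type such as Float64.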

Also, use @code_typed (or @code_warntype, which highlights the problem spots) to check for type instabilities: Unions, Anys, or abstract types that can’t be inferred.
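As a rough illustration of what such a check looks like, here is a small sketch. LooseBox, TightBox and total are hypothetical names made up for this example. Base.return_types is an introspection helper rather than a documented stable API, so treat its use here as an assumption; in the REPL you would normally just read the @code_warntype output.

```julia
using InteractiveUtils  # provides @code_warntype / @code_typed outside the REPL

struct LooseBox            # hypothetical: abstractly typed field
    n::Vector{<:Real}
end

struct TightBox{T<:Real}   # hypothetical: parametric field
    n::Vector{T}
end

total(h) = sum(h.n)        # hypothetical helper that just reads the field

# Ask the compiler what return type it infers for each argument type
loose_ret = only(Base.return_types(total, (LooseBox,)))
tight_ret = only(Base.return_types(total, (TightBox{Float64},)))

println(isconcretetype(loose_ret))  # false: the result type cannot be pinned down
println(tight_ret)                  # Float64

# Prints the typed code; non-concrete types are flagged in the output
@code_warntype total(LooseBox(rand(3)))
```

If the loose version shows Any or an abstract type for h.n and the parametric one shows Vector{Float64}, that is the same difference that shows up as extra allocations in the benchmark.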