Understanding and avoiding allocations with StructArrays

Hi all,

I’m working on a piece of code where I have triangles which are made of points. Each point has a number of properties such as position, velocity, force…
In some part of the code I need to calculate the force for each triangle, and assign it to the triangle points.

I have a Triangle struct, a Point struct and I make use of the StructArray package.

For this, I have coded the following MWE:


using StaticArrays
using StructArrays

# This allows me to have a "view" on the points
# Note: it is similar to the LazyRow type but with a minor tweak to allow three indexes
struct StructArrayView{T,Ti}
    structarray::T
    idxs::Ti
end

import Base.getproperty, Base.setproperty!

# I need to overload these methods to access the points in the triangle and write the force to them
Base.@propagate_inbounds function Base.getproperty(c::StructArrayView, s::Symbol)
    return getproperty(getfield(c, 1), s)[getfield(c, 2)]
end
Base.@propagate_inbounds function Base.setproperty!(c::StructArrayView, s::Symbol, val)
        getproperty(getfield(c, 1), s)[getfield(c, 2)] = val
    return
end


struct Point{T}
    idx::Int 
    position::SVector{3,T}
    velocity::SVector{3,T}
    mass::T
    force::SVector{3,T}
end

struct Triangle{T, Tpoints}
    idx::Int
    points::StructArrayView{Tpoints, SVector{3,Int}} # To have a view on the points contained in the triangle
    rho::T
    dt::T
end


npoints = 10000

ntris = 10000

points = StructArray(Point{Float64}(i, rand(3), zeros(3), 1., zeros(3)) for i in 1:npoints)

tris = StructArray(Triangle(i, StructArrayView(points, SVector{3}(rand(1:npoints) for _ in 1:3)), 2.0, 3.0) for i in 1:ntris)

# Dummy force calculation
function force!(triangles)    

    for t in triangles
        # The custom StructArrayView allows to get the positions like this...
        pos = t.points.position

        val = SVector{3}(pos[i] for i in 1:3)

         # ... and to write the forces like this
        t.points.force -= val
    end

end

force!(tris)

# Profiling in VSCode... notice the allocations in setproperty!
@profview [force!(tris) for _ in 1:100]

When profiling the code, I get some allocations in the overloaded setproperty!. I don’t understand these allocations (everything should be type stable and inferrable by the compiler). What is even more intriguing is if I manually inline the setproperty! by including it in the force! function:

function force!(triangles)    
    for t in triangles
        pos = t.points.position
        val = SVector{3}(pos[i] for i in 1:3)
        getproperty(getfield(t.points, 1), :force)[getfield(t.points, 2)] -= val
    end
end

The allocations magically disappear!

I’ve been banging my head on this, can someone help? What is causing these allocations?

Thanks!

Ok after some investigations, I’ve actually made a much simpler minimal working example, reproducing the example given in the StructArray package for the LazyRow iteration:


using StructArrays, StaticArrays, BenchmarkTools

struct Foo{T}
    idx::Int 
    vec::SVector{3,T}
end


sarr = StructArray(Foo{Float64}(i, zeros(3)) for i in 1:1000)

val = @SVector [1.,2.,3.]

function testalloc(sarr, val)
    for f in LazyRows(sarr)
        f.vec = val
    end
end

# This function does not allocate, I just replaced the f.vec call by the inlined equivalent.
function testnoalloc(sarr, val)
    for f in LazyRows(sarr)
        getproperty(getfield(f, 1), :vec)[getfield(f, 2)] = val
    end
end


@btime testalloc($sarr, $val)
# 21.776 μs (1489 allocations: 38.89 KiB)
@btime testnoalloc($sarr, $val)
# 1.125 μs (0 allocations: 0 bytes)

So the setproperty! of the LazyRow type of StructArray is responsible for these allocations. However there is still something that bugs me. The getproperty and setproperty! of the LazyRow type are defined as follows:

Base.@propagate_inbounds function Base.getproperty(c::LazyRow, nm::Symbol)
    return getproperty(getfield(c, 1), nm)[getfield(c, 2)]
end

Base.@propagate_inbounds function Base.setproperty!(c::LazyRow, nm::Symbol, val)
     getproperty(getfield(c, 1), nm)[getfield(c, 2)] = val
     return
end

The allocations are caused by a type instability in the setproperty! function. How come the compiler cannot figure out the type of sarr.vec? Is there a way to avoid these allocations?

Thanks!

I am not the greatest expert here, but since the SVector cannot have its values changed, when doing f.vec = val you are actually creating a new static vector and replacing the current f.vec with the new one. If you are expecting to replace its values, you must use a MVector instead. This example does what you perhaps expect:

struct Foo2{T}
    idx::Int
    vec::MVector{3,T}
end
sarr2 = StructArray(Foo2{Float64}(i, zeros(3)) for i in 1:1000)
val = @SVector [1.,2.,3.]

function testalloc2(sarr, val)
    for f in LazyRows(sarr)
        @. f.vec = val
    end
end
@btime testalloc2($sarr2, $val)

# result:
  994.200 ns (0 allocations: 0 bytes)

I am also not the greatest expert in the inner workings of StructArrays but I find strange that f.vec = val actually works. If instead of LazyRows of StructArray of Foo{T} you were just iterating elements of a Vector of Foo{T}, then such assignment would cause an error because type Foo{T} is not mutable and should not have its fields changed. Are you sure this is having the effect you expect, and that Foo{T} should not be mutable?

1 Like

This is a specificity of the lazy row iteration implemented in StructArrays (see https://github.com/JuliaArrays/StructArrays.jl#lazy-row-iteration). It allows changing the StructArray columns: calling f.vec = val actually calls vector_column_of_vec[i] = val, which is perfectly valid.

Thanks, that would work but I would lose some of the benefits of having vec immutable, which gives a small performance boost in my case. If I fail to find a solution to my allocation issue, I will resort to this.

Maybe, but just pointing that using the SMVector here provides the fastest of the three alternatives (close to the alternative in which you inlined the getproperty function):

julia> include("./test.jl")
  18.221 μs (1489 allocations: 38.89 KiB)
  1.011 μs (0 allocations: 0 bytes) 
  967.647 ns (0 allocations: 0 bytes) # SMVectors

(by the way, the use of “Vector”, without Static arrays at all, provides the same “fast” result.) My guess here is that, for this particular function, you won’t get any benefit in using static arrays. If, however, this is only an initializing step, you probably would like to startup the static arrays within that function, which would be later used for heavy computations being truly static.

1 Like

Thanks for the advice, I will definitely consider this. However it will slightly impact performance in my case (which is more complex than this small MWE), as the workload performed includes many small vector-matrix operations.
That’s why I’m still not giving up on the strange allocation issue.

I am slowly progressing on my road to make these methods allocation-free. Actually, it comes down to a much simpler scenario, which goes like this:

Let’s start with a named tuple:

ntup = (a=rand(Int, 1000), b=rand(Float64, 1000))

Let’s say I want to write a custom function that takes a named tuple made of arrays and sets the ith element of a field to a value.

I am aware that the usual call for this in Julia would be:

ntup[:field][i] = val

However, I want to write a custom (wrapper) function for this. Let’s try with:

function set_ith_element!(ntup::NamedTuple, s::Symbol, i::Int, val)
    ntup[s][i] = val
end

Now let’s use this function to set all the elements of the :a field:

function setwithcustomfun(ntup, val)
    for i in eachindex(ntup.a)
        set_ith_element!(ntup, :a, i, val)
    end
end

Benchmarking this function shows a lot of allocations:

@btime setwithcustomfun($ntup, 5)
# 34.517 μs (1489 allocations: 38.89 KiB)

Whereas setting the values using the traditional way does not allocate:

function setasusual(ntup, val)
    for i in eachindex(ntup.a)
        ntup[:a][i] = val
    end
end
@btime setasusual($ntup, 5)
# 394.751 ns (0 allocations: 0 bytes)

The allocations I see in my code or in StructArrays are due to the same reason, as the setproperty! method is overloaded with a custom function.

I still don’t see how to make this “wrapper function” allocation-free, but I’m getting there…

1 Like

Here is a micro-working example:

julia> f(ntup,field) = ntup[field][1] = 5 
f (generic function with 1 method)

julia> @btime f($ntup,:a)
  24.650 ns (1 allocation: 32 bytes)
5

julia> f(ntup) = ntup[:a][1] = 5
f (generic function with 2 methods)

julia> @btime f($ntup)
  2.023 ns (0 allocations: 0 bytes)

Apparently getindex becomes type-unstable if you pass the field, because each field is of a different type:

julia> @code_typed f(ntup,:a)
CodeInfo(
1 ─ %1 = Base.getfield(ntup, field)::Union{Array{Float64,1}, Array{Int64,1}}
│        Base.setindex!(%1, 5, 1)::Any
└──      return 5
) => Int64

julia> @code_warntype f(ntup,:a)
Variables
  #self#::Core.Compiler.Const(f, false)
  ntup::NamedTuple{(:a, :b),Tuple{Array{Int64,1},Array{Float64,1}}}
  field::Symbol

Body::Int64
1 ─ %1 = Base.getindex(ntup, field)::Union{Array{Float64,1}, Array{Int64,1}}
│        Base.setindex!(%1, 5, 1)
└──      return 5

If both fields are of the same type, the instability disappears:

julia> ntup = (a=rand(Int, 1000), b=rand(Int, 1000))
julia> @code_warntype f(ntup,:a)
Variables
  #self#::Core.Compiler.Const(f, false)
  ntup::NamedTuple{(:a, :b),Tuple{Array{Int64,1},Array{Int64,1}}}
  field::Symbol

Body::Int64
1 ─ %1 = Base.getindex(ntup, field)::Array{Int64,1}
│        Base.setindex!(%1, 5, 1)
└──      return 5

julia> @btime f($ntup,:a)
  2.093 ns (0 allocations: 0 bytes)


1 Like

Thanks, that makes sense!

Based on this, I managed to make it not allocate by using specialization:

ntup = (a=rand(Int, 1000), b=rand(Float64, 1000))

function set_ith_element_specialize!(ntup::NamedTuple, ::Val{s}, i::Int, val) where s
    setindex!(getfield(ntup,s), val, i)
end

function setwithcustomfun_specialize(ntup, val)
    for i in eachindex(ntup.a)
        set_ith_element_specialize!(ntup, Val(:a), i, val)
    end
end

@btime setwithcustomfun_specialize($ntup, 5)
# 276.491 ns (0 allocations: 0 bytes)

Another option is to use a macro:

macro set_ith_element(ntup, s, i, val)
    quote
        $(esc(ntup))[$(esc(s))][$(esc(i))] = $(esc(val))
    end
end

function setwithmacro(ntup, val)
    for i in eachindex(ntup.a)
        @set_ith_element(ntup, :a, i, val)
    end
end

@btime setwithmacro($ntup, 5)
# 278.075 ns (0 allocations: 0 bytes)

This feels a bit “hacky”, but for my particular use case described in the first post, I should be able to write a macro to write the non-allocating expression for me, i.e. @mynoallocmacro t.points.force = val would just replace the expression with getproperty(getfield(t.points, 1), :force)[getfield(c, 2)] = val.

This should work! Thanks for helping with this.

Cool. Another option is to inline the set_ith… function with @inline:

@inline function set_ith_element!(ntup::NamedTuple, s::Symbol, i::Int, val)
    ntup[s][i] = val
end

That solves the allocation problem as well.

Nice! However it doesn’t work for the more complex case given in the first post…

@inline function Base.setproperty!(c::StructArrayView, s::Symbol, val)
    getproperty(getfield(c, 1), s)[getfield(c, 2)] = val
    return
end

@btime force!(tris)
# 833.661 μs (20000 allocations: 1.07 MiB)

But wait a moment… I just removed the return… and suddenly it works!

@inline function Base.setproperty!(c::StructArrayView, s::Symbol, val)
    getproperty(getfield(c, 1), s)[getfield(c, 2)] = val
 end

@btime force!(tris)
# 161.031 μs (0 allocations: 0 bytes)

Julia’s mysterious ways I guess… Is this documented somewhere? I just put the return as good practice…

What about return nothing or simply nothing? Sorry, cannot test it now, I am using the cell phone.

Perhaps here you might still have problems, since you are passing s again. I am not sure if you can guarantee that this gets inlined. I think that using something as ntup[s][i] = val is more guaranteed if using inlining. (yet your specialization solution is perfect and, by the way, alows me to learn quite a few things).

Nope, return nothing, nothing or return all prevent the @inline to work.
Fun fact, the @inline call was actually the very first thing I tried when I noticed that manually inlining got rid of the allocations. I did not suspect that this small return prevented this to work.

Quick update: this seems to be fixed in v1.6#master:

ntup = (a=rand(Int, 1000), b=rand(Float64, 1000))

function f(ntup,field,i,val)
       ntup[field][i] = val
       return
end

@btime f($ntup, :a, 1, 1)
# 1.595 ns (0 allocations: 0 bytes)
1 Like