Hi,
I’m working on speeding up a particle simulation using multi-threading, and I’ve come across odd allocations in the multi-threaded code that I don’t fully understand, much less know how to get rid of.
Here’s a much reduced MWE:
There’s a main loop that does the time steps, and at each time step forces need to be computed for all cells (plus interactions, which I neglect here). The force computation does quite a bit of 2D SVector juggling but does not allocate.
However, as soon as I add more threads, the allocations increase.
```julia
using StaticArrays

mutable struct Particle
    force::SVector{2, Float64}
end

function forces!(particle)
    # Some bogus computations here
    a = SA[1.0, 1.0]
    A = SA[1.0 0.0; 0.0 1.0]
    for i = 1:1000
        a = A*(a-particle.force)
    end
    particle.force = a
    return nothing
end

function computeforces!(particles, N)
    Threads.@threads for n = 1:N
        forces!(particles[n])
    end
end

function main(tmax = 1000)
    particles = [Particle(rand(SVector{2, Float64})) for i=1:100]
    for n = 1:tmax
        N = length(particles)
        computeforces!(particles, N)
    end
end

using BenchmarkTools
@btime main(1000)
```
```
$ export JULIA_NUM_THREADS=1
$ julia mwe_threading_allocs.jl
  432.846 ms (8101 allocations: 785.25 KiB)
$ export JULIA_NUM_THREADS=2
$ julia mwe_threading_allocs.jl
  226.731 ms (13196 allocations: 1.45 MiB)
$ export JULIA_NUM_THREADS=4
$ julia mwe_threading_allocs.jl
  133.044 ms (23504 allocations: 2.85 MiB)
```
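Dividing the reported allocation counts by the 1000 time steps gives a rough per-step figure (about 8, 13 and 23.5 allocations for 1, 2 and 4 threads), so the extra allocations appear to grow with the number of threads on every call of `computeforces!`. A small sketch of that arithmetic, using only the numbers reported above (the variable names are just for illustration, and the one-off allocations for building `particles` are not subtracted):

```julia
# Allocation counts reported by @btime, keyed by JULIA_NUM_THREADS.
tmax = 1000
reported = [1 => 8101, 2 => 13196, 4 => 23504]

# Rough allocations per time step (still includes the one-off setup of `particles`).
per_step = [nt => allocs / tmax for (nt, allocs) in reported]

for (nt, a) in per_step
    println(nt, " thread(s): ", a, " allocations per time step")
end
```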
In this simple example the speedup is still quite good, but in our real calculations with more threads, GC time becomes significant (> 20%).
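To check whether these allocations come from `Threads.@threads` itself rather than from `forces!`, one could measure a threaded loop that does almost no work. A minimal sketch, where `threaded_noop!` and `measure` are hypothetical helpers that are not part of the MWE:

```julia
# Hypothetical helper (not in the MWE): a threaded loop with trivial work per iteration.
function threaded_noop!(out)
    Threads.@threads for i in eachindex(out)
        out[i] = i
    end
    return nothing
end

# Hypothetical helper: bytes allocated by one call, measured after a warm-up
# call so that compilation is not counted.
function measure(out)
    threaded_noop!(out)
    return @allocated threaded_noop!(out)
end

out = zeros(Int, 100)
bytes = measure(out)
println("threads = ", Threads.nthreads(), ", bytes per call = ", bytes)
```

If the byte count grows with `JULIA_NUM_THREADS` here as well, that would point to per-call overhead of the threaded loop rather than to anything in the force computation.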
Has anyone else run into this problem, and is there a way around it?