I put together a simple package to simulate many interacting Brownian particles. I’m trying to speed up the force calculation (for Lennard-Jones). Below is the force calculation using cell lists to find neighbors. The part I want to bring your attention to is towards the end (although the rest of the code could be relevant). If it helps, I could link the package or attach more code. I apologize if this is difficult to follow. I’m not sure what code to link here since it’s all interconnected.
function compute_pair_interaction!(lj::LennardJones; period_x::Float64 = -1.0, period_y::Float64 = -1.0)
@use_threads lj.multithreaded for particle in lj.particles
x, y = particle.x, particle.y
i = trunc(Int64, x / lj.cell_list.cell_spacing_x)
j = trunc(Int64, y / lj.cell_list.cell_spacing_y)
@inbounds for di = -1 : 1, dj = -1 : 1
idi = mod(i + di, lj.cell_list.num_cells_x) + 1
jdj = mod(j + dj, lj.cell_list.num_cells_y) + 1
pid = lj.cell_list.start_pid[idi, jdj]
while pid > 0
neighbor = lj.cell_list.particles[pid]
Δx = wrap_displacement(x - neighbor.x; period = period_x)
Δy = wrap_displacement(y - neighbor.y; period = period_y)
Δr² = Δx^2 + Δy^2
σ² = (lj.σ < 0.0 ? particle.R + neighbor.R : lj.σ)^2
cutoff² = (lj.cutoff < 0.0 ? LJ_CONST * σ² : lj.cutoff^2)
if 0.0 < Δr² < cutoff²
coef = lj_coef(lj.ϵ, σ², Δr²)
f_x, f_y = coef * Δx, coef * Δy
#THIS PART SEEMS TO BE SLOW
particle.f_x += f_x
particle.f_y += f_y
if lj.use_newton_3rd
neighbor.f_x -= f_x
neighbor.f_y -= f_y
end
end
pid = lj.cell_list.next_pid[pid]
end
end
end
end
The chunk of code
particle.f_x += f_x
particle.f_y += f_y
if lj.use_newton_3rd
neighbor.f_x -= f_x
neighbor.f_y -= f_y
end
seems to take up about half the computation time (using @benchmark
, the whole function takes about 1 ms). particle
and neighbor
are a custom type Particle
containing all the information for a particle:
mutable struct Particle
ptype::Symbol # particle type
x::Float64 # x position
y::Float64 # y position
θ::Float64 # orientation
f_x::Float64 # x force
f_y::Float64 # y force
t_θ::Float64 # torque in z
R::Float64 # particle radius
γ_trans::Float64 # translational friction
γ_rot::Float64 # rotational friction
D_trans::Float64 # translational diffusivity
D_rot::Float64 # rotational diffusivity
active_force::Union{AbstractActiveForce, Nothing} # active force/self-propulsion
end
I did some tests by commenting out different parts of the calculation.
TEST 1: I commented out
particle.f_x += f_x
particle.f_y += f_y
TEST 2: I commented out
if lj.use_newton_3rd
neighbor.f_x -= f_x
neighbor.f_y -= f_y
end
TEST 3: I commented out both chunks.
Note that I have also tested this without multithreading and get similar speed-up. Now, TEST 1 and TEST 2 both give ~1 ms for @benchmark
. TEST 3, however, gives ~500 \mu\textrm{s} (half the time) for @benchmark
. It seems like they individually don’t contribute a lot to the time but collectively they do. Is this part slow because of my use of mutable structs? There are a few other places where I use a similar struct.field +=
and those don’t seem to contribute nearly as much time. I know that most other codes store particle positions in an array, so if it comes to it, I’ll probably have to rewrite everything, but for the moment I like the organization of Particle
.
Another thing that is a little strange is that I am testing with lj.use_newton_3rd
set as false
. Logically, this means that TEST 1 should be equivalent to TEST 3, but it still takes ~1 ms instead of ~500 \mu\textrm{s}. It only goes down to ~500 \mu\textrm{s} if I replace if lj.use_newton_3rd
with if false
. I’m guessing this has to do with there being a lot of background code being generated with if lj.use_newton_3rd
?