I have been meaning to speed up some of the non-bonded force calculation in Molly.jl and I had a play with it today, but found I hit a ceiling.

I have extracted a simplistic molecular dynamics simulation of a Lennard Jones fluid as a MWE and would appreciate any tips on how to make it faster. The code and current benchmark results are below. I’m sure there’s some SIMD magic and the like that I am missing. I know there are no periodic boundary conditions or thermostat - it’s meant to be a mimimal example rather than a meaningful simulation.

I am not looking for algorithmic improvements like neighbour lists here, that would be a separate optimisation.

Bonus points would go to anyone who could make it run fast on the GPU and/or with Zygote. I have got these to work, but with severe performance hits compared to the below. And also bonus points for keeping the `ljforce!`

function relatively easy-to-read; the idea is that people can define their own arbitrary functions of this type.

```
using StaticArrays
using BenchmarkTools
natoms = 100
nsteps = 1000
mutable struct Vec3D{T} <: FieldVector{3, T}
x::T
y::T
z::T
end
# The force due to a Lennard-Jones potential
@fastmath function ljforce!(accs, coords, i, j)
ϵ = 0.2
σ = 1.0
dx = coords[j].x - coords[i].x
dy = coords[j].y - coords[i].y
dz = coords[j].z - coords[i].z
r2 = dx * dx + dy * dy + dz * dz
six_term = (σ ^ 2 / r2) ^ 3
f = min(((24 * ϵ) / r2) * (2 * six_term ^ 2 - six_term), 100.0)
accs[i].x -= f * dx
accs[i].y -= f * dy
accs[i].z -= f * dz
accs[j].x += f * dx
accs[j].y += f * dy
accs[j].z += f * dz
return accs
end
# Calculate all against all forces
function forces(coords)
natoms = size(coords, 1)
accs = [Vec3D{Float64}(0.0, 0.0, 0.0) for _ in 1:natoms]
for i in 1:natoms
for j in 1:(i - 1)
ljforce!(accs, coords, i, j)
end
end
return accs # Assume mass is 1 so F = a
end
# Velocity-free Verlet integration
function simulate(natoms, nsteps)
coords = [Vec3D{Float64}(rand(3)...) for _ in 1:natoms] .* 10.0
# Storing previous coordinates imitates velocity
coords_last = coords .+ [Vec3D{Float64}(rand(3)...) for _ in 1:natoms] .* 0.01
for i in 1:nsteps
accs = forces(coords)
coords_next = 2 .* coords .- coords_last .+ accs .* 0.01
coords_last = coords
coords = coords_next
end
return coords
end
```

On my machine:

```
accs = [Vec3D{Float64}(0.0, 0.0, 0.0) for _ in 1:natoms]
coords = [Vec3D{Float64}(rand(3)...) for _ in 1:natoms]
@benchmark ljforce!($accs, $coords, 10, 20)
```

```
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 10.099 ns (0.00% GC)
median time: 10.200 ns (0.00% GC)
mean time: 10.250 ns (0.00% GC)
maximum time: 81.601 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 1000
```

and

```
@benchmark simulate($natoms, $nsteps)
```

```
BenchmarkTools.Trial:
memory estimate: 7.86 MiB
allocs estimate: 203204
--------------
minimum time: 43.903 ms (0.00% GC)
median time: 45.272 ms (0.00% GC)
mean time: 45.331 ms (0.99% GC)
maximum time: 48.995 ms (5.27% GC)
--------------
samples: 111
evals/sample: 1
```

[Edited to interpolate the variables during benchmarking.]