# Learning to optimize

Still trying to build intuition for optimizing Julia code. Here is an MD code I translated from python. It already runs a lot faster than my python version but I feel like it could be faster. I’m more interested in learning optimization techniques than this specific piece of code but I thought this would be a good piece to practice on. When I time the function that does the heavy lifting I’m seeing over 10k allocations. (I’ve set the loop to run just a few iterations for now.) Any ideas for using better style/ optimizing this would be much appreciated. Thanks in advance:

``````struct MD
nParticles::Int64
boxSize::Float64
positions:: Array{Float64,2}
velocities:: Array{Float64,2}

end
using Random
using LinearAlgebra
#using Pkg
using StaticArrays

function initializeParticles(nParticles::Int64,boxSize::Float64,lVecs::Array{Float64,2},random::Bool = false):: MD
positions = zeros(nParticles,2)
velocities = zeros(nParticles,2)
deltaR = 0.5
v0=1.

idx = 0
for i =-boxSize-10:boxSize+10
for j =-boxSize-10:boxSize+10
println(i,j)
println(idx)
vec = i * lVecs[1,:] + j * lVecs[2,:] + [0.6, 0.6]
if random
rng = MersenneTwister(1234)
rVec = rand(rng,2)
vec +=  deltaR * rVec/norm(rVec)
end
if 0 <= vec <= boxSize && 0 <= vec <= boxSize && idx < nParticles
println(i,j)
println(vec)
println(idx)
idx += 1
positions[idx,:] = vec

rng = MersenneTwister(1234)
vVec = rand(rng,2)
velocities[idx,:] = v0 * vVec/norm(vVec)

end
end
end
if idx < nParticles
println("Didn't find all of the requested particles")
println("Requested: ", nParticles)
println("Found: ", idx)
end
MD(nParticles,boxSize,positions,velocities)

end

function forceOnSingleParticle(positions::Array{Float64,2},particle::Array{Float64,1},boxSize::Float64):: Array{Float64,1}
fVec = zeros(2)
modifiedPos = zeros(2)
diffVec = zeros(Float64,2)
for i=1:size(positions,1)
diffVec .= particle - positions[i,:]

if abs(diffVec) > boxSize/2 && abs(diffVec) > boxSize/2
modifiedPos .= positions[i,:] + boxSize * [sign(diffVec) , sign(diffVec)]
elseif abs(diffVec) > boxSize/2
modifiedPos .= positions[i,:] + boxSize * [sign(diffVec) , 0]
elseif abs(diffVec) > boxSize/2
modifiedPos .= positions[i,:] + boxSize * [0 , sign(diffVec)]
else
modifiedPos .= deepcopy(positions[i,:])
end
diffVec .= particle - modifiedPos
distance = norm(diffVec)

if distance > 0.5
fVec .+= 24 * (2/distance^13 - 1/distance^7) * 1/distance * [diffVec, diffVec]
end
end
return fVec

end

function simulate(myMD::MD,dt::Float64,tMax::Float64)
forces = zeros(size(myMD.positions,1),2)
T = 0.4
positions = copy(myMD.positions)
positionsP = copy(myMD.positions)
positionsPP = positions - myMD.velocities * dt
velocities = zeros(Float64,size(positions,1),size(positions,2))
while T > 0.2
for time=0:dt:tMax
#println(time)
#if any(positions > myMD.boxSize)
#    exit("lost particles")
#end
for n=1:size(positions,1)
forces[n,:] .= forceOnSingleParticle(positions,positions[n,:],myMD.boxSize)
end
positions .= 2 .* positionsP .- positionsPP .+ forces .* dt^2
velocities .= (positions .- positionsPP)/(2 .* dt)
positionsPP .= copy(positionsP)
positionsP .= copy(positions)
for n=1:size(positions,1)
if positions[n,1] > myMD.boxSize
positions[n,1] .-= myMD.boxSize
end
if positions[n,1] < 0
positions[n,1] .+= myMD.boxSize
end
if positions[n,2] > myMD.boxSize
positions[n,2] .-= myMD.boxSize
end
if positions[n,2] < 0
positions[n,2] .+= myMD.boxSize
end
end
end
T -=0.3
end

positions
end
using Plots
plotly()
dt = .0001
tMax = .0003
lvecs = 1.07 * [[1.  0.]; [0.5  sqrt(3)/2]]
myMD = initializeParticles(16,4.,lvecs)
@time mypositions = simulate(myMD,dt,tMax)
plot(mypositions[:,1],mypositions[:,2],seriestype = :scatter,aspect_ratio=:equal,xlim = (0,4),ylim = (0,4))

positions = [[4 4]; [3.5 1.5]; [0.5 2]]
@time f = forceOnSingleParticle(positions,positions[3,:],5.)
println(f)
``````
2 Likes

Lots of code to look at here, but this jumps out as an easy thing to improve. This is a tricky case because it’s natural to store `n_particles x 2` positions in Numpy where arrays are row-major. That means that in numpy, the x and y coordinates of each particle will be adjacent in memory, which is generally good (since you’ll probably access both at the same time).

Julia, on the other hand (like Fortran and Matlab) stores arrays column-major. That means that the x and y position of a given particle in your array will be extremely far apart in memory. That’s really bad for performance because it screws up the CPU’s ability to efficiently move memory into cache (it has bad “spatial locality”).

So in Julia, you almost always want to do:

``````positions = zeros(2, nParticles)
``````

But now that you’ve done that, you might as well take it one step further:

``````positions = zeros(SVector{2, Float64}, nParticles)
``````

Storing each point as an `SVector` has exactly zero cost compared to using a plain matrix, but it suddenly makes the rest of your code far easier to work with, and can unlock lots of efficiency gains.

For example, you are getting the x and y positions of a single particle by doing:

``````positions[i,:]
``````

This is a pretty inefficient thing to do over and over for several reasons:

1. Because the matrix is column-major, `positions[i, :]` is not a contiguous block of memory in Julia (although it would be in numpy). Using `zeros(2, nPositions)` and accessing it as `positions[:, i]` would fix that.
2. Any time you slice a matrix in Julia, it makes a copy. That’s completely unnecessary here, and it’s just a waste of memory. Using `@view positions[:, i]` avoids the unnecessary copy.
3. Even a view is unnecessary here if you use the vector-of-SVector approach I suggested above. If you have:
``````positions = zeros(SVector{2, Float64}, nParticles)
``````

then the xy position of a single particle is simply `positions[i]`. No copy, view, or slice required, and no cost at all!

This line also stands out. Doing `deepcopy` in an inner loop is likely to be extremely expensive and I’ve never ever found it to be necessary. Why are you doing a deepcopy here? Why do you need any copy at all?

16 Likes

You also have a bunch of these small arrays in your inner loop. Each new array requires some memory allocation and is relatively expensive to create (at least if you do it a million times…). On the other hand, if you change this to:

``````SVector(sign(diffVec), 0)
``````

then you can avoid that cost entirely. This will also work perfectly with the vector-of-SVector approach for `positions`.

Great Information. Thank You.

The python code needed a copy here otherwise the modifications to one array would modify the other. That caused big problems, because I don’t want `positions` to get modified inside this function ever.

Looking at it closer now, I see what you mean: I may not need a copy there at all. I think originally I was probably being overly cautious.

2 Likes

`positions[i,:]` already creates a copy so throwing a `deepcopy` on top of it is redundant (the original copy itself is also redundant).

One part of @rdeits great answer worth emphasising is:

After you have done that, so many things come naturally because you now have the correct abstraction. Now, there is no risk of getting column- vs row-major wrong because you only have a list of particles. Now, there is no need for a bunch of dotted operations like `.=`, `.*`, `.+` etc, you can now just use the undotted ones. Now, the compiler know the dimension of every particle and can generate highly optimized code instead of generating code for arbitrarily sized vectors. The advantages just go on and on from this superficially small change.

16 Likes

Is there a significant difference in terms of memory and speed between using the SVector{2,float64} approach or making a struct for XY coordinate pairs and then alocating an array of structs?

Is the array of SVectors actually pointers or are they stored contiguously in memory?

Both are the same, and are contiguous in memory.

3 Likes

You might be interested in this thread: Nbabel nbody integrator speed up

There you will find a very good Python implementation, and good Julia implementations of a particle simulation.

1 Like

Thanks. I teach a computational physics class and my students are coding this in python (have for years) and I’m wanting to show them how cool Julia is. I incorporated static arrays and now the code is so much faster than python that my point will be sufficiently potent. When I use MD to do research, I use LAMMPS :).

1 Like

For how short it is, this example is nice to show as well:

I have also a simulation package written for a discipline, which may be useful: GitHub - m3g/CKP

Unfortunately the actual text of the material is in Portuguese

I’d use the velocity Verlet integrator to avoid copies of positions:

``````function simulate!(myMD::MD, dt::Float64, tMax::Float64)
forces = zeros(size(myMD.positions,1),2)
T = 0.4
pos = myMD.positions
vel = myMD.velocities
for n=1:size(positions,1)
forces[n,:] .= forceOnSingleParticle(positions,positions[n,:],myMD.boxSize)
end
while T > 0.2
for time=0:dt:tMax
vel .+= forces .* (dt / 2)
pos .+= vel .* dt
for n=1:size(positions,1)
forces[n,:] .= forceOnSingleParticle(positions, positions[n,:], myMD.boxSize)
end
vel .+= forces .* (dt / 2)
pos .= mod.(pos, myMD.boxSize)
end
T -=0.3
end

pos
end
``````

As already mentioned, using `StaticVector`s for positions, velocities and forces must give a significant speedup. The speedup of the force calculation will be especially noticeable, as you get rid of a lot of small allocations. Another option is to make the function `compute_forces!` which takes an array of forces and mutates it in-place.

I don’t know if that’s a cool way to show off Julia features or not, but you can also put the function for force computation inside the `MD` struct, so that one can easily customize the type of interaction.

1 Like