Learning to optimize

Still trying to build intuition for optimizing Julia code. Here is an MD code I translated from python. It already runs a lot faster than my python version but I feel like it could be faster. I’m more interested in learning optimization techniques than this specific piece of code but I thought this would be a good piece to practice on. When I time the function that does the heavy lifting I’m seeing over 10k allocations. (I’ve set the loop to run just a few iterations for now.) Any ideas for using better style/ optimizing this would be much appreciated. Thanks in advance:

struct MD
    nParticles::Int64
    boxSize::Float64
    positions:: Array{Float64,2}
    velocities:: Array{Float64,2}

end
using Random
using LinearAlgebra
#using Pkg
#Pkg.add("StaticArrays")
using StaticArrays

function initializeParticles(nParticles::Int64,boxSize::Float64,lVecs::Array{Float64,2},random::Bool = false):: MD
    positions = zeros(nParticles,2)
    velocities = zeros(nParticles,2)
    deltaR = 0.5
    v0=1.

    idx = 0
    for i =-boxSize-10:boxSize+10
        for j =-boxSize-10:boxSize+10
            println(i,j)
            println(idx)
            vec = i * lVecs[1,:] + j * lVecs[2,:] + [0.6, 0.6]
            if random
                rng = MersenneTwister(1234)
                rVec = rand(rng,2)
                vec +=  deltaR * rVec/norm(rVec)
            end
            if 0 <= vec[1] <= boxSize && 0 <= vec[2] <= boxSize && idx < nParticles
                println("adding")
                println(i,j)
                println(vec)
                println(idx)
                idx += 1
                positions[idx,:] = vec

                rng = MersenneTwister(1234)
                vVec = rand(rng,2)
                velocities[idx,:] = v0 * vVec/norm(vVec)
                
            end
        end
    end
    if idx < nParticles
        println("Didn't find all of the requested particles")
        println("Requested: ", nParticles)
        println("Found: ", idx)
    end
    MD(nParticles,boxSize,positions,velocities)

end

function forceOnSingleParticle(positions::Array{Float64,2},particle::Array{Float64,1},boxSize::Float64):: Array{Float64,1}
    fVec = zeros(2)
    modifiedPos = zeros(2)
    diffVec = zeros(Float64,2)
    for i=1:size(positions,1)
        diffVec .= particle - positions[i,:]

        if abs(diffVec[1]) > boxSize/2 && abs(diffVec[2]) > boxSize/2
            modifiedPos .= positions[i,:] + boxSize * [sign(diffVec[1]) , sign(diffVec[2])]
        elseif abs(diffVec[1]) > boxSize/2
            modifiedPos .= positions[i,:] + boxSize * [sign(diffVec[1]) , 0]
        elseif abs(diffVec[2]) > boxSize/2
            modifiedPos .= positions[i,:] + boxSize * [0 , sign(diffVec[2])]
        else
            modifiedPos .= deepcopy(positions[i,:])
        end
        diffVec .= particle - modifiedPos
        distance = norm(diffVec)
        
        if distance > 0.5
            fVec .+= 24 * (2/distance^13 - 1/distance^7) * 1/distance * [diffVec[1], diffVec[2]]
        end
    end
    return fVec

end


function simulate(myMD::MD,dt::Float64,tMax::Float64)
    forces = zeros(size(myMD.positions,1),2)
    T = 0.4
    positions = copy(myMD.positions)
    positionsP = copy(myMD.positions)
    positionsPP = positions - myMD.velocities * dt
    velocities = zeros(Float64,size(positions,1),size(positions,2))
    while T > 0.2
        for time=0:dt:tMax
            #println(time)
            #if any(positions > myMD.boxSize)
            #    exit("lost particles")
            #end
            for n=1:size(positions,1)
                forces[n,:] .= forceOnSingleParticle(positions,positions[n,:],myMD.boxSize)
            end
            positions .= 2 .* positionsP .- positionsPP .+ forces .* dt^2
            velocities .= (positions .- positionsPP)/(2 .* dt)
            positionsPP .= copy(positionsP)
            positionsP .= copy(positions)
            for n=1:size(positions,1)
                if positions[n,1] > myMD.boxSize
                    positions[n,1] .-= myMD.boxSize
                end
                if positions[n,1] < 0
                    positions[n,1] .+= myMD.boxSize
                end
                if positions[n,2] > myMD.boxSize
                    positions[n,2] .-= myMD.boxSize
                end
                if positions[n,2] < 0
                    positions[n,2] .+= myMD.boxSize
                end
            end
        end 
        T -=0.3
    end

    positions
end
using Plots
plotly()
dt = .0001
tMax = .0003
lvecs = 1.07 * [[1.  0.]; [0.5  sqrt(3)/2]]
myMD = initializeParticles(16,4.,lvecs)
@time mypositions = simulate(myMD,dt,tMax)
plot(mypositions[:,1],mypositions[:,2],seriestype = :scatter,aspect_ratio=:equal,xlim = (0,4),ylim = (0,4))

positions = [[4 4]; [3.5 1.5]; [0.5 2]]
@time f = forceOnSingleParticle(positions,positions[3,:],5.)
println(f)
2 Likes

Lots of code to look at here, but this jumps out as an easy thing to improve. This is a tricky case because it’s natural to store n_particles x 2 positions in Numpy where arrays are row-major. That means that in numpy, the x and y coordinates of each particle will be adjacent in memory, which is generally good (since you’ll probably access both at the same time).

Julia, on the other hand (like Fortran and Matlab) stores arrays column-major. That means that the x and y position of a given particle in your array will be extremely far apart in memory. That’s really bad for performance because it screws up the CPU’s ability to efficiently move memory into cache (it has bad “spatial locality”).

So in Julia, you almost always want to do:

positions = zeros(2, nParticles)

But now that you’ve done that, you might as well take it one step further:

positions = zeros(SVector{2, Float64}, nParticles)

Storing each point as an SVector has exactly zero cost compared to using a plain matrix, but it suddenly makes the rest of your code far easier to work with, and can unlock lots of efficiency gains.

For example, you are getting the x and y positions of a single particle by doing:

positions[i,:]

This is a pretty inefficient thing to do over and over for several reasons:

  1. Because the matrix is column-major, positions[i, :] is not a contiguous block of memory in Julia (although it would be in numpy). Using zeros(2, nPositions) and accessing it as positions[:, i] would fix that.
  2. Any time you slice a matrix in Julia, it makes a copy. That’s completely unnecessary here, and it’s just a waste of memory. Using @view positions[:, i] avoids the unnecessary copy.
  3. Even a view is unnecessary here if you use the vector-of-SVector approach I suggested above. If you have:
positions = zeros(SVector{2, Float64}, nParticles)

then the xy position of a single particle is simply positions[i]. No copy, view, or slice required, and no cost at all!

This line also stands out. Doing deepcopy in an inner loop is likely to be extremely expensive and I’ve never ever found it to be necessary. Why are you doing a deepcopy here? Why do you need any copy at all?

16 Likes

You also have a bunch of these small arrays in your inner loop. Each new array requires some memory allocation and is relatively expensive to create (at least if you do it a million times…). On the other hand, if you change this to:

SVector(sign(diffVec[1]), 0)

then you can avoid that cost entirely. This will also work perfectly with the vector-of-SVector approach for positions.

Great Information. Thank You.

The python code needed a copy here otherwise the modifications to one array would modify the other. That caused big problems, because I don’t want positions to get modified inside this function ever.

Looking at it closer now, I see what you mean: I may not need a copy there at all. I think originally I was probably being overly cautious.

2 Likes

positions[i,:] already creates a copy so throwing a deepcopy on top of it is redundant (the original copy itself is also redundant).

One part of @rdeits great answer worth emphasising is:

After you have done that, so many things come naturally because you now have the correct abstraction. Now, there is no risk of getting column- vs row-major wrong because you only have a list of particles. Now, there is no need for a bunch of dotted operations like .=, .*, .+ etc, you can now just use the undotted ones. Now, the compiler know the dimension of every particle and can generate highly optimized code instead of generating code for arbitrarily sized vectors. The advantages just go on and on from this superficially small change.

16 Likes

Is there a significant difference in terms of memory and speed between using the SVector{2,float64} approach or making a struct for XY coordinate pairs and then alocating an array of structs?

Is the array of SVectors actually pointers or are they stored contiguously in memory?

Both are the same, and are contiguous in memory.

3 Likes

You might be interested in this thread: Nbabel nbody integrator speed up

There you will find a very good Python implementation, and good Julia implementations of a particle simulation.

1 Like

Thanks. I teach a computational physics class and my students are coding this in python (have for years) and I’m wanting to show them how cool Julia is. I incorporated static arrays and now the code is so much faster than python that my point will be sufficiently potent. When I use MD to do research, I use LAMMPS :).

1 Like

For how short it is, this example is nice to show as well:

I have also a simulation package written for a discipline, which may be useful: GitHub - m3g/CKP

Unfortunately the actual text of the material is in Portuguese

I’d use the velocity Verlet integrator to avoid copies of positions:

function simulate!(myMD::MD, dt::Float64, tMax::Float64)
    forces = zeros(size(myMD.positions,1),2)
    T = 0.4
    pos = myMD.positions
    vel = myMD.velocities
    for n=1:size(positions,1)
        forces[n,:] .= forceOnSingleParticle(positions,positions[n,:],myMD.boxSize)
    end
    while T > 0.2
        for time=0:dt:tMax
            vel .+= forces .* (dt / 2)
            pos .+= vel .* dt
            for n=1:size(positions,1)
                forces[n,:] .= forceOnSingleParticle(positions, positions[n,:], myMD.boxSize)
            end
            vel .+= forces .* (dt / 2)
            pos .= mod.(pos, myMD.boxSize)
        end 
        T -=0.3
    end

    pos
end

As already mentioned, using StaticVectors for positions, velocities and forces must give a significant speedup. The speedup of the force calculation will be especially noticeable, as you get rid of a lot of small allocations. Another option is to make the function compute_forces! which takes an array of forces and mutates it in-place.

I don’t know if that’s a cool way to show off Julia features or not, but you can also put the function for force computation inside the MD struct, so that one can easily customize the type of interaction.

1 Like