# Struggling to get Julia as fast as Matlab mex. (Filling large matrix, doing pairwise calculations over particles)

I have a system with Ntot particles. I am creating a matrix Mh that is 6Ntot x 6Ntot and symmetric, and elements are given by pairwise calculations between particles.

I’ve tried my best to speed up my Julia code, but feel like I’m missing something. For Ntot = 500, Matlab mex time = 0.087 s and Julia = 0.18 s. The function is being called every timestep of a simulation so the difference is adding up.

The function is essentially: loop over all particles pairwise, calculate separation rij, calculate submatrices Mtt, Mtr, Mrt, Mrr and place into matrix Mh. I’ve put some code below. X3 is a 3Ntot length vector (particle positions), ‘a’ is an Ntot length vector (particle radii). Any help greatly appreciated.

``````using LinearAlgebra, BenchmarkTools

function skew!(s,a)
s[1,2] = -a[3]
s[1,3] = a[2]
s[2,1] = a[3]
s[2,3] = -a[1]
s[3,1] = -a[2]
s[3,2] = a[1]
end

function calc_Mh!(Mh,X3,Ntot,a)
Mtt = zeros(3,3)
Mtr = zeros(3,3)
Mrt = zeros(3,3)
Mrr = zeros(3,3)
rhat_skew = zeros(3,3)
rhat = zeros(3)
rij = zeros(3)
Ii = Matrix(I,3,3)
rhat_op = zeros(3,3)
for i = 1:Ntot
xi = @views X3[3(i-1)+1:3*i];
for j = i+1:Ntot
xj = @views X3[3*(j-1)+1:3*j];

rij .= xi.-xj
r = norm(rij)
rhat .= rij./r
skew!(rhat_skew,-rhat)

sqrd = (a[i]^2 + a[j]^2)/r^2
rhat_op .= rhat*rhat'

Mtt  .= 1/(2*r) * ((1 + sqrd/3)*I + (1-sqrd)*rhat_op)
Mtr  .= 1/(2*r^2)*rhat_skew
Mrt  .= Mtr
Mrr  .= 1/(4*r^3) * (3*rhat_op - I)

Mh[3*(i-1)+1:3*i, 3*(j-1)+1:3*j] .= Mtt
Mh[3*(i-1)+1:3*i, 3*(Ntot)+3*(j-1)+1:3*(Ntot)+3*j] .= Mtr
Mh[3*(Ntot)+3*(i-1)+1:3*(Ntot)+3*i, 3*(j-1)+1:3*j] .= Mrt
Mh[3*(Ntot)+3*(i-1)+1:3*(Ntot)+3*i, 3*(Ntot)+3*(j-1)+1:3*(Ntot)+3*j] .= Mrr

Mh[3*(j-1)+1:3*j, 3*(i-1)+1:3*i] .= Mtt'
Mh[3*(j-1)+1:3*j, 3*(Ntot)+3*(i-1)+1:3*(Ntot)+3*i] .= Mtr'
Mh[3*(Ntot)+3*(j-1)+1:3*(Ntot)+3*j, 3*(i-1)+1:3*i] .= Mrt'
Mh[3*(Ntot)+3*(j-1)+1:3*(Ntot)+3*j, 3*(Ntot)+3*(i-1)+1:3*(Ntot)+3*i] .= Mrr'
end

Mtt .= 2/(3*a[i])*Ii
Mrr .= 1/(2*a[i]^3)*Ii
Mh[3*(i-1)+1:3*i, 3*(i-1)+1:3*i] .= Mtt
Mh[3*(Ntot)+3*(i-1)+1:3*(Ntot)+3*i, 3*(Ntot)+3*(i-1)+1:3*(Ntot)+3*i] .= Mrr
end
Mh .= Mh * 1/(4*pi);
return Mh
end

Ntot = 500
a = 0.05*ones(Ntot)
X3 = 10*rand(3*Ntot)
Mh  = zeros(Ntot*6,Ntot*6)
@btime calc_Mh!(Mh,X3,Ntot,a)
``````
1 Like

Before reinventing the wheel, check the Distances.jl package to see if all you need is `pairwise` distances. They have efficient implementations that are widely tested for different distance types.

3 Likes

I think this is allocating an intermediate array at every loop iteration.

It is possible that you can have a faster and cleaner code using StaticArrays for all these small matrices and vectors that you need.

2 Likes
``````julia> @btime calc_Mh!(Mh,X3,Ntot,a)`
99.728 ms (1123763 allocations: 268.80 MiB)
``````

You are allocating way too much for this to be really fast. The preallocated variables are a great start but e.g.

``````Mtt .= 2/(3*a[i]) * Ii
``````

should be

``````Mtt .= 2/(3*a[i]) .* Ii
``````

to get fully fused and avoid intermediate storage.

If you already have a mex implementation as C code it’s probably very straightforward to port that code to Julia and I expect you will see a similar performance.

2 Likes

Thanks for pointing out this package, this looks useful! Although I’m not sure if I can take advantage too much because I also need to calculate other pairwise quantities that depend on the separation vector rather than just Euclidean distance, so I’m not sure if the methods they use in that package can be leveraged fully. Although I may be able to pre-calculate as many quantities as possible using that package before entering my for loops. Cheers!

``````using StaticArrays

skew(a) = @SMatrix [0 -a[3] a[2]; a[3] 0 -a[1]; -a[2] a[1] 0]

function calc_Mh_StaticArrays!(Mh,X3,Ntot,a)
I3 = Diagonal(SVector(true,true,true))
for i = 1:Ntot
xi = @views SVector{3}(X3[3(i-1)+1:3*i]);
for j = i+1:Ntot
xj = @views SVector{3}(X3[3*(j-1)+1:3*j]);

rij = xi.-xj
r = norm(rij)
rhat = rij/r
rhat_skew = skew(-rhat)

sqrd = (a[i]^2 + a[j]^2)/r^2
rhat_op = rhat*rhat'

Mtt  = 1/(2*r) * ((1 + sqrd/3)*I + (1-sqrd)*rhat_op)
Mtr  = 1/(2*r^2)*rhat_skew
Mrt  = Mtr
Mrr  = 1/(4*r^3) * (3*rhat_op - I)

Mh[3*(i-1)+1:3*i, 3*(j-1)+1:3*j] = Mtt
Mh[3*(i-1)+1:3*i, 3*(Ntot)+3*(j-1)+1:3*(Ntot)+3*j] = Mtr
Mh[3*(Ntot)+3*(i-1)+1:3*(Ntot)+3*i, 3*(j-1)+1:3*j] = Mrt
Mh[3*(Ntot)+3*(i-1)+1:3*(Ntot)+3*i, 3*(Ntot)+3*(j-1)+1:3*(Ntot)+3*j] = Mrr

Mh[3*(j-1)+1:3*j, 3*(i-1)+1:3*i] = Mtt'
Mh[3*(j-1)+1:3*j, 3*(Ntot)+3*(i-1)+1:3*(Ntot)+3*i] = Mtr'
Mh[3*(Ntot)+3*(j-1)+1:3*(Ntot)+3*j, 3*(i-1)+1:3*i] = Mrt'
Mh[3*(Ntot)+3*(j-1)+1:3*(Ntot)+3*j, 3*(Ntot)+3*(i-1)+1:3*(Ntot)+3*i] = Mrr'
end

Mtt = 2/(3*a[i])*I3
Mrr = 1/(2*a[i]^3)*I3
Mh[3*(i-1)+1:3*i, 3*(i-1)+1:3*i] = Mtt
Mh[3*(Ntot)+3*(i-1)+1:3*(Ntot)+3*i, 3*(Ntot)+3*(i-1)+1:3*(Ntot)+3*i] = Mrr
end
Mh .= Mh .* (1/(4*pi));
return Mh
end
``````
``````julia> @btime calc_Mh!(\$Mh,\$X3,\$Ntot,\$a);
221.261 ms (1123763 allocations: 268.80 MiB)

julia> @btime calc_Mh_StaticArrays!(\$Mh,\$X3,\$Ntot,\$a);
28.496 ms (0 allocations: 0 bytes)
``````

8x speedup with a pretty straightforward conversion to `StaticArrays.jl`. Some of that could have been saved by being better about broadcasting in your initial implementation, but for these 3x3 matrices you won’t beat StaticArrays.

4 Likes

Can you disregard distances greater than cutoff? If so check: GitHub - m3g/CellListMap.jl: Flexible implementation of cell lists to map the calculations of particle-pair dependent functions, such as forces, energies, neighbour lists, etc.

1 Like

@GunnarFarneback
Thanks, I didn’t realise I was missing the .* for Float * Matrix operations. Went through and added that throughout and helped a lot, cheers

@mikmoore
Thanks a lot, I hadn’t seen static arrays before but this looks great, I’ll read up on this and start implementing

@Imiq
Sadly I can’t disregard distances greater than a cutoff, although it’s good to be aware of that package for future use!

1 Like