I have a system with Ntot particles. I am creating a matrix Mh that is 6Ntot x 6Ntot and symmetric, and elements are given by pairwise calculations between particles.

I’ve tried my best to speed up my Julia code, but feel like I’m missing something. For Ntot = 500, Matlab mex time = 0.087 s and Julia = 0.18 s. The function is being called every timestep of a simulation so the difference is adding up.

The function is essentially: loop over all particles pairwise, calculate separation rij, calculate submatrices Mtt, Mtr, Mrt, Mrr and place into matrix Mh. I’ve put some code below. X3 is a 3Ntot length vector (particle positions), ‘a’ is an Ntot length vector (particle radii). Any help greatly appreciated.

```
using LinearAlgebra, BenchmarkTools
function skew!(s,a)
s[1,2] = -a[3]
s[1,3] = a[2]
s[2,1] = a[3]
s[2,3] = -a[1]
s[3,1] = -a[2]
s[3,2] = a[1]
end
function calc_Mh!(Mh,X3,Ntot,a)
Mtt = zeros(3,3)
Mtr = zeros(3,3)
Mrt = zeros(3,3)
Mrr = zeros(3,3)
rhat_skew = zeros(3,3)
rhat = zeros(3)
rij = zeros(3)
Ii = Matrix(I,3,3)
rhat_op = zeros(3,3)
for i = 1:Ntot
xi = @views X3[3(i-1)+1:3*i];
for j = i+1:Ntot
xj = @views X3[3*(j-1)+1:3*j];
rij .= xi.-xj
r = norm(rij)
rhat .= rij./r
skew!(rhat_skew,-rhat)
sqrd = (a[i]^2 + a[j]^2)/r^2
rhat_op .= rhat*rhat'
Mtt .= 1/(2*r) * ((1 + sqrd/3)*I + (1-sqrd)*rhat_op)
Mtr .= 1/(2*r^2)*rhat_skew
Mrt .= Mtr
Mrr .= 1/(4*r^3) * (3*rhat_op - I)
Mh[3*(i-1)+1:3*i, 3*(j-1)+1:3*j] .= Mtt
Mh[3*(i-1)+1:3*i, 3*(Ntot)+3*(j-1)+1:3*(Ntot)+3*j] .= Mtr
Mh[3*(Ntot)+3*(i-1)+1:3*(Ntot)+3*i, 3*(j-1)+1:3*j] .= Mrt
Mh[3*(Ntot)+3*(i-1)+1:3*(Ntot)+3*i, 3*(Ntot)+3*(j-1)+1:3*(Ntot)+3*j] .= Mrr
Mh[3*(j-1)+1:3*j, 3*(i-1)+1:3*i] .= Mtt'
Mh[3*(j-1)+1:3*j, 3*(Ntot)+3*(i-1)+1:3*(Ntot)+3*i] .= Mtr'
Mh[3*(Ntot)+3*(j-1)+1:3*(Ntot)+3*j, 3*(i-1)+1:3*i] .= Mrt'
Mh[3*(Ntot)+3*(j-1)+1:3*(Ntot)+3*j, 3*(Ntot)+3*(i-1)+1:3*(Ntot)+3*i] .= Mrr'
end
Mtt .= 2/(3*a[i])*Ii
Mrr .= 1/(2*a[i]^3)*Ii
Mh[3*(i-1)+1:3*i, 3*(i-1)+1:3*i] .= Mtt
Mh[3*(Ntot)+3*(i-1)+1:3*(Ntot)+3*i, 3*(Ntot)+3*(i-1)+1:3*(Ntot)+3*i] .= Mrr
end
Mh .= Mh * 1/(4*pi);
return Mh
end
Ntot = 500
a = 0.05*ones(Ntot)
X3 = 10*rand(3*Ntot)
Mh = zeros(Ntot*6,Ntot*6)
@btime calc_Mh!(Mh,X3,Ntot,a)
```