I’m converting some old C code to Julia, but I can’t get the performance anywhere close to the original. The function that takes most of the time, in the Julia code at least, is the following:

```
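using Distributed  # provides @everywhere, nprocs, procs, myid
@everywhere const np = nprocs()
@everywhere using DistributedArrays
@everywhere using DistributedArrays.SPMD
@everywhere const nkmax = 21
@everywhere const N = 150

# Runs on every process in `pids`; stores the mean of each triple product in b[j1,j2,j3].
@everywhere function slow_computation(b, dk, dr)
    pids = procs(dr)[:, 1]
    pstart = minimum(pids)
    for j1 = 1:nkmax, j2 = 1:j1, j3 = (j1-j2):j2
        if j3 > 0
            # Element-wise product of three slices of the local chunk
            @inbounds dr[:L] = dk[:L][:, :, :, j1] .*
                               dk[:L][:, :, :, j2] .*
                               dk[:L][:, :, :, j3]
            barrier(; pids=pids)
            if myid() == pstart
                # Only the lowest-numbered process sums the distributed array and stores the result
                b[j1, j2, j3] = sum(dr) / N^3
            end
            barrier(; pids=pids)
        end
    end
end

function overall_computation()
    pids = procs()  # assumption: `pids` was not defined in the original snippet; it must hold np process ids
    b = Array{Float64,3}(undef, nkmax, nkmax, nkmax)
    dk = dzeros((N, N, N, nkmax), pids, [np, 1, 1, 1])
    dr = dzeros((N, N, N), pids, [np, 1, 1])
    # Do some stuff to compute values of dk and dr
    spmd(slow_computation, b, dk, dr; pids=pids)
end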
```

When I profile the code, it seems to spend the overwhelming majority of its time in either `getindex` or `setindex!`.
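To make the indexing cost concrete, here is a minimal single-process sketch of the same inner step, without DistributedArrays. The sizes are reduced and the names (`N_demo`, `nk_demo`, `triple_product_copy!`, `triple_product_views!`) are made up for illustration. The first function mirrors the expression in `slow_computation`; the second is only my guess at an alternative that uses views and an in-place broadcast, and I have not measured it in the distributed code.

```julia
# Illustrative sizes, much smaller than the real N = 150 and nkmax = 21.
const N_demo = 16
const nk_demo = 4

# Mirrors the original expression: each dk[:, :, :, j] slice allocates a copy
# (getindex), the product is materialized in a temporary array, and that
# temporary is then copied into dr (setindex!).
function triple_product_copy!(dr, dk, j1, j2, j3)
    dr[:, :, :] = dk[:, :, :, j1] .* dk[:, :, :, j2] .* dk[:, :, :, j3]
    return sum(dr) / size(dr, 1)^3
end

# Hypothetical alternative: @views turns the slices into views, and `.=` fuses
# the whole product into one loop that writes directly into dr.
function triple_product_views!(dr, dk, j1, j2, j3)
    @views dr .= dk[:, :, :, j1] .* dk[:, :, :, j2] .* dk[:, :, :, j3]
    return sum(dr) / size(dr, 1)^3
end

dk = rand(N_demo, N_demo, N_demo, nk_demo)
dr = zeros(N_demo, N_demo, N_demo)

b_copy = triple_product_copy!(dr, dk, 3, 2, 1)
b_view = triple_product_views!(dr, dk, 3, 2, 1)
```

Both functions should return the same value up to floating-point rounding. Calling each once to compile and then timing a second call with `@time` or `@allocated` would show how much of the cost comes from the slice copies.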

Nothing I’ve tried has been faster than this version. Below is a graph of N versus time for the whole computation, which is dominated by this function.

I think Julia should be comparable to C for a function like this, so I’m hoping that people more knowledgeable about Julia can help me find the poor performance choice I made.