Hi, I’m looking for a way to compute an array representing the radius of the polar or spherical coordinates given the array sizes and a center point. I have this code for now but it’s incredibly slow and I’m looking for ways to improve it and tips to understand why it is slow:

using BenchmarkTools
function euclideandst(sz, p)
# sz = size of the Array
# p coordinates of the center point
N = length(sz) # number of dimension
return [sqrt(sum(x -> x .* x, [i[j] - c[j] for j = 1:N])) for i = CartesianIndices(sz)]
end
sz = (1024, 1024, 25)
c = (sz .+ 1) ./ 2
@btime dst = euclideandst($sz, $c)
39.293 s (314624005 allocations: 8.40 GiB)

Also if you use several threads, using unsafe views (UnsafeArrays.jl) can make quite a bit of difference as well (even if you use the Distances.jl package).

Something like this should work; taken from ParalllelKMeans.

"""
distance(metric, X1, X2, i1, i2)
Allocationless calculation of distance between vectors X1[:, i1] and X2[:, i2] defined by the supplied distance metric.
"""
@inline distance(metric, X1, X2, i1, i2) = evaluate(metric, uview(X1, :, i1), uview(X2, :, i2))

First, realize that in your innermost loop you are allocating a little 3-component array on every iteration (every i) just to sum it, which is incredibly wasteful and inefficient. Second, you have a bug: although you are passing c as the parameter p in your function, your function actually refers to the global variable c rather than to the parameter p, and global variables are slow in Julia.

Simply fixing the bug (changing c to p in the function) speeds it up by a factor of 20 for me. Getting rid of the inner-loop allocations by changing the sum call to sum(j -> (i[j] - p[j])^2, 1:N)) speeds it up by another factor of 6:

function euclideandst2(sz, p)
# sz = size of the Array
# p coordinates of the center point
N = length(sz) # number of dimension
return [sqrt(sum(j -> (i[j] - p[j])^2, 1:N)) for i = CartesianIndices(sz)]
end

I get another factor of 2 (down to 129ms) by exploiting the fact that the length N of the tuples is known at compile-time, so the loop can be unrolled by the compiler if we use some tricks:

function euclideandst3(sz::NTuple{N}, p::NTuple{N}) where {N}
# sz = size of the Array
# p coordinates of the center point
return [sqrt(sum(ntuple(j -> (i[j] - p[j])^2, Val{N}()))) for i = CartesianIndices(sz)]
end

Automating this sort of trick, using a compile-time array length, is one of the reasons why the StaticArrays package is the method of choice for computations with small coordinate vectors.

Of course, you can get further speedups by multiple threads, etcetera, and if there is an optimized package that does what you need then by all means use it. But hopefully the euclideandst2 function convinces you that you can obtain quite decent in performance in Julia just by following a few simple guidelines without advanced wizardry.

Thanks a lot for your time in providing me such a detailed answer ! I didn’t know about the Distances.jl package and as I’m in the learning phase of the julia language I think it’s good practice to try to implement such basic function. Thanks to you I also understand a bit more how julia works and how to get the maximum of it !