# Speedup optimization in nested for loop

I have adapted some python code that takes `data` which is `G x N`, and returns the (strictly upper triangular) matrix `cost_permuted` whose `(i,j)`-th entry represents the minimum of `cooscillation_distance(data[i, :], data[j, :], Ψ)` for `Ψ ∈ (0, 2π)`:

``````function cooscillation_distance(X, Y, Ψ)
sum(@. (X^2 + Y^2 - 2cos(Ψ) * X * Y - sin(Ψ)^2)^2 )
end
``````

As an extension to this, I am employing a bootstrap approach where I randomly permute `Y`, `n` times, and compute the minimum cooscillation distance for each permutation:

``````using Random, Optim, FLoops

function get_permuted_cost(
data,
n_permutations::Int
)
@assert n_permutations > 1 "Number of bootstrap permutations must be greater than 1"
cost_permuted = Array{Float64}(
undef,
first(size(data)),
first(size(data)),
n_permutations
)

@inbounds @views begin
for ix in 1:first(size(data))
@floop for iy in (ix+1):first(size(data))
for n in 1:n_permutations
y_random = shuffle(data[iy, :])
res = optimize(x->cooscillation_distance(data[ix,:],y_random, x), 0.0, 2pi)
cost_permuted[ix, iy, n] = Optim.minimum(res)
end
end
end
end

return cost_permuted
end
``````

This (probably naive) implementation is quite slow, e.g

``````Random.seed!(1234)
G = 500
N = 1000
n = 10

@time get_permuted_cost(rand(G,N),n); # 108.399799 seconds (23.69 M allocations: 160.340 GiB, 3.97% gc time)
``````

and I would like to speed it up so it runs in a reasonable time (i.e. <1min for G = 500, N = 1000, n = 100). Does anyone have any suggestions for how I could do this?

I think you can spare a lot of cos and sin evaluations by using a variable `cospsi` varying between -1 and 1 and replace the `sin(psi)^2` by `1- cospsi^2` and of course `cos(psi)` by `cospsi`.(actually with your formulation I think you should not look in `0 2pi` but `0 pi` (since the sin is squared anyway)

On my computer I gain a factor 4.

``````function cooscillation_distance(X, Y, cosΨ)
sum(@. (X^2 + Y^2 - 2cosΨ * X * Y - 1+cosΨ^2 ))
end
using Random, Optim, FLoops

function get_permuted_cost(
data,
n_permutations::Int
)
@assert n_permutations > 1 "Number of bootstrap permutations must be greater than 1"
cost_permuted = Array{Float64}(
undef,
first(size(data)),
first(size(data)),
n_permutations
)

@inbounds @views begin
for ix in 1:first(size(data))
@floop for iy in (ix+1):first(size(data))
for n in 1:n_permutations
y_random = shuffle(data[iy, :])
res = optimize(x->cooscillation_distance(data[ix,:],y_random, x), -1.0,1.0)
cost_permuted[ix, iy, n] = Optim.minimum(res)
end
end
end
end

return cost_permuted
end
``````