# For loops acceleration

I have a function that involves several for loops that I would like to accelerate a little bit, but not sure if multi-threading is useful in my case.

``````function SynDiv(option_leverage = 5, Rf = 1, gamma = 0.1)

W0 = 100
price0 = 5:5:25
R_mean = [1.164795, 1.001645, 1.13878, 1.190089, 0.9832912]
CovMat = [0.0089436489 -0.0017853213  0.0059088180  5.054594e-03  2.074978e-04;
-0.0017853213  0.0013810899 -0.0015338188 -7.490523e-04  1.392822e-04;
0.0059088180 -0.0015338188  0.0060244430  4.953837e-03  2.630359e-04;
0.0050545936 -0.0007490523  0.0049538372  5.525412e-03 -1.619497e-05;
0.0002074978  0.0001392822  0.0002630359 -1.619497e-05  4.827990e-04]
share1_range = 0:(W0/price0)
share2_range = 0:(W0/price0)
share3_range = 0:(W0/price0)
share4_range = 0:(W0/price0)
share5_range = 0:(W0/price0)

function utility(share)

gain_risk = 0
invest_risk = 0

for i in 1:5

gain_risk = gain_risk + share[i]*price0[i]*R_mean[i]
invest_risk = invest_risk + share[i]*price0[i]

end

gain = gain_risk + (W0 - invest_risk)*Rf

w = share .* price0
penalty = 0.5 * gamma * w' * CovMat * w

return gain - penalty

end

U = zeros(length(share1_range), length(share2_range), length(share3_range), length(share4_range), length(share5_range))

for a in 1:length(share1_range)

for b in 1:length(share2_range)

for c in 1:length(share3_range)

for d in 1:length(share4_range)

for e in 1:length(share5_range)

U[a, b, c, d, e] = utility([share1_range[a],
share2_range[b],
share3_range[c],
share4_range[d],
share5_range[e]])
end

end

end

end

end

price0_option = price0/option_leverage
share1_range_option = 0:(W0/price0_option)
share2_range_option = 0:(W0/price0_option)
share3_range_option = 0:(W0/price0_option)
share4_range_option = 0:(W0/price0_option)
share5_range_option = 0:(W0/price0_option)

function utility_option(share)

gain_risk = 0
invest_risk = 0

for i in 1:5

gain_risk = gain_risk + share[i]*price0_option[i]*R_mean[i]
invest_risk = invest_risk + share[i]*price0_option[i]

end

gain = gain_risk + (W0 - invest_risk)*Rf

w = share .* price0_option
penalty = 0.5 * gamma * w' * CovMat * w

return gain - penalty

end

U_option = zeros(length(share1_range_option), length(share2_range_option), length(share3_range_option), length(share4_range_option), length(share5_range_option))

for a in 1:length(share1_range_option)

for b in 1:length(share2_range_option)

for c in 1:length(share3_range_option)

for d in 1:length(share4_range_option)

for e in 1:length(share5_range_option)

U_option[a, b, c, d, e] = utility_option([share1_range_option[a],
share2_range_option[b],
share3_range_option[c],
share4_range_option[d],
share5_range_option[e]])
end

end

end

end

end

return maximum(U_option) - maximum(U)

end
``````

You’re allocating a lot of small vectors of length 5. Making tuples instead should decrease memory use.

Switching the order of the loops should help. That is, I’d use

``````for e in...
for d in...
for c in...
``````

Or, better yet, instead keep the loops the same, but permute the dimensions of U, so that it is `U[e,d,c,b,a]`. I say this is better, because it looks like `share1_range` is the longest iterator.

Then, try an `@threads` on the outer loop (`for a in 1:length(share1_range)`).

4 Likes

I add @inbounds to all for loops and the function seems faster. Could you be more specific in how to use tuples?

you are creating temporary vectors here:

``````U[a, b, c, d, e] = utility([share1_range[a],
share2_range[b],
share3_range[c],
share4_range[d],
share5_range[e]])
``````

``````U[a, b, c, d, e] = utility((share1_range[a],
share2_range[b],
share3_range[c],
share4_range[d],
share5_range[e]))
``````

i.e. replace [ … ] with ( … )

I did that and saw a 25% speedup, any other places I could accelerate it?

Some observations:

• a lot of variables are small vectors or matrices, so using StaticArrays will probably give you a big win.
• after allocating and filling `U` and `U_option`, all that’s done with these large arrays is taking the maximum. By taking a running maximum you can avoid those allocations and any indexing worries.
• there’s a lot of duplicated code / cleanup opportunities.

I couldn’t resist, so here’s my version:

``````using StaticArrays

function SynDiv2(option_leverage = 5, Rf = 1, gamma = 0.1)
W0 = 100
price0 = SVector{5, Float64}(5 : 5 : 25)
price0_option = price0 / option_leverage
R_mean = @SVector [1.164795, 1.001645, 1.13878, 1.190089, 0.9832912]
CovMat = @SMatrix [0.0089436489 -0.0017853213  0.0059088180  5.054594e-03  2.074978e-04;
-0.0017853213  0.0013810899 -0.0015338188 -7.490523e-04  1.392822e-04;
0.0059088180 -0.0015338188  0.0060244430  4.953837e-03  2.630359e-04;
0.0050545936 -0.0007490523  0.0049538372  5.525412e-03 -1.619497e-05;
0.0002074978  0.0001392822  0.0002630359 -1.619497e-05  4.827990e-04]
share_range = ntuple(i -> 0.0 : W0 / price0[i], Val(5))
share_range_option = ntuple(i -> 0.0 : W0 / price0_option[i], Val(5))

# Reduce code duplication by adding `price0` argument:
function utility(share::SVector{5}, price0::SVector{5})
gain_risk = 0.0 # 0.0 instead of 0 to avoid type instability
invest_risk = 0.0
@inbounds for i in 1 : 5
gain_risk += share[i] * price0[i] * R_mean[i]
invest_risk += share[i] * price0[i]
end
gain = gain_risk + (W0 - invest_risk) * Rf
w = share .* price0
penalty = 0.5 * gamma * (w' * CovMat * w)
return gain - penalty
end

# U only seems to be used in maximum(U), so just skip creating U:
# U = Array{Float64}(undef, ntuple(i -> length(share_range[i]), 5))
Umax = -Inf
@inbounds for share in Iterators.product(share_range...)
Umax = max(Umax, utility(SVector(share), price0))
end

Umax_option = -Inf
@inbounds for share in Iterators.product(share_range_option...)
Umax_option = max(Umax_option, utility(SVector(share), price0_option))
end

return Umax_option - Umax
end
``````

2.063 s (0 allocations: 0 bytes) (versus 33.643 s (382686706 allocations: 46.33 GiB)) on my machine. Now that allocations are down to zero, you could start thinking about multi-threading if you need more speed.

10 Likes

And a threaded version for good measure:

``````using StaticArrays

function max_utility(utility::F, share_range, price0) where F
@inbounds for tail in Iterators.product(Base.tail(share_range)...)
end
end
return maximum(Umaxes)
end

function SynDiv2(option_leverage = 5, Rf = 1, gamma = 0.1)
W0 = 100
price0 = SVector{5, Float64}(5 : 5 : 25)
price0_option = price0 / option_leverage
R_mean = @SVector [1.164795, 1.001645, 1.13878, 1.190089, 0.9832912]
CovMat = @SMatrix [0.0089436489 -0.0017853213  0.0059088180  5.054594e-03  2.074978e-04;
-0.0017853213  0.0013810899 -0.0015338188 -7.490523e-04  1.392822e-04;
0.0059088180 -0.0015338188  0.0060244430  4.953837e-03  2.630359e-04;
0.0050545936 -0.0007490523  0.0049538372  5.525412e-03 -1.619497e-05;
0.0002074978  0.0001392822  0.0002630359 -1.619497e-05  4.827990e-04]
share_range = ntuple(i -> 0.0 : W0 / price0[i], Val(5))
share_range_option = ntuple(i -> 0.0 : W0 / price0_option[i], Val(5))

@inline @inbounds function utility(share::SVector{5}, price0::SVector{5})
w = share .* price0
invest_risk = sum(w)
gain_risk = sum(w .* R_mean)
gain = gain_risk + (W0 - invest_risk) * Rf
penalty = 0.5 * gamma * (w' * CovMat * w)
return gain - penalty
end

Umax = max_utility(utility, share_range, price0)
Umax_option = max_utility(utility, share_range_option, price0_option)

return Umax_option - Umax
end
``````

477.380 ms (4 allocations: 1.47 KiB) on a 6-core machine.

Edit: fixed a bug with uninitialized data.

10 Likes

I note that you actually only need `share .* price0` in the `utility` computation, and `price0` and `price0_option` are fixed over the main loop. So you can rescale `price0` and `price0_option` to unity and instead rescale `share_range` and `share_range_option`. This removes a lot of products.