I have a function with several nested for loops that I would like to accelerate a little, but I'm not sure whether multi-threading is useful in my case.
function SynDiv(option_leverage = 5, Rf = 1, gamma = 0.1)
    W0 = 100
    price0 = 5:5:25
    R_mean = [1.164795, 1.001645, 1.13878, 1.190089, 0.9832912]
    CovMat = [0.0089436489 -0.0017853213 0.0059088180 5.054594e-03 2.074978e-04;
              -0.0017853213 0.0013810899 -0.0015338188 -7.490523e-04 1.392822e-04;
              0.0059088180 -0.0015338188 0.0060244430 4.953837e-03 2.630359e-04;
              0.0050545936 -0.0007490523 0.0049538372 5.525412e-03 -1.619497e-05;
              0.0002074978 0.0001392822 0.0002630359 -1.619497e-05 4.827990e-04]
    share1_range = 0:(W0/price0[1])
    share2_range = 0:(W0/price0[2])
    share3_range = 0:(W0/price0[3])
    share4_range = 0:(W0/price0[4])
    share5_range = 0:(W0/price0[5])

    function utility(share)
        gain_risk = 0
        invest_risk = 0
        for i in 1:5
            gain_risk = gain_risk + share[i]*price0[i]*R_mean[i]
            invest_risk = invest_risk + share[i]*price0[i]
        end
        gain = gain_risk + (W0 - invest_risk)*Rf
        w = share .* price0
        penalty = 0.5 * gamma * w' * CovMat * w
        return gain - penalty
    end

    U = zeros(length(share1_range), length(share2_range), length(share3_range),
              length(share4_range), length(share5_range))
    for a in 1:length(share1_range)
        for b in 1:length(share2_range)
            for c in 1:length(share3_range)
                for d in 1:length(share4_range)
                    for e in 1:length(share5_range)
                        U[a, b, c, d, e] = utility([share1_range[a],
                                                    share2_range[b],
                                                    share3_range[c],
                                                    share4_range[d],
                                                    share5_range[e]])
                    end
                end
            end
        end
    end

    price0_option = price0/option_leverage
    share1_range_option = 0:(W0/price0_option[1])
    share2_range_option = 0:(W0/price0_option[2])
    share3_range_option = 0:(W0/price0_option[3])
    share4_range_option = 0:(W0/price0_option[4])
    share5_range_option = 0:(W0/price0_option[5])

    function utility_option(share)
        gain_risk = 0
        invest_risk = 0
        for i in 1:5
            gain_risk = gain_risk + share[i]*price0_option[i]*R_mean[i]
            invest_risk = invest_risk + share[i]*price0_option[i]
        end
        gain = gain_risk + (W0 - invest_risk)*Rf
        w = share .* price0_option
        penalty = 0.5 * gamma * w' * CovMat * w
        return gain - penalty
    end

    U_option = zeros(length(share1_range_option), length(share2_range_option),
                     length(share3_range_option), length(share4_range_option),
                     length(share5_range_option))
    for a in 1:length(share1_range_option)
        for b in 1:length(share2_range_option)
            for c in 1:length(share3_range_option)
                for d in 1:length(share4_range_option)
                    for e in 1:length(share5_range_option)
                        U_option[a, b, c, d, e] = utility_option([share1_range_option[a],
                                                                  share2_range_option[b],
                                                                  share3_range_option[c],
                                                                  share4_range_option[d],
                                                                  share5_range_option[e]])
                    end
                end
            end
        end
    end

    return maximum(U_option) - maximum(U)
end
You’re allocating a lot of small vectors of length 5. Making tuples instead should decrease memory use.
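For instance, the innermost assignment could build a tuple instead of a fresh Vector; utility only indexes share and broadcasts it, so it should work unchanged. A minimal sketch of that one change:

U[a, b, c, d, e] = utility((share1_range[a], share2_range[b], share3_range[c],
                            share4_range[d], share5_range[e]))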
Switching the order of the loops should help. That is, I'd put for e in ... outermost, then for d in ..., then for c in ..., and so on, instead.
Or, better yet, keep the loops the same but permute the dimensions of U so that it is U[e, d, c, b, a]; Julia arrays are column-major, so the index that the innermost loop varies (e) should come first. I say this is better because it looks like share1_range is the longest iterator, so the outer loop (the one you'd want to thread) gets the most iterations.
Then, try an @threads on the outer loop (for a in 1:length(share1_range)).
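Putting those two ideas together, here is a minimal sketch of what the first loop nest inside SynDiv could look like (assuming Julia is started with more than one thread, e.g. julia -t 4; the option loop nest would change the same way):

using Base.Threads

# U permuted so the innermost loop (over e) varies the first index;
# Julia arrays are column-major, so the innermost loop then walks memory contiguously.
U = zeros(length(share5_range), length(share4_range), length(share3_range),
          length(share2_range), length(share1_range))
@threads for a in 1:length(share1_range)
    for b in 1:length(share2_range)
        for c in 1:length(share3_range)
            for d in 1:length(share4_range)
                for e in 1:length(share5_range)
                    # Each thread writes only to its own slice U[:, :, :, :, a], so there is no race.
                    U[e, d, c, b, a] = utility((share1_range[a], share2_range[b],
                                                share3_range[c], share4_range[d],
                                                share5_range[e]))
                end
            end
        end
    end
end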
A lot of the variables are small vectors or matrices, so using StaticArrays will probably give you a big win.
After allocating and filling U and U_option, all that's done with these large arrays is taking the maximum. By keeping a running maximum instead you can avoid those allocations and any indexing worries.
There's a lot of duplicated code and other cleanup opportunities.
I couldn’t resist, so here’s my version:
using StaticArrays

function SynDiv2(option_leverage = 5, Rf = 1, gamma = 0.1)
    W0 = 100
    price0 = SVector{5, Float64}(5 : 5 : 25)
    price0_option = price0 / option_leverage
    R_mean = @SVector [1.164795, 1.001645, 1.13878, 1.190089, 0.9832912]
    CovMat = @SMatrix [0.0089436489 -0.0017853213 0.0059088180 5.054594e-03 2.074978e-04;
                       -0.0017853213 0.0013810899 -0.0015338188 -7.490523e-04 1.392822e-04;
                       0.0059088180 -0.0015338188 0.0060244430 4.953837e-03 2.630359e-04;
                       0.0050545936 -0.0007490523 0.0049538372 5.525412e-03 -1.619497e-05;
                       0.0002074978 0.0001392822 0.0002630359 -1.619497e-05 4.827990e-04]
    share_range = ntuple(i -> 0.0 : W0 / price0[i], Val(5))
    share_range_option = ntuple(i -> 0.0 : W0 / price0_option[i], Val(5))

    # Reduce code duplication by adding a `price0` argument:
    function utility(share::SVector{5}, price0::SVector{5})
        gain_risk = 0.0 # 0.0 instead of 0 to avoid type instability
        invest_risk = 0.0
        @inbounds for i in 1 : 5
            gain_risk += share[i] * price0[i] * R_mean[i]
            invest_risk += share[i] * price0[i]
        end
        gain = gain_risk + (W0 - invest_risk) * Rf
        w = share .* price0
        penalty = 0.5 * gamma * (w' * CovMat * w)
        return gain - penalty
    end

    # U only seems to be used in maximum(U), so just skip creating U:
    # U = Array{Float64}(undef, ntuple(i -> length(share_range[i]), 5))
    Umax = -Inf
    @inbounds for share in Iterators.product(share_range...)
        Umax = max(Umax, utility(SVector(share), price0))
    end
    Umax_option = -Inf
    @inbounds for share in Iterators.product(share_range_option...)
        Umax_option = max(Umax_option, utility(SVector(share), price0_option))
    end

    return Umax_option - Umax
end
2.063 s (0 allocations: 0 bytes) on my machine, versus 33.643 s (382686706 allocations: 46.33 GiB) for the original. Now that allocations are down to zero, you could start thinking about multi-threading if you need more speed.
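If you do, one option is to split the first share range across threads and combine per-slice maxima at the end. A sketch under the same setup as SynDiv2 (threaded_max is just an illustrative helper name, and Julia must be started with multiple threads):

# Thread over the first range; each iteration keeps its own running maximum,
# and the per-slice maxima are reduced at the end.
function threaded_max(utility, share_range, price0)
    r1 = share_range[1]
    partials = fill(-Inf, length(r1))
    Threads.@threads for i in eachindex(r1)
        local_max = -Inf
        @inbounds for rest in Iterators.product(Base.tail(share_range)...)
            local_max = max(local_max, utility(SVector(r1[i], rest...), price0))
        end
        partials[i] = local_max
    end
    return maximum(partials)
end

# Inside SynDiv2, the two running-maximum loops would then become:
# Umax = threaded_max(utility, share_range, price0)
# Umax_option = threaded_max(utility, share_range_option, price0_option)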
I note that you actually only need share .* price0 in the utility computation, and price0 and price0_option are fixed over the main loop. So you can rescale price0 and price0_option to unity and instead rescale share_range and share_range_option. This removes a lot of multiplications.
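Concretely, since the utility only ever uses w = share .* price0, you could iterate over the invested amounts w directly, which amounts to the same rescaling. A sketch (utility_w and w_range are just illustrative names; the option part is identical with price0_option in place of price0):

# Each w_range[i] has step price0[i], so it visits exactly the same grid of
# invested amounts as price0[i] .* share_range[i], with no per-iteration products.
w_range = ntuple(i -> 0.0 : price0[i] : W0, Val(5))

function utility_w(w::SVector{5})
    gain = w' * R_mean + (W0 - sum(w)) * Rf
    penalty = 0.5 * gamma * (w' * CovMat * w)
    return gain - penalty
end

Umax = -Inf
@inbounds for w in Iterators.product(w_range...)
    Umax = max(Umax, utility_w(SVector(w)))
end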