For loop acceleration

I have a function with several nested for loops that I would like to accelerate a bit, but I'm not sure whether multi-threading is useful in my case.

function SynDiv(option_leverage = 5, Rf = 1, gamma = 0.1)

    W0 = 100
    price0 = 5:5:25
    R_mean = [1.164795, 1.001645, 1.13878, 1.190089, 0.9832912]
    CovMat = [0.0089436489 -0.0017853213  0.0059088180  5.054594e-03  2.074978e-04;
            -0.0017853213  0.0013810899 -0.0015338188 -7.490523e-04  1.392822e-04;
            0.0059088180 -0.0015338188  0.0060244430  4.953837e-03  2.630359e-04;
            0.0050545936 -0.0007490523  0.0049538372  5.525412e-03 -1.619497e-05;
            0.0002074978  0.0001392822  0.0002630359 -1.619497e-05  4.827990e-04]
    share1_range = 0:(W0/price0[1])
    share2_range = 0:(W0/price0[2])
    share3_range = 0:(W0/price0[3])
    share4_range = 0:(W0/price0[4])
    share5_range = 0:(W0/price0[5])

    function utility(share)
        gain_risk = 0
        invest_risk = 0
        for i in 1:5
            gain_risk = gain_risk + share[i]*price0[i]*R_mean[i]
            invest_risk = invest_risk + share[i]*price0[i]
        end

        gain = gain_risk + (W0 - invest_risk)*Rf

        w = share .* price0
        penalty = 0.5 * gamma * w' * CovMat * w

        return gain - penalty
    end

    U = zeros(length(share1_range), length(share2_range), length(share3_range), length(share4_range), length(share5_range))

    for a in 1:length(share1_range)
        for b in 1:length(share2_range)
            for c in 1:length(share3_range)
                for d in 1:length(share4_range)
                    for e in 1:length(share5_range)
                        U[a, b, c, d, e] = utility([share1_range[a],
                                                    share2_range[b],
                                                    share3_range[c],
                                                    share4_range[d],
                                                    share5_range[e]])
                    end
                end
            end
        end
    end

    price0_option = price0/option_leverage
    share1_range_option = 0:(W0/price0_option[1])
    share2_range_option = 0:(W0/price0_option[2])
    share3_range_option = 0:(W0/price0_option[3])
    share4_range_option = 0:(W0/price0_option[4])
    share5_range_option = 0:(W0/price0_option[5])


    function utility_option(share)
        gain_risk = 0
        invest_risk = 0
        for i in 1:5
            gain_risk = gain_risk + share[i]*price0_option[i]*R_mean[i]
            invest_risk = invest_risk + share[i]*price0_option[i]
        end

        gain = gain_risk + (W0 - invest_risk)*Rf

        w = share .* price0_option
        penalty = 0.5 * gamma * w' * CovMat * w

        return gain - penalty
    end

    U_option = zeros(length(share1_range_option), length(share2_range_option), length(share3_range_option), length(share4_range_option), length(share5_range_option))

    for a in 1:length(share1_range_option)
        for b in 1:length(share2_range_option)
            for c in 1:length(share3_range_option)
                for d in 1:length(share4_range_option)
                    for e in 1:length(share5_range_option)
                        U_option[a, b, c, d, e] = utility_option([share1_range_option[a],
                                                                  share2_range_option[b],
                                                                  share3_range_option[c],
                                                                  share4_range_option[d],
                                                                  share5_range_option[e]])
                    end
                end
            end
        end
    end

    return maximum(U_option) - maximum(U)

end

You’re allocating a lot of small vectors of length 5. Making tuples instead should decrease memory use.

Switching the order of the loops should help, because Julia arrays are column-major, so the first index of U should be the one that varies fastest. That is, I'd use

for e in...
    for d in...
        for c in...

instead.
Or, better yet, keep the loops as they are but permute the dimensions of U, so that you index it as U[e, d, c, b, a]. I say this is better because share1_range looks like the longest iterator, which is what you want on the outer loop once you start threading it.

Then, try an @threads on the outer loop (for a in 1:length(share1_range)).
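
A minimal sketch of how those two suggestions could look together on the first loop nest, keeping your variable names (untested, just to illustrate the idea):

U = zeros(length(share5_range), length(share4_range), length(share3_range),
          length(share2_range), length(share1_range))

Threads.@threads for a in 1:length(share1_range)
    for b in 1:length(share2_range)
        for c in 1:length(share3_range)
            for d in 1:length(share4_range)
                for e in 1:length(share5_range)
                    # the first index (e) now varies fastest, matching Julia's
                    # column-major memory layout, and each thread writes to its
                    # own slice along the last dimension (a)
                    U[e, d, c, b, a] = utility([share1_range[a],
                                                share2_range[b],
                                                share3_range[c],
                                                share4_range[d],
                                                share5_range[e]])
                end
            end
        end
    end
end

Remember that Julia has to be started with multiple threads (the JULIA_NUM_THREADS environment variable) for @threads to actually run in parallel.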

I added @inbounds to all the for loops and the function seems faster. Could you be more specific about how to use tuples?

You are creating temporary vectors here:

U[a, b, c, d, e] = utility([share1_range[a],
                            share2_range[b],
                            share3_range[c],
                            share4_range[d],
                            share5_range[e]])

Use tuples instead:

U[a, b, c, d, e] = utility((share1_range[a],
                            share2_range[b],
                            share3_range[c],
                            share4_range[d],
                            share5_range[e]))

i.e. replace [ … ] with ( … )

I did that and saw a 25% speedup. Are there any other places where I could accelerate it?

Some observations:

  • a lot of variables are small vectors or matrices, so using StaticArrays will probably give you a big win.
  • after allocating and filling U and U_option, all that’s done with these large arrays is taking the maximum. By taking a running maximum you can avoid those allocations and any indexing worries.
  • there’s a lot of duplicated code / cleanup opportunities.

I couldn’t resist, so here’s my version:

using StaticArrays

function SynDiv2(option_leverage = 5, Rf = 1, gamma = 0.1)
    W0 = 100
    price0 = SVector{5, Float64}(5 : 5 : 25)
    price0_option = price0 / option_leverage
    R_mean = @SVector [1.164795, 1.001645, 1.13878, 1.190089, 0.9832912]
    CovMat = @SMatrix [0.0089436489 -0.0017853213  0.0059088180  5.054594e-03  2.074978e-04;
            -0.0017853213  0.0013810899 -0.0015338188 -7.490523e-04  1.392822e-04;
            0.0059088180 -0.0015338188  0.0060244430  4.953837e-03  2.630359e-04;
            0.0050545936 -0.0007490523  0.0049538372  5.525412e-03 -1.619497e-05;
            0.0002074978  0.0001392822  0.0002630359 -1.619497e-05  4.827990e-04]
    share_range = ntuple(i -> 0.0 : W0 / price0[i], Val(5))
    share_range_option = ntuple(i -> 0.0 : W0 / price0_option[i], Val(5))

    # Reduce code duplication by adding `price0` argument:
    function utility(share::SVector{5}, price0::SVector{5})
        gain_risk = 0.0 # 0.0 instead of 0 to avoid type instability
        invest_risk = 0.0
        @inbounds for i in 1 : 5
            gain_risk += share[i] * price0[i] * R_mean[i]
            invest_risk += share[i] * price0[i]
        end
        gain = gain_risk + (W0 - invest_risk) * Rf
        w = share .* price0
        penalty = 0.5 * gamma * (w' * CovMat * w)
        return gain - penalty
    end

    # U only seems to be used in maximum(U), so just skip creating U:
    # U = Array{Float64}(undef, ntuple(i -> length(share_range[i]), 5))
    Umax = -Inf
    @inbounds for share in Iterators.product(share_range...)
        Umax = max(Umax, utility(SVector(share), price0))
    end

    Umax_option = -Inf
    @inbounds for share in Iterators.product(share_range_option...)
        Umax_option = max(Umax_option, utility(SVector(share), price0_option))
    end

    return Umax_option - Umax
end

2.063 s (0 allocations: 0 bytes) (versus 33.643 s (382686706 allocations: 46.33 GiB)) on my machine. Now that allocations are down to zero, you could start thinking about multi-threading if you need more speed.

And a threaded version for good measure:

using StaticArrays

function max_utility(utility::F, share_range, price0) where F
    Umaxes = fill(-Inf, Threads.nthreads())
    Threads.@threads for head in share_range[1]
        tid = Threads.threadid()
        @inbounds Umax_thread = Umaxes[tid]
        @inbounds for tail in Iterators.product(Base.tail(share_range)...)
            share = SVector(tuple(head, tail...))
            Umax_thread = max(Umax_thread, utility(share, price0))
        end
        @inbounds Umaxes[tid] = Umax_thread
    end
    return maximum(Umaxes)
end

function SynDiv2(option_leverage = 5, Rf = 1, gamma = 0.1)
    W0 = 100
    price0 = SVector{5, Float64}(5 : 5 : 25)
    price0_option = price0 / option_leverage
    R_mean = @SVector [1.164795, 1.001645, 1.13878, 1.190089, 0.9832912]
    CovMat = @SMatrix [0.0089436489 -0.0017853213  0.0059088180  5.054594e-03  2.074978e-04;
            -0.0017853213  0.0013810899 -0.0015338188 -7.490523e-04  1.392822e-04;
            0.0059088180 -0.0015338188  0.0060244430  4.953837e-03  2.630359e-04;
            0.0050545936 -0.0007490523  0.0049538372  5.525412e-03 -1.619497e-05;
            0.0002074978  0.0001392822  0.0002630359 -1.619497e-05  4.827990e-04]
    share_range = ntuple(i -> 0.0 : W0 / price0[i], Val(5))
    share_range_option = ntuple(i -> 0.0 : W0 / price0_option[i], Val(5))

    @inline @inbounds function utility(share::SVector{5}, price0::SVector{5})
        w = share .* price0
        invest_risk = sum(w)
        gain_risk = sum(w .* R_mean)
        gain = gain_risk + (W0 - invest_risk) * Rf
        penalty = 0.5 * gamma * (w' * CovMat * w)
        return gain - penalty
    end

    Umax = max_utility(utility, share_range, price0)
    Umax_option = max_utility(utility, share_range_option, price0_option)

    return Umax_option - Umax
end

477.380 ms (4 allocations: 1.47 KiB) on a 6-core machine.

Edit: fixed a bug with uninitialized data.

We badly need higher-level threading primitives (threaded mapreduce).
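
In the meantime, here is a rough hand-rolled sketch of one (the name tmapreduce is made up, and it assumes the iterator has a length):

function tmapreduce(f, op, itr; init)
    # split the iterator into one chunk per thread, reduce each chunk in its
    # own task, then combine the partial results; note that for a non-indexable
    # iterator, Iterators.partition collects each chunk into a vector first
    nchunks = Threads.nthreads()
    chunklen = max(1, cld(length(itr), nchunks))
    tasks = map(Iterators.partition(itr, chunklen)) do chunk
        Threads.@spawn mapreduce(f, op, chunk; init = init)
    end
    return mapreduce(fetch, op, tasks; init = init)
end

# e.g. the first reduction in SynDiv2 above could then be written as
# Umax = tmapreduce(s -> utility(SVector(s), price0), max,
#                   Iterators.product(share_range...); init = -Inf)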

I note that you actually only need share .* price0 in the utility computation, and price0 and price0_option are fixed over the main loop. So you can rescale price0 and price0_option to unity and instead rescale share_range and share_range_option. This removes a lot of products.
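
Concretely, in the SynDiv2 version above that could look something like the following sketch (utility_w, w_range and w_range_option are just illustrative names):

# step the ranges by price, so each element is already w = share * price0[i]
w_range        = ntuple(i -> 0.0 : price0[i]        : W0, Val(5))
w_range_option = ntuple(i -> 0.0 : price0_option[i] : W0, Val(5))

# utility written directly in terms of w; no products with price0 in the hot loop
function utility_w(w::SVector{5})
    invest_risk = sum(w)
    gain_risk = sum(w .* R_mean)
    gain = gain_risk + (W0 - invest_risk) * Rf
    penalty = 0.5 * gamma * (w' * CovMat * w)
    return gain - penalty
end

Umax = -Inf
for w in Iterators.product(w_range...)
    Umax = max(Umax, utility_w(SVector(w)))
end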

You saved my life!