Improving Performance of this Code

Hello!

I am trying to solve a big model and at the core I have the following loop which essentially solves a Backward Induction Problem. In what follows I provide a MWE.

In my model, I am solving this problem many many times and therefore I would like to push its performance to the max, however, at this point I do not know how to improve it further.

I believe I can make improvements both in terms of memory usage and computation speed. Any suggestions are welcome.

using Distributed
using BenchmarkTools
using ProgressMeter
using BitOperations
using Profile
using Traceur

addprocs(11, exeflags="--project=.")



@everywhere J = 100
@everywhere T = 200
@everywhere Π = 10*rand(J,T)
@everywhere Dᵢ = rand([0],J,1)
@everywhere Sj = 20*rand(J,T)
@everywhere Fj = 20*rand(J,T)

@everywhere Π=Π.-Sj

@everywhere    function myfun1(Π,Fj,Sj,Dᵢ,β,J,T)
                D = zeros(Int64,2,T,J)
                v = zeros(Float64,2,T,J)
                Dstar = zeros(Int64,J,T+1)
                Dstar[:,1]=Dᵢ
                S = [0 1]
                Πstar=0
                @inbounds for t=0:T-1, j=1:J, i=1:2
                    if t==0
                        payoff = Π[j,T-t]+Sj[j,T-t]*S[i]
                        D[i,T-t,j]=(payoff>0)
                        v[i,T-t,j]=D[i,T-t,j]*payoff
                    else
                        payoff_enter = Π[j,T-t]+Sj[j,T-t]*S[i]+β*v[2,T-t+1,j]
                        payoff_exit = β*v[1,T-t+1,j]
                        d=(payoff_enter>payoff_exit)
                        D[i,T-t,j] = d
                        v[i,T-t,j]=d*(payoff_enter)+(1-d)*payoff_exit
                    end
                end
                #println(v)
                @inbounds for t=2:T+1, j=1:J
                        Dstar[j,t]=D[Dstar[j,t-1]+1,t-1,j]
                end
                Dstar = Dstar[:,2:end]
                return Dstar
end



println("TIMES")
f1 = @btime myfun1(Π,Fj,Sj,Dᵢ,0.9,J,T)


@profile myfun1(Π,Fj,Sj,Dᵢ,0.9,J,T)


@trace myfun1(Π,Fj,Sj,Dᵢ,0.9,J,T)

Any suggestion will be very much appreciated!

Thank you for the MWE. Would you mind formatting and indenting your code to make it easier for us to read and help? I great guide on this can be found here: PSA: make it easier to help you

1 Like

I am really sorry! Here it is

So first off, you seem to spawn a bunch of julia processes at the start and spread your definitions to those processed with @everywhere but I never see any actual parallel computing in this code. Am I missing something, or did you forget to make a loop parallel?

You’re right I am not parallelizing anything.This code is embedded into some other parallel which I run in parallel, but here it does not play any role.

As a first pass, I was able to shave a third off the runtime by simply putting your two sets of for loops into separate functions:

function myfun2(Π,Fj,Sj,Dᵢ,β,J,T)
    D = zeros(Int64,2,T,J)
    v = zeros(Float64,2,T,J)
    Dstar = zeros(Int64,J,T+1)
    Dstar[:,1]=Dᵢ
    S = [0 1]
    Πstar=0
    func2inner1!(Π, Sj, S, D, v, β, J, T)
    funct2inner2!(Dstar, D, T, J)
end

function func2inner1!(Π, Sj, S, D, v, β, J, T)
    @inbounds for t ∈ 0:T-1, j ∈ 1:J, i ∈ 1:2
        if t==0
            payoff     = Π[j,T-t]+Sj[j,T-t]*S[i]
            D[i,T-t,j] = (payoff>0)
            v[i,T-t,j] = D[i,T-t,j]*payoff
        else
            payoff_enter = Π[j,T-t]+Sj[j,T-t]*S[i]+β*v[2,T-t+1,j]
            payoff_exit  = β*v[1,T-t+1,j]
            d=(payoff_enter>payoff_exit)
            D[i,T-t,j]   = d
            v[i,T-t,j]   = d*(payoff_enter)+(1-d)*payoff_exit
        end
    end
end

function funct2inner2!(Dstar, D, T, J)
    @inbounds for t=2:T+1, j=1:J
        Dstar[j,t]=D[Dstar[j,t-1]+1,t-1,j]
    end
    Dstar = Dstar[:,2:end]
end
julia> f1 = @btime myfun1($Π,$Fj,$Sj, $Dᵢ, $(0.9), $J, $T);
  342.202 μs (9 allocations: 938.72 KiB)

julia> f2 = @btime myfun2($Π,$Fj,$Sj, $Dᵢ, $(0.9), $J, $T);
  265.630 μs (9 allocations: 938.72 KiB)

julia> f1 == f2
true

I’m sure there’s more performance to be had here, but it’s a little hard without knowing what exactly this function is doing. For instance, your iteration order is inefficient. You want to be accessing your arrays column, first then row. More performance tips can be found here: https://docs.julialang.org/en/v1/manual/performance-tips/index.html

1 Like

Thanks. This is very useful

Minor nitpick:

The last assignment isn’t actually doing anything useful. This only works out right thanks to implicit return of the last computed value.

1 Like

@GunnarFarneback I’m not sure what you mean by “not doing anything useful”. It’s dropping a column which is required for the correct output. It’s not mutating Dstar if that’s what you mean, but neither does the original implementation. Are you just objecting to me not writing the unnecessary ‘return’ statement?

The assignment is unnecessary. You could just write

return Dstar[:,2:end]

BTW, return is not unnecessary, it really helps readability.

I’m objecting to assigning a new binding to Dstar, which then immediately goes out of scope without affecting anything. Try adding return Dstar at the end of myfun2 (without changing anything else) to see what I mean.

Oh yeah, that unnecessary assignment just comes from the fact that I was just copy pasting and moving around the OP’s code. I made some minor stylistic tweaks like removing return but didn’t go through any great effort to be consistent or thorough. I definitely wouldn’t normally do that unnecessary assignment in my own code, but I generally disagree with return statements unless actually necessary for things like short circuiting.

This is pretty off topic though, I just wanted to make sure there wasn’t some actual technical issue that I had missed,

1 Like

That’s interesting. Could you perhaps provide some intuition for why separating the two for loops in separate functions increases performance?