Speeding Up a Backward Induction Code

Hi all! I am solving a model, and after profiling the code I realized that the function myfun2 is the big bottleneck. I created a MWE that preserves the computational cost of this function, and I wanted to ask for help optimizing it and speeding it up as much as possible. Any suggestion or improvement would be very much appreciated.

Basically, what this code does is evaluate the payoffs of different paths. It is a small combinatorial discrete choice problem and, of course, it suffers from the curse of dimensionality. I am trying to keep the dimensions low (K < 12), but when K is bigger than 8 it already becomes really slow.

Here is the MWE:

using BenchmarkTools

J = 30
T = 20
Π = 75*rand(J,T)
Dᵢ = rand([0],J,1)
feat1 = rand([1,2],J,1)
feat2 = rand([1,2],J,1)
Φ1 = 50
Φ2 = 50
Sj = 200*ones(J)

Π=Π.-Sj

K = 10
Ind = [1,4,10,12,J-15,J-10,J-6,J-5,J-2,J]

function myfun1(Ind,K,D::Array{Int64},Φ1,Φ2,feat1,feat2) #Think about how to make this faster
    X1c = zeros(Int8,size(D,1))
    X2c = zeros(Int8,size(D,1))
    X1l = zeros(Int8,size(D,1))
    X2l = zeros(Int8,size(D,1))
    countc = zeros(Float64,size(D,1))
    countl = zeros(Float64,size(D,1))
    feat1=feat1[Ind]
    feat2=feat2[Ind]
    for j=1:K
        for l=1:K
            X1c[j] = X1c[j] + (feat1[j]==feat1[l])*D[l]
            X1l[j] = X1l[j] + (feat2[j]==feat2[l])*D[l]
            countc[j] = countc[j] + (feat1[j]==feat1[l])
            countl[j] = countl[j] + (feat2[j]==feat2[l])
        end
    end
    X1c = Φ1.*(X1c.>0)./countc
    X1l = Φ2.*(X1l.>0)./countl
    fcc = (ones(K,1)'*X1c)[1]
    fcl = (ones(K,1)'*X1l)[1]
    return fcc, fcl
end

function myfun2(Ind,Π,Φ1,Φ2,Sj,Dᵢ,feat1,feat2,K,J,T)
    β = 0.95
    Π = Π[Ind,:]
    Sj = Sj[Ind]
    D = zeros(Int64,2^K,K,T)
    Indices =  ones(Int64,2^K,T)
    ind =  ones(Int64,T+1)
    vc = zeros(Float64,2^K,2^K,T)
    v = zeros(Float64,2^K,T)
    Dstar = zeros(Int64,K,T+1)
    Dstar[:,1]=Dᵢ[Ind]
    Πstar=0
    ChoiceSet = zeros(Int64,2^K,K)
    for j=0:2^K-1
        int_s = collect(string(j, base=2, pad=K))
        int_s = parse.(Int8,int_s)
        ChoiceSet[j+1,:]=int_s
    end
    for t=0, i=1:2^K
        for ii=1:2^K
            (eg,el)=myfun1(Ind,K,ChoiceSet[ii,:],Φ1,Φ2,feat1,feat2)
            vc[i,ii,T-t] = sum(ChoiceSet[ii,:].*(Π[:,T-t] .+ Sj.*ChoiceSet[i,:])) .- (eg .+ el)
        end
        v[i,T-t] = maximum(vc[i,:,T-t])
        Indices[i,T-t] = argmax(vc[i,:,T-t])
        D[i,:,T-t] = ChoiceSet[argmax(vc[i,:,T-t]),:]
    end
    for t=1:T-1, i=1:2^K
        for ii=1:2^K
            (eg,el)=myfun1(Ind,K,ChoiceSet[ii,:],Φ1,Φ2,feat1,feat2)
            vc[i,ii,T-t] = sum(ChoiceSet[ii,:].*(Π[:,T-t] .+ Sj.*ChoiceSet[i,:])) .- (eg .+ el)+β*v[ii,T-t+1]
        end
        v[i,T-t] = maximum(vc[i,:,T-t])
        Indices[i,T-t] = argmax(vc[i,:,T-t])
        D[i,:,T-t] = ChoiceSet[argmax(vc[i,:,T-t]),:]
    end
    for t=2:T+1
        ind[t] = Indices[ind[t-1],t-1]
        Dstar[:,t]=D[ind[t],:,t-1]
    end
    for t=2:T+1
        (eg,el)=myfun1(Ind,K,Dstar[:,t],Φ1,Φ2,feat1,feat2)
        for j=1:K
            π=β^(t-2)*Dstar[j,t]*(Π[j,t-1]+Sj[j]*Dstar[j,t-1])
            Πstar=π+Πstar
        end
        Πstar = Πstar - β^(t-2)*(eg+el)
    end
    Dstar = Dstar[:,2:end]
    return Dstar, Πstar
end

@benchmark myfun2(Ind,Π,Φ1,Φ2,Sj,Dᵢ,feat1,feat2,K,J,T)

BenchmarkTools.Trial:
  memory estimate:  53.14 GiB
  allocs estimate:  440488318
  --------------
  minimum time:     32.632 s (7.66% GC)
  median time:      32.632 s (7.66% GC)
  mean time:        32.632 s (7.66% GC)
  maximum time:     32.632 s (7.66% GC)
  --------------
  samples:          1
  evals/sample:     1

It’s hard to tell from visual inspection but are you using non-constant globals inside of your functions?

I would like to say I am not, but I am not 100% certain.

The first thing I notice is that your code creates a lot of slices of arrays, like vc[i,:,T-t] etc.

In Julia, those slices create copies of the data, so you are creating copies of your arrays all over the place. You can instead use @view(vc[i, :, T - t]) to create a view into that data, which will not copy it.
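To illustrate the difference, here is a minimal sketch (not taken from the MWE):

```julia
# Slicing allocates a copy; @view returns a lightweight window into the
# parent array that shares its memory.
A = ones(3, 3)

s = A[1, :]          # copies row 1 into a fresh Vector
w = @view A[1, :]    # no copy; w aliases A's memory

A[1, 1] = 42.0
s[1]   # still 1.0 — the copy is unaffected
w[1]   # 42.0 — the view sees the mutation
```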

1 Like

It appears you’re not using non-const globals (tested by introducing a let block around the globals and the @benchmark call and interpolating the variables into the @benchmark).

Just to confirm:

That is the way it is supposed to be, right?

Yes, avoiding non-const global variables is a good thing. It’s the first thing to check for (literally, the first item in https://docs.julialang.org/en/v1/manual/performance-tips/#man-performance-tips-1).

1 Like

This certainly helped! In the MWE, the performance enhancement is not very big, but in the main code it is!

Is there any other suggestion?

I’ll probably take a look in a few hours. It definitely has some opportunities for improvement.

1 Like

Thanks so much!

If you are doing that everywhere you can use @views begin ... end to apply the @view macro to every indexing expression.
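A minimal sketch of the block form (the array names here are made up, not the MWE's):

```julia
# Inside @views begin ... end, every slicing expression becomes a view,
# so none of the slices allocate copies.
vc = reshape(collect(1.0:32.0), 4, 4, 2)
v  = zeros(4, 2)

@views begin
    for i in 1:4
        v[i, 1] = maximum(vc[i, :, 1])   # vc[i, :, 1] is a view, not a copy
    end
end
```

You can also prefix a whole function definition with `@views` to get the same effect for every slice in its body.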

2 Likes

There are definitely other optimization opportunities here, but your biggest problem seems to be that you call myfun1 way more times than you need to.
(eg,el)=myfun1(Ind,K,ChoiceSet[ii,:],Φ1,Φ2,feat1,feat2)
is in two loops over i when it only depends on ii. Precalculating this so you only run it once for each ii instead of 2^(K+1) times results in ~6x speedup of your example on my computer.

Edit: I actually got that wrong. You were calling it T*2^K times which is even worse.
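A toy sketch of the hoisting idea, with a made-up `expensive` standing in for myfun1 (wrapped in a function, which also avoids the non-const-global issue mentioned above):

```julia
expensive(ii) = sum(sin, 1:ii)   # stand-in for the costly myfun1 call

function run(n)
    # Compute the loop-invariant quantity once per ii, outside the loop
    # over i, instead of once per (i, ii) pair.
    cache = [expensive(ii) for ii in 1:n]
    total = 0.0
    for i in 1:n, ii in 1:n
        total += cache[ii]       # reuse the cached value; no repeated calls
    end
    return total
end
```

In the MWE the same trick would mean filling two length-2^K vectors with the (eg, el) pairs before entering the loops over t and i.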

1 Like

Thanks so much!!!

Also note that you are calling a maximum operation on the same object three times. E.g.:

These could be rolled into a single operation (find the max and its indices).
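For example, with findmax (a generic sketch, not the MWE's arrays):

```julia
# findmax returns the maximum value and its index in a single pass,
# replacing separate maximum and argmax calls over the same data.
x = [3.0, 7.0, 1.0]
m, idx = findmax(x)   # (7.0, 2)
```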

And if this is backward induction, it should not be necessary to allocate all of the matrices for all dates. For the date t computation, all you need is the date t+1 continuation values. Or do I misunderstand why you allocate everything by t?
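A toy sketch of that rolling-buffer idea, with a made-up `flow` payoff array rather than the MWE's vc:

```julia
# Backward induction only ever reads the date t+1 value function, so two
# vectors that get swapped each date suffice; no (2^K, T) storage needed.
function solve(flow, β)
    n, _, T = size(flow)
    v_next = zeros(n)        # continuation values at date t+1
    v_curr = similar(v_next)
    for t in T:-1:1
        for i in 1:n
            v_curr[i] = maximum(flow[i, ii, t] + β * v_next[ii] for ii in 1:n)
        end
        v_next, v_curr = v_curr, v_next   # roll the buffers
    end
    return v_next            # date-1 values
end
```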

1 Like

Yes! I just corrected this issue in the following way:

tempind = argmax(@view(vc[i,:,T-t]))
Indices[i,T-t] = tempind
v[i,T-t] = vc[i,tempind,T-t]
D[i,:,T-t] = ChoiceSet[tempind,:]

You are also right with your second observation.

It appears you are taking a very round-about way to make a Boolean array from the bits, by making a string and parsing it back.

    for j=0:2^K-1
        int_s = collect(string(j, base=2, pad=K))
        int_s = parse.(Int8,int_s)
        ChoiceSet[j+1,:]=int_s
    end

Using BitOperations.jl would make this much more direct, and without the allocations.

let int_s = fill(false, K)  # pre-allocate output
    for j = 0:2^K-1
        @. int_s = bget(j, 0:K-1)
        ChoiceSet[j+1, :] = int_s
    end
end
1 Like

Do you mean to just take sums here? Isn’t this just fcc = sum(X1c) etc?

2 Likes

Your code takes forever to run, which makes it really frustrating to benchmark and iteratively optimize. Can you scale it down by suggesting smaller, but still relevant, array sizes?

Or use the digits or digits! functions from Base.
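A sketch with digits!, reusing the K/ChoiceSet names from the MWE. Note that digits! fills bits least-significant first, the reverse of the string-parsing version, but it still enumerates every subset exactly once:

```julia
# Enumerate all binary choice vectors without strings or per-iteration
# allocations: digits! writes the base-2 digits of j into a reused buffer.
K = 3
ChoiceSet = zeros(Int64, 2^K, K)
buf = zeros(Int64, K)
for j in 0:2^K-1
    digits!(buf, j, base = 2)    # bits of j, least-significant first
    ChoiceSet[j+1, :] = buf
end
```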