Performance Tips for a Combinatorial Problem

Hello! I have a combinatorial discrete choice problem that I solve by Backward Induction. The problem is intensive in both memory and computation time, and I would be super happy to improve it on either of these two dimensions. Right now, the problem runs out of memory when K > 12. I'm posting an MWE below:

using BenchmarkTools
using ProgressMeter
using BitOperations
using Distributions

J = 1000
T = 20
Π = 100*rand(J,T)
Dᵢ = rand([0],J,1)
feat1 = rand([1,2,3,4,5,6,7,8],J,1)
feat2 = rand([1,2,3,4,5,6,7,8,9,10],J,1)
Φ1 = 10
Φ2 = 10
Sj = 20*rand(J,T)
Fj = 20*rand(J,T)

Π=Π.-Sj.-Fj

K = 10
Ind =([1,4,7,9,10,12,J-10,J-6,J-5,J-2])

# Helper used by shocks2ind: index of the first row of B that equals the row vector a
matchrow(a,B) = findfirst(i->all(j->a[j] == B[i,j],1:size(B,2)),1:size(B,1))

# This is the Backward Induction Algorithm with interdependent choices

function myfun1(Ind,K,D::Array{Int64},Φ1,Φ2,feat1,feat2) 
    X1c = zeros(Int8,size(D,1))
    X2c = zeros(Int8,size(D,1))
    X1l = zeros(Int8,size(D,1))
    X2l = zeros(Int8,size(D,1))
    countc = zeros(Float64,size(D,1))
    countl = zeros(Float64,size(D,1))
    feat1=feat1[Ind]
    feat2=feat2[Ind]
    for j=1:K
        for l=1:K
            X1c[j] = X1c[j] + (feat1[j]==feat1[l])*D[l]
            X1l[j] = X1l[j] + (feat2[j]==feat2[l])*D[l]
            countc[j] = countc[j] + (feat1[j]==feat1[l])
            countl[j] = countl[j] + (feat2[j]==feat2[l])
        end
    end
    X1c = Φ1.*(X1c.>0)./countc
    X1l = Φ2.*(X1l.>0)./countl
    fcc = sum(X1c)
    fcl = sum(X1l)
    return fcc, fcl
end


function pshockprob(sequences,prob)
    (KK,K)=size(sequences)
    proba = zeros(KK)
    for i=1:KK
        proba[i]=prob^(sum(sequences[i,:]))*(1-prob)^(K-sum(sequences[i,:]))
    end
    return proba
end


function shocks2ind(sequences,shockpath)
    (K,T)=size(shockpath)
    inds = zeros(T)
    shockpath[shockpath.==2] .= 0
    for i=1:T
        inds[i]=matchrow(shockpath[:,i]',sequences)
    end
    return convert(Array{Int64}, inds)
end


function myfun7(Ind,Π,Φ1,Φ2,Sj,prob,SeqShocks,Dᵢ,feat1,feat2,K,J,T)
    β = 0.9
    Π = Π[Ind,:]
    Sj = Sj[Ind,:]
    SeqShocks = SeqShocks[Ind,:]
    D = zeros(Int64,2^K,2^K,K,T)
    Indices =  ones(Int64,2^K,2^K,T)
    Dpast =  ones(Int64,T+1)
    v = zeros(Float64,2^K,2^K,T)
    Dstar = zeros(Int64,K,T+1)
    Dstar[:,1]=Dᵢ[Ind]
    Πstar=0
    ChoiceSet = zeros(Int64,2^K,K)
    ShockSet = zeros(Int64,2^K,K)
    GroupCosts = zeros(Float64,2^K,2^K,2)
    let int_s = fill(false,K) #pre-allocate output
        for j=0:2^K-1
            @. int_s = bget(j,0:K-1)
            ChoiceSet[j+1,:]=int_s
            ShockSet[j+1,:]=int_s
        end
    end
    for j=1:2^K, jj=1:2^K
        (GroupCosts[j,jj,1],GroupCosts[j,jj,2])=myfun1(Ind,K,ChoiceSet[j,:].*ShockSet[jj,:],Φ1,Φ2,feat1,feat2)
    end
    pstatevec=pshockprob(ShockSet,prob)
    indshocks=shocks2ind(ShockSet,SeqShocks)
    #println(indshocks)
    #println(ShockSet)
    #println(pstatevec)
    #println(GroupCosts)
    vc = zeros(Float64, 2^K)
    for t=0:T-1
        for  i=1:2^K, iii=1:2^K
            #Basically what is wrong here is that I am still allowing people to enter
            fill!(vc, 0.0)
            for ii=1:2^K
                    for j=1:K
                        vc[ii] += ChoiceSet[ii,j]*(Π[j,T-t]+Sj[j,T-t]*ChoiceSet[i,j]-(1-ShockSet[iii,j])*999999999999999999999999999)
                    end
                    vc[ii] -= GroupCosts[ii,iii,1] + GroupCosts[ii,iii,2]
                    if t>0
                        vc[ii] += β*v[ii,:,T-t+1]'*pstatevec
                    end
            end
            #println(vc)
            tempind = argmax(vc)
            Indices[i,iii,T-t] = tempind
            v[i,iii,T-t]=vc[tempind]
            D[i,iii,:,T-t] = ChoiceSet[tempind,:]
            end
    end
    for t=2:T+1
        Dpast[t] = Indices[Dpast[t-1],indshocks[t-1],t-1]
        Dstar[:,t]=D[Dpast[t],indshocks[t-1],:,t-1]
    end
    Dstar = Dstar[:,2:end]
    return Dstar
end

println("TIMES")
p = 0.3
SeqShocks = (rand(J,T+1).>p).+1

BIC = @btime myfun7(Ind,Π,Φ1,Φ2,Sj,p,SeqShocks,Dᵢ,feat1,feat2,K,J,T)
println(sum(BIC))

Thanks so much in advance!

I don’t quite understand the problem yet; some description of what you wish to achieve would be helpful. But here are a few tips so far:

  1. You can do myfun1 without the allocations:
function newfun1(Ind, K, D::Array{Int64}, Φ1, Φ2,
                 feat1, feat2)
    fcc, fcl = 0.0, 0.0

    for j in Ind
        Tempc, Templ = 0.0, 0.0
        Ctc, Ctl = 0, 0
        for (ld, l) in enumerate(Ind)
            Eq1 = feat1[j] == feat1[l]
            Eq2 = feat2[j] == feat2[l]
            Ctc += Eq1
            Ctl += Eq2
            Tempc += Eq1*D[ld]    # D has length K, so index it by position within Ind
            Templ += Eq2*D[ld]
        end
        fcc += (Tempc > 0)/Ctc    # indicator, matching (X1c .> 0)./countc in myfun1
        fcl += (Templ > 0)/Ctl
    end
    fcc*Φ1, fcl*Φ2
end
  2. You are calling myfun1 many times with the same arguments, so you can basically “memoize” the results in a variety of ways; a sketch of one option is below.
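
For example, a minimal memoization sketch, assuming the remaining arguments (Ind, K, Φ1, Φ2, feat1, feat2) stay fixed across calls inside myfun7; FCACHE and newfun1_memo are illustrative names, not part of the code above:

# Cache results keyed on the active-choice vector D; everything else is constant here.
const FCACHE = Dict{Vector{Int64},Tuple{Float64,Float64}}()

function newfun1_memo(Ind, K, D::Vector{Int64}, Φ1, Φ2, feat1, feat2)
    get!(FCACHE, copy(D)) do          # copy(D) so later mutation of D can't corrupt the key
        newfun1(Ind, K, D, Φ1, Φ2, feat1, feat2)
    end
end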

Hey! Thanks for the reply. Essentially, my problem is one in which I need to solve for the optimal choices of an entity using Backward Induction. The problem becomes combinatorial because choices are interdependent so the entity is choosing over vectors of size K, and the number of potential choices corresponds to the Power set 2^K.

The main problem I have right now is that for large K (K>10), when I do the allocations at the beginning of myfun7 I have a memory problem.
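
For what it's worth, a rough back-of-the-envelope check of the two largest preallocations in myfun7, using the array shapes from the MWE at K = 13 (the first size that fails), shows why:

K, T = 13, 20
bytes_D = 2^K * 2^K * K * T * sizeof(Int64)     # D = zeros(Int64, 2^K, 2^K, K, T)
bytes_v = 2^K * 2^K * T * sizeof(Float64)       # v = zeros(Float64, 2^K, 2^K, T)
(bytes_D + bytes_v) / 2^30                      # ≈ 140 GiB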

You’re not doing all allocations at the beginning, because lines like this

Dstar[:,t]=D[Dpast[t],indshocks[t-1],:,t-1]

allocate on the right-hand side. Slices allocate by default in Julia. You can avoid allocation from slices by putting @views at the beginning, e.g. @views function myfun7(...)
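
A minimal illustration of the difference (f_copy and f_view are just illustrative names):

A = rand(1000, 1000)
f_copy(A) = sum(A[:, 1])            # A[:, 1] allocates a fresh Vector on every call
@views f_view(A) = sum(A[:, 1])     # A[:, 1] becomes a non-allocating SubArray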


This is indeed very helpful advice. For K = 7, it reduces memory allocations by a quarter and computation time by half. So thanks a lot!!

The code now seems fine on the memory front, but it is still a bit slow because of the huge size of the loops.

I am posting here the updated version of the code:

using BenchmarkTools
using ProgressMeter
using BitOperations
using Distributions

J = 1000
T = 20
Π = 100*rand(J,T)
Dᵢ = rand([0],J,1)
feat1 = rand([1,2,3,4,5,6,7,8],J,1)
feat2 = rand([1,2,3,4,5,6,7,8,9,10],J,1)
Φ1 = 10
Φ2 = 10
Sj = 20*rand(J,T)
Fj = 20*rand(J,T)

Π=Π.-Sj.-Fj

K = 7
Ind =([1,4,7,10,12,13,14])

#@everywhere K = 2
#@everywhere Ind =([1,2])


# Index of the first row of B that equals the row vector a
matchrow(a,B) = findfirst(i->all(j->a[j] == B[i,j],1:size(B,2)),1:size(B,1))

# This is the Backward Induction Algorithm with interdependent choices

@views function myfun1(Ind,K,D::Array{Int64},Φ1,Φ2,feat1,feat2) #Think about how to make this faster
    X1c = zeros(Int8,size(D,1))
    X2c = zeros(Int8,size(D,1))
    X1l = zeros(Int8,size(D,1))
    X2l = zeros(Int8,size(D,1))
    countc = zeros(Float64,size(D,1))
    countl = zeros(Float64,size(D,1))
    feat1=feat1[Ind]
    feat2=feat2[Ind]
    for j=1:K
        for l=1:K
            X1c[j] = X1c[j] + (feat1[j]==feat1[l])*D[l]
            X1l[j] = X1l[j] + (feat2[j]==feat2[l])*D[l]
            countc[j] = countc[j] + (feat1[j]==feat1[l])
            countl[j] = countl[j] + (feat2[j]==feat2[l])
        end
    end
    X1c = Φ1.*(X1c.>0)./countc
    X1l = Φ2.*(X1l.>0)./countl
    fcc = sum(X1c)
    fcl = sum(X1l)
    return fcc, fcl
end

@views function pshockprob(sequences,prob)
    (KK,K)=size(sequences)
    proba = zeros(KK)
    for i=1:KK
        proba[i]=prob^(sum(sequences[i,:]))*(1-prob)^(K-sum(sequences[i,:]))
    end
    return proba
end


@views function shocks2ind(sequences,shockpath)
    (K,T)=size(shockpath)
    inds = zeros(T)
    shockpath[shockpath.==2] .= 0
    for i=1:T
        inds[i]=matchrow(shockpath[:,i]',sequences)
    end
    return convert(Array{Int64}, inds)
end


@views function myfun7(Ind,Π,Φ1,Φ2,Sj,prob,SeqShocks,Dᵢ,feat1,feat2,K,J,T)
    β = 0.9
    Π = Π[Ind,:]
    Sj = Sj[Ind,:]
    SeqShocks = SeqShocks[Ind,:]
    Indices =  ones(Int64,2^K,2^K,T)
    Dpast =  ones(Int64,T+1)
    v = Array{Float64}(undef,2^K,2^K,T)
    Dstar = zeros(Int64,K,T+1)
    Dstar[:,1]=Dᵢ[Ind]
    Πstar=0
    ChoiceSet = Array{Union{Nothing, Int}}(nothing,2^K,K)
    ShockSet = Array{Union{Nothing, Int}}(nothing,2^K,K)
    GroupCosts = Array{Float64}(undef,2^K,2^K,2)
    let int_s = fill(false,K) #pre-allocate output
        for j=0:2^K-1
            @. int_s = bget(j,0:K-1)
            ChoiceSet[j+1,:]=int_s
            ShockSet[j+1,:]=int_s
        end
    end
    for j=1:2^K, jj=1:2^K
        GroupCosts[j,jj,:].=myfun1(Ind,K,ChoiceSet[j,:].*ShockSet[jj,:],Φ1,Φ2,feat1,feat2)
    end
    pstatevec=pshockprob(ShockSet,prob)
    indshocks=shocks2ind(ShockSet,SeqShocks)
    #println(indshocks)
    #println(ShockSet)
    #println(pstatevec)
    #println(GroupCosts)
    vc = zeros(Float64, 2^K)
    for t=0:T-1
        for  i=1:2^K, iii=1:2^K
            fill!(vc, 0.0)
            for ii=1:2^K
                    for j=1:K
                        vc[ii] += ChoiceSet[ii,j]*(Π[j,T-t]+Sj[j,T-t]*ChoiceSet[i,j]-(1-ShockSet[iii,j])*999999999999999999999999999)
                    end
                    vc[ii] -= GroupCosts[ii,iii,1] + GroupCosts[ii,iii,2]
                    if t>0
                        vc[ii] += β*v[ii,:,T-t+1]'*pstatevec
                    end
            end
            #println(vc)
            tempind = argmax(vc)
            Indices[i,iii,T-t] = tempind
            v[i,iii,T-t]=vc[tempind]
            end
    end
    for t=2:T+1
        Dpast[t] = Indices[Dpast[t-1],indshocks[t-1],t-1]
        Dstar[:,t]= ChoiceSet[Dpast[t],:]
    end
    Dstar = Dstar[:,2:end]
    return Dstar
end


println("TIMES")
p = 0.3
SeqShocks = (rand(J,T+1).>p).+1


BIC = @btime myfun7(Ind,Π,Φ1,Φ2,Sj,p,SeqShocks,Dᵢ,feat1,feat2,K,J,T)

The way you are handling the data in v is also killing your performance. You only need to keep v from the previous iteration over t rather than storing the values for every t. You are also executing the following line inside the loop over i and iii even though it doesn't depend on them, and it allocates:
vc[ii] += β*v[ii,:,T-t+1]'*pstatevec
This also does a bunch of vector-vector multiplications. Since you are eventually going to use all of the vectors in v you can do a single matrix-vector multiplication instead.

I would make that loop look more like this:

for t=0:T-1
        for  i=1:2^K, iii=1:2^K
            fill!(vc, 0.0)
            for ii=1:2^K
                    for j=1:K
                        vc[ii] += ChoiceSet[ii,j]*(Π[j,T-t]+Sj[j,T-t]*ChoiceSet[i,j]-(1-ShockSet[iii,j])*999999999999999999999999999)
                    end
                    vc[ii] -= GroupCosts[ii,iii,1] + GroupCosts[ii,iii,2]
                    if t>0
                        vc[ii] += v_previous[ii]
                    end
            end
            #println(vc)
            tempind = argmax(vc)
            Indices[i,iii,T-t] = tempind
            v[i,iii]=vc[tempind]
        end
        mul!(v_previous, v, pstatevec)
        v_previous .*= β
    end

Note that I dropped the third dimension from v and preallocated v_previous as a vector.
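
For completeness, the corresponding preallocations before the time loop might look like this (a sketch; mul! comes from the LinearAlgebra standard library):

using LinearAlgebra                   # for mul!
v = zeros(Float64, 2^K, 2^K)          # v no longer carries a time dimension
v_previous = zeros(Float64, 2^K)      # discounted expected continuation value from the previous t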


These lines in myfun1 are suspicious:

    X1c = zeros(Int8,size(D,1))
    X2c = zeros(Int8,size(D,1))
    X1l = zeros(Int8,size(D,1))
    X2l = zeros(Int8,size(D,1))
...
    X1c = Φ1.*(X1c.>0)./countc
    X1l = Φ2.*(X1l.>0)./countl

The last two lines compute arrays of floats on the right-hand side, so X1c and X1l switch from integer to float arrays. Hence, bye-bye type stability.
Maybe rewrite the following sums as

    fcc = sum(Φ1 * (x > 0) / c for (x, c) in zip(X1c, countc))
    fcl = sum(Φ2 * (x > 0) / c for (x, c) in zip(X1l, countl))

In shocks2ind: shockpath[shockpath.==2] .= 0 allocates an array for shockpath.==2. Better:

replace!(shockpath, 2=>0)

It also helps to explicitly hoist some of the computations out of the inner loops, using a buffer temp preallocated as a length-K vector. For example:

for  i=1:2^K, iii=1:2^K
    for j=1:K
        temp[j] = (Π[j,T-t]+Sj[j,T-t]*ChoiceSet[i,j]-(1-ShockSet[iii,j])*999999999999999999999999999)
    end
    fill!(vc, 0.0)
    for ii=1:2^K
            for j=1:K
                vc[ii] += ChoiceSet[ii,j]*temp[j]
            end

Also remember that @inbounds can speed up some of your loops.
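
For instance, on the innermost accumulation (only once you are confident the indices are in bounds):

@inbounds for ii=1:2^K, j=1:K
    vc[ii] += ChoiceSet[ii,j]*temp[j]
end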


Hey! Thanks so much. This greatly improved the code's performance.

Thanks so much for the heads up.

What is the potential drawback of using @views?

This is not important for performance, but this

Dᵢ = rand([0],J,1)
feat1 = rand([1,2,3,4,5,6,7,8],J,1)

should really be

Dᵢ = zeros(Int, J) 
feat1 = rand(1:8, J)

Especially using rand to create a vector of zeros is pretty non-standard.

Views of non-contiguous memory can be slower than copying if they are used a lot. Also, in previous versions (0.6 and earlier) the performance of views was significantly worse, which is the main reason copying is the default.
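
As a rough illustration of the first point (a sketch; exact timings depend on the machine):

using BenchmarkTools
A = rand(1000, 1000)
row_copy = A[500, :]           # contiguous copy of a non-contiguous row
row_view = @view A[500, :]     # strided SubArray, no copy up front
@btime sum($row_copy)          # contiguous access
@btime sum($row_view)          # strided access; can be slower if reused many times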


The explicit loop is twice as fast on my machine …

sval = 0.0
for i in eachindex(countc)
       sval += Φ1 * (X1c[i] > 0) / countc[i]
end

I took another look at this, and it occurred to me that the core thing eating up a lot of time in your main loop is basically a matrix multiplication being done with a naive nested loop. If you rearrange it to call an optimized matrix multiplication routine (mul! from LinearAlgebra) instead, you can get a big speedup.

# assumes using LinearAlgebra (for mul!) and GroupCosts stored as a 2^K-by-2^K matrix (see the note below)
v_previous = zeros(Float64,2^K)
temp = zeros(Float64, K, 2^K)
temp1 = similar(GroupCosts)
temp2 = similar(temp1)
Choice = Float64.(ChoiceSet)
@inbounds for t=0:T-1
    temp1 .= v_previous.-GroupCosts
    for  i=1:2^K
        for iii=1:2^K
            for j=1:K
                temp[j,iii] = (Π[j,T-t]+Sj[j,T-t]*ChoiceSet[i,j]-(1-ShockSet[iii,j])*999999999999999999999999999)
            end
        end
        mul!(temp2, Choice, temp)
        for iii=1:2^K
            best = -Inf
            tempind = 1
            for ii=1:2^K
                vc = temp2[ii,iii]+temp1[ii,iii]
                if vc > best
                    best = vc
                    tempind = ii
                end
            end
            Indices[i,iii,T-t] = tempind
            v[i,iii]=best
        end
    end
    mul!(v_previous, v, pstatevec)
    v_previous .*= β
end

I also got rid of argmax and did that explicitly in the loop, because it gave a meaningful speedup.
Edit: oops, I forgot to mention that I made GroupCosts a 2D array holding the sum of the two values you had stored along the 3rd dimension, since you never use them independently; a sketch of that change is below.
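
That change might look like this, replacing the original GroupCosts construction (a sketch based on the loop in myfun7):

GroupCosts = [sum(myfun1(Ind, K, ChoiceSet[j,:] .* ShockSet[jj,:], Φ1, Φ2, feat1, feat2))
              for j in 1:2^K, jj in 1:2^K]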


Hello!

Do you think there could be potential gains from exploiting the sparsity of these arrays somewhere in this code, given that the outcomes are going to be mostly zeros?

@bcmichael

The only way to know for sure is to try it and see what the timings look like, but my initial guess is probably not. The great majority of the time seems to be spent in the matrix multiplication and finding the maximum values in each column of the result of the matrix multiplication. Your matrices really aren’t that sparse, so the multiplication would probably get slower if you tried to use a sparse matrix representation. The result of the matrix multiplication isn’t sparse basically at all, so a sparse representation can’t really help make finding the maximums faster.
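
If you do want to check, a quick timing comparison along these lines would settle it (using Choice, temp and temp2 from the snippet above; SparseArrays is only needed for the test):

using SparseArrays, LinearAlgebra, BenchmarkTools
Choice_sp = sparse(Choice)
@btime mul!($temp2, $Choice, $temp)       # dense multiply
@btime mul!($temp2, $Choice_sp, $temp)    # sparse multiply; likely slower at this density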

The only thing I can really think of to help speed things up is trying to take advantage of the parallel nature of what you're doing. The matrix multiplication should already be using multiple threads under the hood, but you could try to use multi-threading for finding the maximum values in the columns. Alternatively, this would probably be a decent candidate for kicking the work over to a GPU if you have one.
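
For the multi-threaded column maxima, a minimal sketch could look like the following (colmax! is an illustrative name; start Julia with several threads, e.g. julia -t auto):

using Base.Threads

function colmax!(vals, inds, M)
    @threads for col in axes(M, 2)         # one column per task
        best, best_i = -Inf, 1
        for row in axes(M, 1)
            x = M[row, col]
            if x > best
                best, best_i = x, row
            end
        end
        vals[col] = best
        inds[col] = best_i
    end
    return vals, inds
end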