Help with bottleneck in dynamic programming

jmatheusgpa · January 8, 2021, 9:52pm

My goal is to calculate all the possible outcomes in a finite dynamic programming problem. The grid size is 100, there are three state variables and T periods.

A bottleneck is the following function, which I’ve started to improve in version 2. According to profiling, the issue is in line 28. Something about abstractarray.jl and multidimensional.jl. I’ve read that this is expected when indexing arrays. But I don’t know what can be done to speed up the code in spite of it. I’ve tried using inbounds and views, but these didn’t help much.

Also, according to code_warntype, Julia seems to have problems inferring the data types in lines 25 and 28.

Would you please have a look at it and tell me how I can improve it? Any help is appreciated.

Thanks!

using LinearAlgebra, Statistics


using BenchmarkTools

#packages to increase the speed of the code
using Profile # @profile command
using ProfileView #profiling using graphs


#number of periods
T = 13

#grid size
grid = 100


#continuation values
#period t (not t+1), θkt, ℓkt, ℓmt
s_W = zeros(T+1, grid, grid, grid)
j_W = similar(s_W)

#optimal payments
#period, θkt, ℓmt, outputs
Pst_array = zeros(T, grid, grid, 6) #we won't need payment proposal for period T, a.k.a. "T+1"
Pjt_array = similar(Pst_array)


#probability mass function
pmf = Array{Float64}(undef,grid,grid)

#example
pmf[:,:] .= 0.01

t, hkt, lkt, lmt, Pmt_array, k_W = T, 50, 50, 50, Pjt_array, s_W

continuation_value = 0.011
payment_threshold = 91

s_W[T+1, :, :, :] .= continuation_value
Pjt_array[t, payment_threshold:end, lkt, 1] .= continuation_value

#function to calculate the payment offer threshold ϕmt

#test with searchsorted first

function threshold_m(t, lkt, lmt, Pmt_array)
    
    #searches the index where the payment offer is positive
    ts = @views searchsortedfirst(Pmt_array[t, :, lkt, 1], 0.01)
    
    
    #returns error message if threshold does not exist (101 is returned if missing by searchsortedfirst)
    @assert ts < 101
    
    return ts
    
end

#test 
# t, lkt,lmt, Pmt_array = T, 50, 51, Pjt_array
# @btime threshold_m(t, lkt,lmt,  Pmt_array)
# @code_warntype threshold_m(t, lkt,lmt,  Pmt_array)

function respond_paymentv1(t, hkt, lkt, lmt, Pmt_array, k_W)

    ts = threshold_m(t, lkt, lmt, Pmt_array)

    lmt = max(ts, lmt)


    #the possible values of payment given by the opponent
    Pmt_grid = Pmt_array[t, lmt:end, lkt, 1]
    
    
    #the screening cutoffs associated with each payment proposal, simplified
    cutoffs = Array{Int64}(undef, length(Pmt_grid))
    cutoffs .= lkt

    

    #A1### 

    #payoff when accepting the payment is the best response
    pA1 = copy(Pmt_grid)

    #weighting by the probabilities
    probk = pmf[hkt, hkt:end]
    probm = pmf[lmt, lmt:end]

    lowerbounds = collect(lmt:1:grid)
    
    
    #K and M to make the shape of the arrays easier to see
    K, M = length(probk), length(probm)


    pA2 = Array{Float64}(undef, K, M)
    for i in eachindex(cutoffs)
        pA2[:, i] .= (k_W[t+1, hkt:end, cutoffs[i], lowerbounds[i]])
    end

    pA2

    #indicator function
    # IA1 = np.where(k_W[t+1, hkt:, cutoffs, np.arange(lmt, grid, step = 1)].T <= Pmt_grid, 1, 0) #python code

    IA1 = Array{Int64}(undef, K, M)
    IA1[findall(pA2 .<= transpose(repeat(Pmt_grid, 1, K)))] .= 1

    #IA1 has S rows and J columns
    #we want a matrix of probabilities with the same dimension
    # prob = np.outer(probk, probm) #python code
#     prob = repeat(probk, 1, M) .* transpose(repeat(probm, 1, K))
    
    prob = probm' .* probk #outer product


    pA1 = transpose(pA1) .* prob

    A1 = pA1 .* IA1

    #A2####

    IA2 = @. ifelse(IA1==1, 0, 1)

    pA2 = pA2 .* prob

    A2 = pA2 .* IA2
    
    payoff = A1 + A2
    payoff = sum(payoff)

    #returns the expected value of answering a payment offer and the probability of liquidation
    return payoff
end

#test1: expected payoff of S
t, hkt, lkt, lmt, Pmt_array, k_W = T, 50, 50, 50, Pjt_array, s_W
@btime respond_paymentv1(t, hkt, lkt, lmt, Pmt_array, k_W)



function respond_paymentv2(t, hkt, lkt, lmt, Pmt_array, k_W)
    
    ts = threshold_m(t, lkt, lmt, Pmt_array)

    lmt = max(ts, lmt)


    #the possible values of payment given by the opponent
    Pmt_grid = Pmt_array[t, lmt:end, lkt, 1]
    
    
    #the screening cutoffs associated with each payment proposal, simplified
    cutoffs = Array{Int64}(undef, length(Pmt_grid))
    cutoffs .= lkt

    payoff = zeros(length(Pmt_grid))
#     payoff = Array{Float64}(undef, length(Pmt_grid))



    for (i, pmt) in enumerate(Pmt_grid)
        #receives pmt if his skill level is below the cutoff
        payoff[i] += pmt * sum(pmf[hkt, hkt:(cutoffs[i]-1)]) * pmf[lmt, lmt+(i-1)]
        
        #receives pmt if his skill level greather than or equal to the cutoff
        payoff[i] += dot(k_W[t+1, cutoffs[i]:end , cutoffs[i], lmt+(i-1)], pmf[hkt, cutoffs[i]:end]) * pmf[lmt, lmt+(i-1)]
        
    end
        
    return sum(payoff)
    
end


#test1: expected payoff of S
t, hkt, lkt, lmt, Pmt_array, k_W = T, 50, 50, 50, Pjt_array, s_W
@btime respond_paymentv2(t, hkt, lkt, lmt, Pmt_array, k_W)

@code_warntype respond_paymentv2(t, hkt, lkt, lmt, Pmt_array, k_W)

using Profile
Profile.clear()
for i in 1:10000
    @profile respond_paymentv2(t, hkt, lkt, lmt, Pmt_array, k_W)
end



ProfileView.view()

PS1: Context
I’m studying a game where two players bargain with each other. I’ll call K the player who is taking the actions and M his adversary.

The gamel has asymmetric information. Each player bargaining power increases on his true skill level, hkt, which only he knows. But there are lower bounds, lkt and lmt, that help the adversary to infer hkt.

The function I’m struggling with calculates the expected payoff when K answers a payment proposal. K receives the payment pmt(hmt) in case he accepts it and k_W(hkt+1) if he rejects it. So he chooses the maximum between the two options. There are different possible values for hmt and for hkt+1, so we use the pmf (probability mass function) to calculate the expected payoff.

There is a cutoff level which is the minimum level of hkt+1 necessary to be lucrative to reject a payment offer. I’ve used this in version 2 to speed up the code.

PS2: If you have interest, the complete code is here. Again, tips are welcome

Oscar_Smith · January 8, 2021, 9:59pm

Most of the problem is cutoffs = Array{Int64}(undef, length(Pmt_grid))
Array{Int64} is an abstract type. You presumably meant Vector{Int64}. Note however, that this line and the following can be replaced with
cutoffs = fill(lkt, length(Pmt_grid))

Similarly, you want `pmf = fill(.01, grid, grid)

Also, indexing with a range makes a copy. (ie pmf[hkt, hkt:(cutoffs[i]-1)]) To fix this, you can put @views before the for loop.

jmatheusgpa · January 11, 2021, 7:25pm

Hi Oscar, thanks for the tips.

I followed your suggestions and the function takes now 4.857 microseconds, down from 6.720 microseconds.

I put @view before each array indexed with range because it was faster than putting @views before the for loop.

Also, thanks for the tip on using fill, but these values are just an example. In the real code, each cell of cutoffs and pmf has a different value.

Version 3 of the code is:



function respond_paymentv3(t, hkt, lkt, lmt, Pmt_array, k_W)
    
    ts = threshold_m(t, lkt, lmt, Pmt_array)

    lmt = max(ts, lmt)


    #the possible values of payment given by the opponent
    Pmt_grid = Pmt_array[t, lmt:end, lkt, 1]
    
    
    #the screening cutoffs associated with each payment proposal, simplified
    cutoffs = Vector{Int64}(undef, length(Pmt_grid))
    cutoffs .= lkt

    payoff = zeros(length(Pmt_grid))
#     payoff = Array{Float64}(undef, length(Pmt_grid))



    for (i, pmt) in enumerate(Pmt_grid)
        #receives pmt if his skill level is below the cutoff
        payoff[i] += pmt * sum(@view(pmf[hkt, hkt:(cutoffs[i]-1)])) * pmf[lmt, lmt+(i-1)]
        
        #receives pmt if his skill level greather than or equal to the cutoff
        payoff[i] += dot(@view(k_W[t+1, cutoffs[i]:end , cutoffs[i], lmt+(i-1)]), @view(pmf[hkt, cutoffs[i]:end])) * pmf[lmt, lmt+(i-1)]
        
    end
        
    return sum(payoff)
    
end


#test1: expected payoff of S
t, hkt, lkt, lmt, Pmt_array, k_W = T, 50, 50, 50, Pjt_array, s_W
@btime respond_paymentv3(t, hkt, lkt, lmt, Pmt_array, k_W)

Now I’m working on two issues that came up in this newer version, in case anybody can help:

@code_warntype show possible issues with SubArrays, which I thinks is due to the use of @view:

@code_warntype respond_paymentv3(t, hkt, lkt, lmt, Pmt_array, k_W)
Variables
  #self#::Core.Compiler.Const(respond_paymentv3, false)
  t::Int64
  hkt::Int64
  lkt::Int64
  lmt@_5::Int64
  Pmt_array::Array{Float64,4}
  k_W::Array{Float64,4}
  ts::Int64
  Pmt_grid::Array{Float64,1}
  cutoffs::Array{Int64,1}
  payoff::Array{Float64,1}
  @_12::Union{Nothing, Tuple{Tuple{Int64,Float64},Tuple{Int64,Int64}}}
  i::Int64
  pmt::Float64
  @_15::Int64
  lmt@_16::Int64
  @_17::SubArray
  @_18::SubArray{Float64,1,Array{Float64,4},Tuple{Int64,UnitRange{Int64},Int64,Int64},true}
  @_19::SubArray

Body::Float64
1 ──       (lmt@_16 = lmt@_5)
│          (ts = Main.threshold_m(t, lkt, lmt@_16, Pmt_array))
│          (lmt@_16 = Main.max(ts, lmt@_16))
│    %4  = lmt@_16::Int64
│    %5  = Base.lastindex(Pmt_array, 2)::Int64
│    %6  = (%4:%5)::UnitRange{Int64}
│          (Pmt_grid = Base.getindex(Pmt_array, t, %6, lkt, 1))
│    %8  = Core.apply_type(Main.Vector, Main.Int64)::Core.Compiler.Const(Array{Int64,1}, false)
│    %9  = Main.length(Pmt_grid)::Int64
│          (cutoffs = (%8)(Main.undef, %9))
│    %11 = cutoffs::Array{Int64,1}
│    %12 = Base.broadcasted(Base.identity, lkt)::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{0},Nothing,typeof(identity),Tuple{Int64}}
│          Base.materialize!(%11, %12)
│    %14 = Main.length(Pmt_grid)::Int64
│          (payoff = Main.zeros(%14))
│    %16 = Main.enumerate(Pmt_grid)::Base.Iterators.Enumerate{Array{Float64,1}}
│          (@_12 = Base.iterate(%16))
│    %18 = (@_12 === nothing)::Bool
│    %19 = Base.not_int(%18)::Bool
└───       goto #10 if not %19
2 ┄─ %21 = @_12::Tuple{Tuple{Int64,Float64},Tuple{Int64,Int64}}::Tuple{Tuple{Int64,Float64},Tuple{Int64,Int64}}
│    %22 = Core.getfield(%21, 1)::Tuple{Int64,Float64}
│    %23 = Base.indexed_iterate(%22, 1)::Core.Compiler.PartialStruct(Tuple{Int64,Int64}, Any[Int64, Core.Compiler.Const(2, false)])
│          (i = Core.getfield(%23, 1))
│          (@_15 = Core.getfield(%23, 2))
│    %26 = Base.indexed_iterate(%22, 2, @_15::Core.Compiler.Const(2, false))::Core.Compiler.PartialStruct(Tuple{Float64,Int64}, Any[Float64, Core.Compiler.Const(3, false)])
│          (pmt = Core.getfield(%26, 1))
│    %28 = Core.getfield(%21, 2)::Tuple{Int64,Int64}
│    %29 = Base.getindex(payoff, i)::Float64
│    %30 = pmt::Float64
│          true
│    %32 = Base.getindex(cutoffs, i)::Int64
│    %33 = (%32 - 1)::Int64
│    %34 = (hkt:%33)::UnitRange{Int64}
│          (@_17 = (view)(Main.pmf, hkt, %34))
└───       goto #4
3 ──       Core.Compiler.Const(:(@_17 = false), false)
4 ┄─ %38 = @_17::SubArray
│    %39 = Main.sum(%38)::Any
│    %40 = lmt@_16::Int64
│    %41 = lmt@_16::Int64
│    %42 = (i - 1)::Int64
│    %43 = (%41 + %42)::Int64
│    %44 = Base.getindex(Main.pmf, %40, %43)::Any
│    %45 = (%30 * %39 * %44)::Any
│    %46 = (%29 + %45)::Any
│          Base.setindex!(payoff, %46, i)
│    %48 = Base.getindex(payoff, i)::Float64
│          true
│    %50 = (t + 1)::Int64
│    %51 = Base.getindex(cutoffs, i)::Int64
│    %52 = (lastindex)(k_W, 2)::Int64
│    %53 = (%51:%52)::UnitRange{Int64}
│    %54 = Base.getindex(cutoffs, i)::Int64
│    %55 = lmt@_16::Int64
│    %56 = (i - 1)::Int64
│    %57 = (%55 + %56)::Int64
│          (@_18 = (view)(k_W, %50, %53, %54, %57))
└───       goto #6
5 ──       Core.Compiler.Const(:(@_18 = false), false)
6 ┄─ %61 = @_18::SubArray{Float64,1,Array{Float64,4},Tuple{Int64,UnitRange{Int64},Int64,Int64},true}
│          true
│    %63 = Base.getindex(cutoffs, i)::Int64
│    %64 = (lastindex)(Main.pmf, 2)::Any
│    %65 = (%63:%64)::Any
│          (@_19 = (view)(Main.pmf, hkt, %65))
└───       goto #8
7 ──       Core.Compiler.Const(:(@_19 = false), false)
8 ┄─ %69 = @_19::SubArray
│    %70 = Main.dot(%61, %69)::Any
│    %71 = lmt@_16::Int64
│    %72 = lmt@_16::Int64
│    %73 = (i - 1)::Int64
│    %74 = (%72 + %73)::Int64
│    %75 = Base.getindex(Main.pmf, %71, %74)::Any
│    %76 = (%70 * %75)::Any
│    %77 = (%48 + %76)::Any
│          Base.setindex!(payoff, %77, i)
│          (@_12 = Base.iterate(%16, %28))
│    %80 = (@_12 === nothing)::Bool
│    %81 = Base.not_int(%80)::Bool
└───       goto #10 if not %81
9 ──       goto #2
10 ┄ %84 = Main.sum(payoff)::Float64
└───       return %84

@_17 is @view(pmf[hkt, hkt:(cutoffs[i]-1)]) and @_19 is @view(pmf[hkt, cutoffs[i]:end]). So the problem is creating views of pmf. I’m searching how to fix it.
I’ve tried to change pmf to pmf = fill(0.01, grid, grid) and pmf = zeros(grid, grid), but it didn’t work.

I noted that I didn’t need to create an array for payoff. It is just an escalar, so I created the following version of the code, which is slightly faster (4.2 microseconds). The problem is that Julia cannot infer that the type of payoff is Float64:



function respond_paymentv4(t, hkt, lkt, lmt, Pmt_array, k_W)
    
    ts = threshold_m(t, lkt, lmt, Pmt_array)

    lmt = max(ts, lmt)


    #the possible values of payment given by the opponent
    Pmt_grid = Pmt_array[t, lmt:end, lkt, 1]
    
    
    #the screening cutoffs associated with each payment proposal, simplified
    cutoffs = Vector{Int64}(undef, length(Pmt_grid))
    cutoffs .= lkt

    payoff = Float64(0.0)
#     payoff = Array{Float64}(undef, length(Pmt_grid))



    for (i, pmt) in enumerate(Pmt_grid)
        #receives pmt if his skill level is below the cutoff
        payoff += pmt * sum(@view(pmf[hkt, hkt:(cutoffs[i]-1)])) * pmf[lmt, lmt+(i-1)]
        
        #receives pmt if his skill level greather than or equal to the cutoff
        payoff += dot(@view(k_W[t+1, cutoffs[i]:end , cutoffs[i], lmt+(i-1)]), @view(pmf[hkt, cutoffs[i]:end])) * pmf[lmt, lmt+(i-1)]
        
    end
        
    return payoff
    
end


#test1: expected payoff of S
t, hkt, lkt, lmt, Pmt_array, k_W = T, 50, 50, 50, Pjt_array, s_W
@btime respond_paymentv4(t, hkt, lkt, lmt, Pmt_array, k_W)

@code_warntype respond_paymentv4(t, hkt, lkt, lmt, Pmt_array, k_W)
payoff::Any

mikkoku · January 11, 2021, 9:17pm

In the output of @code_warntype there is

That is you are accessing pmf as a global variable. You should pass it as a parameter to respond_payment.

jmatheusgpa · January 12, 2021, 6:14pm

Thanks, @mikkoku! This made the function 4x faster.

Also, @code_warntype shows that now Julia can infer the types of all variables. So I think it’s everything that can be done to speed up the function.

Latest version:

function respond_paymentv5(t, hkt, lkt, lmt, Pmt_array, k_W, pmf)
    
    ts = threshold_m(t, lkt, lmt, Pmt_array)

    lmt = max(ts, lmt)


    #the possible values of payment given by the opponent
    Pmt_grid = Pmt_array[t, lmt:end, lkt, 1]
    
    
    #the screening cutoffs associated with each payment proposal, simplified
    cutoffs = Vector{Int64}(undef, length(Pmt_grid))
    cutoffs .= lkt

    payoff = Float64(0.0)
#     payoff = Array{Float64}(undef, length(Pmt_grid))



    for (i, pmt) in enumerate(Pmt_grid)
        #receives pmt if his skill level is below the cutoff
        payoff += pmt * sum(@view(pmf[hkt, hkt:(cutoffs[i]-1)])) * pmf[lmt, lmt+(i-1)]
        
        #receives pmt if his skill level greather than or equal to the cutoff
        payoff += dot(@view(k_W[t+1, cutoffs[i]:end , cutoffs[i], lmt+(i-1)]), @view(pmf[hkt, cutoffs[i]:end])) * pmf[lmt, lmt+(i-1)]
        
    end
        
    return payoff
    
end


#test1: expected payoff of S
t, hkt, lkt, lmt, Pmt_array, k_W, pmf = T, 50, 50, 50, Pjt_array, s_W, pmf
@btime respond_paymentv5(t, hkt, lkt, lmt, Pmt_array, k_W, pmf)

 973.267 ns (3 allocations: 336 bytes)

Summary:

@btime respond_paymentv2(t, hkt, lkt, lmt, Pmt_array, k_W)
 6.800 μs (174 allocations: 13.45 KiB)

@btime respond_paymentv3(t, hkt, lkt, lmt, Pmt_array, k_W)
 4.914 μs (174 allocations: 5.02 KiB)

@btime respond_paymentv4(t, hkt, lkt, lmt, Pmt_array, k_W)
4.414 μs (152 allocations: 4.53 KiB)

@btime respond_paymentv5(t, hkt, lkt, lmt, Pmt_array, k_W, pmf)
973.267 ns (3 allocations: 336 bytes)

From v2 to v5, I made the following changes:

created cutoffs as Vector{Int64} instead of Array{Int64} to help Julia to infer the data types
created payoffs as an escalar instead of an array in order to avoid using sum() unnecessarily
put @view before arrays indexed with ranges, i.e. @view(pmf[hkt, hkt:(cutoffs[i]-1)] , to avoid copies of the arrays
passed pmf as an argument to the function to avoid global variables

Thank you again, @Oscar_Smith and @mikkoku

Topic		Replies	Views
Optimize a simple code for performance General Usage	3	673	January 20, 2019
Function control flow Performance question , performance	22	1496	May 31, 2020
Performance input on my code Performance	8	837	October 15, 2018
Help me optimize dynamic programming code General Usage	7	1600	October 12, 2019
Optimizing code with array assignments Performance	12	415	June 15, 2023

Help with bottleneck in dynamic programming

Related topics