Improving performance of dynamic programming problem of households in a labor market model

Greetings:

I am working on solving a heterogeneous-agent economics model in Julia. The most demanding part is solving the household problem using dynamic programming. Households can be either employed or unemployed. The state space of unemployed households consists of just their assets. For employed households, the state space consists of productivity and the wage submarket in addition to their asset position. Productivity evolves according to a Markov process between periods of unemployment whereas the wage submarket is fixed for the duration of the job.

Both unemployed and employed households make a savings decision. However, unemployed households also which wage submarket to search for. The tradeoff to searching in a higher wage submarket is a lower job finding rate.

Since individuals can transition between employed and unemployed, the value function of being employed involves that of being unemployed and vice versa. Accordingly, we iterate on both value functions simultaneously.

The demanding part of the algorithm is the use of numerical maximization for each point of the state space. The key function which implements the dynamic programming is solve_household which calls on the functions update_value and update_value_un. The problem is that, currently, it takes several minutes to obtain convergence of the household problem. Solving general equilibrium implies solving the household problem several times (at different possible interest rates), so this is not good enough.

I checked type stability using @code_warntype, which seems to be fine. But I suspect I am nevertheless suspect I am committing some major performance goof as comparable–at least as far as I can tell–Fortran code runs much faster.

I attach self-contained code below. The initial part of the code consists of an initialization and solving the firm problem, which is irrelevant to the performance issues in the household problem. I welcome all suggestions that would improve performance.

Many thanks.


using PyPlot, BenchmarkTools
using PyFormattedStrings
using LaTeXStrings
using Parameters, CSV, Random
using Dierckx, Distributions, ArgParse
using Optim, Interpolations

function grid_Cons_Grow(n, left, right, g)
    """
    Creates n+1 gridpoints with growing distance on interval [a, b]
    according to the formula
    x_i = a + (b-a)/((1+g)^n-1)*((1+g)^i-1) for i=0,1,...n 
    """
    x = zeros(n)
    for i in 0:(n-1)
        x[i+1] = @. left + (right-left)/((1+g)^n-1)*((1+g)^i-1)
    end
    return x
end

Para = @with_kw ( Ξ³=0.5,
    egam= 1.0-1.0/Ξ³,
    Ξ² = 0.96,
    Ξ± = 0.2,
    A_p = 1.0,
    Ξ΄= 0.0,
    ρ = 0.9,
    Οƒ_eps = 0.04*(1.0-ρ^2),
    NS= 9,
    a_l= 0.0,
    a_u= 50.0,
    NA=50,
    a = grid_Cons_Grow(NA, a_l, a_u, 0.01),
    mc= rouwenhorst(NS, ρ, Οƒ_eps^(1/2), 0),
    P = mc.p,
    z = exp.(mc.state_values),
    Nomega = 14,
    omega_grid = range(0.1, stop=0.9, length=Nomega),
    a_bor =-0.0,
    Tr = 0.02,
    h_un = 0.02,
    Pi = stationary_distributions(mc)[1],
    sep = 0.05,
    ΞΊ = 2.0,
    A_m = 0.54,
    Ξ·_L = 0.72
    )

   
    # Create asset grid
    

    #u::Function = (c, Ξ³=Ξ³) -> c^(1-Ξ³)/(1-Ξ³)
    #u_prime::Function = c -> c^(-Ξ³)
    #u_prime_inv::Function = c -> c^(-1/Ξ³)
   

function matching_functions(para)
@unpack A_m, Ξ·_L = para
  # Matching functions
  jf(ΞΈ) = min(A_m*ΞΈ^(1-Ξ·_L), 1.0)
  q(ΞΈ) = min(A_m*ΞΈ^Ξ·_L, 1.0)
  return jf, q
end

function initialize(para)
    @unpack γ, egam, α, β, A_p, δ, ρ, NS, NA, Nomega, z, a, a_bor, Tr, h_un = para
    minrate = -Ξ΄
    maxrate = (1-Ξ²)/Ξ²
    r = 0.8*(minrate+maxrate)
    k_opt = @. ((r+Ξ΄)/(z*A_p*Ξ±))^(Ξ±-1)
    c_pol = zeros(NA, NS, Nomega)
    for is in 1:NS
        for iom in 1:Nomega
            c_pol[:, is, iom] = @. max(r*(a + a_bor),1e-10)
        end
    end

    V = c_pol.^(egam)/egam
    c_pol_un = @. max(r*(a+a_bor), 1e-10)
    V_un = c_pol_un.^(egam)/egam

    # Initial guess of distribution function
    phi = 1.0/(NA*NS*Nomega*2)
    return r, k_opt, c_pol, c_pol_un, V, V_un
end


function bellman_firm(r, para;tol=1e-7)
    #=
    Value of firm at productivity grid (is, iom)
    =#
    @unpack omega_grid, z, NS, Nomega, Ξ΄, Ξ±, A_p, sep, P = para
    J_new = zeros(NS, Nomega)
    diff = 1.0
    J = zero(J_new)
    profit = zero(J_new)
    while diff > tol
        for (iom, Ο‰) in enumerate(omega_grid)
            for (is, z_i) in enumerate(z)
                k_opt = ((r+Ξ΄)/(z_i*A_p*Ξ±))^(Ξ±-1)
                profit[is, iom] = (1-Ο‰)* (z_i*A_p*k_opt^Ξ±-(r+Ξ΄)*k_opt)
                for is_p in 1:NS
                    J[is, iom] += (profit[is, iom] + (1.0-sep)/(1+r)*J_new[is_p, iom])*P[is, is_p]
                end
            end
        end
        diff = maximum(abs.(J-J_new))
        #print(diff)
        J_new .= J
    end
    return J, profit
end


function ΞΈ_fun(J, r, para)
    # Returns mapping between Ο‰ and ΞΈ
    @unpack NS, Nomega, A_p, ΞΊ, Ξ·_L, Pi = para
    # Expected firm value from stationary distribution
    EJ = zeros(Nomega)
    for iom in 1:Nomega
        EJ[iom] = sum(J[:, iom].*Pi)
    end
    EJ .= @. max(EJ, 1e-10)
    ΞΈ = @. (A_p*EJ/(ΞΊ*(1+r)))^(1/Ξ·_L)
    return ΞΈ
end

#@code_warntype(ΞΈ_fun(J, 0.02, para))
#@btime ΞΈ_fun(J, 0.02, para)

function bellman(a_prime, ia, is, iom, V_fun, V_un_fun, r, para)
    
    # Bellman equation of employed HH with state (a,z,Ο‰) and guess of
    # savings a_prime
    @unpack Ξ΄, Ξ², A_p, Ξ±, Tr, a, omega_grid, P, sep, egam, z, a_bor, NS =para
    a_i, z_i, Ο‰ = a[ia], z[is], omega_grid[iom]

    k_opt = ((r+Ξ΄)/(z_i*A_p*Ξ±))^(Ξ±-1)
    revenue = z_i*A_p*k_opt^Ξ±-(r+Ξ΄)*k_opt
    wage = Ο‰*revenue
    c = max((a_i + a_bor)*(1+r)+wage-(a_prime+a_bor)-Tr, 1e-10)
    expect = Ξ²*sep*max(V_un_fun(a_prime)^(egam)/egam, 1e-10)
    for is_p in 1:NS
        expect += Ξ²*(1-sep)*max(V_fun(a_prime, is_p, iom), 1e-10)^(egam)/egam*P[is, is_p]
    end
    util = (c^egam/egam + expect)
    return util
end

#@code_warntype(bellman(5.0, 3, 2, 5, V_fun, V_un_fun, 0.02, para))
#@btime (bellman(5.0, 3, 2, 5, $V_fun, $V_un_fun, 0.02, $para))

function bellman_un(a_prime, ia, iom, V_fun, V_un_fun, jf_val, r, para)
    # Bellman equation of unemployed HH with state (a) and guess of
    # savings a_prime
    @unpack Ξ΄, Ξ², A_p, Ξ±, Tr, a, omega_grid, Pi, h_un, a_bor, NS, egam =para
    jf = jf_val[iom]
    a_i = a[ia]
    # consumption
    c = max(((a_i+a_bor)*(1+r)-(a_prime+a_bor)+h_un), 1e-10)
    expect = Ξ²*(1.0-jf)*max(V_un_fun(a_prime), 1e-10)^(egam)/egam 
    @inbounds for is_p in 1:NS
        expect += Ξ²*jf*max(V_fun(a_prime, is_p, iom), 1e-10)^(egam)/egam*Pi[is_p]
    end
    util = c^egam/egam + expect
    return util
end



function update_value_un(V_fun, V_un_fun, jf_vals, r, para)
    # Unemployed 
    @unpack NA, NS, Nomega, a, omega_grid, a_l, a_u, a_bor, A_p, h_un, z, Ξ΄, Ξ±, Tr, egam = para
    a_pol_un = zeros(NA)
    c_pol_un = zeros(NA)
    V_un_new = zeros(NA)
    omega_policy = zeros(Int, NA)
    temp_fun = zeros(NA, Nomega)
    temp_maximizer = zeros(NA, Nomega)
    @inbounds Threads.@threads for ia in 1:NA
        a_i = a[ia]
        for iom = 1:Nomega 
            # get a_prime
            #a_u = max((a[ia] + a_bor)*(1+r) - c_pol_un[ia] + h_un, 1e-10)
            a_u = max((a_i + a_bor)*(1+r) + h_un, 1e-10)
            objective = x -> bellman_un(x, ia, iom, V_fun, V_un_fun, jf_vals, r, para)
            sol = maximize(objective, a_l, a_u)
            temp_fun[ia, iom] = Optim.maximum(sol)
            temp_maximizer[ia, iom] = Optim.maximizer(sol)
        end
        # Ο‰ which maximizes value
        iom_max = argmax(@view(temp_fun[ia, :]))
        # policies and value function
        a_pol_un[ia] = temp_maximizer[ia, iom_max]
        c_pol_un[ia]     =max((a_i+a_bor)*(1+r) +h_un - (a_pol_un[ia] +a_bor)  , (1e-10))
        omega_policy[ia] =iom_max
        V_un_new[ia] = temp_fun[ia, iom_max]
    end
    return V_un_new, a_pol_un, c_pol_un, omega_policy
end

function update_value(V_fun, V_un_fun, r, para)
    @unpack NA, NS, Nomega, a, omega_grid, a_l, a_u, a_bor, A_p, h_un, z, Ξ΄, Ξ±, Tr, egam = para
    a_pol = zeros(NA, NS, Nomega)
    c_pol = zeros(NA, NS, Nomega)
    V_new = zeros(NA, NS, Nomega)
    k_opt = @. ((r+Ξ΄)/(z*A_p*Ξ±))^(Ξ±-1)
    # Interpolate employed value function
    @inbounds Threads.@threads for ia in 1:NA
        a_i = a[ia]
        for iom in 1:Nomega
            for is in 1:NS
                z_i = z[is]
                wage = omega_grid[iom]*(z_i*A_p*k_opt[is]^Ξ±-(r+Ξ΄)*k_opt[is])
                #x_in = max((a[ia] + a_bor)*(1+r) + wage - c_pol[ia, is, iom] - Tr, 1e-10)
                a_u = max((a_i + a_bor)*(1+r) + wage - Tr, 1e-10)
                objective = x -> bellman(x, ia, is, iom, V_fun, V_un_fun, r, para)
                #sol = maximize(objective, a_l, a_u)
                sol = maximize(objective, a_l, a_u)
                x_in = Optim.maximizer(sol)
                a_pol[ia, is, iom] = max(x_in, 0.0)
                c_pol[ia, is, iom] = max((a_i+a_bor)*(1+r)+wage-(
                    a_pol[ia, is, iom] + a_bor) - Tr, 1e-10)
                V_new[ia, is, iom] = Optim.maximum(sol)
            end
        end
    end
    return  V_new, a_pol, c_pol
end

function solve_household(V, V_un, jf_vals, r, para; itermax=5000, error_tol=1e-9, N=10)
    @unpack NA, NS, Nomega, a, omega_grid, a_l, a_u, a_bor, A_p, h_un, z, Ξ΄, Ξ±, Tr, egam = para
    #initialize policies
    a_pol_un = zeros(NA)
    c_pol_un = zeros(NA)
    V_un_new = similar(V_un)
    omega_policy = zeros(Int, NA)
    a_pol = zeros(NA, NS, Nomega)
    c_pol = zeros(NA, NS, Nomega)
    V_new = similar(V)
    # Outer loop for value function iteration
    err = 1.0
    iter_num = 1
    #V_fun(x, is, iom) = LinearInterpolation(a, egam.*V[:, is, iom].^(1/egam), extrapolation_bc=Line())(x)
    #V_un_fun(x) = LinearInterpolation(a, egam.*V_un.^(1/egam), extrapolation_bc=Line())(x)
    while err > error_tol && iter_num < itermax
        V_un_fun(x) = LinearInterpolation(a, egam.*V_un.^(1/egam), extrapolation_bc=Line())(x)
        V_fun(x, is, iom) = LinearInterpolation(a, egam.*V[:, is, iom].^(1/egam), extrapolation_bc=Line())(x)
        # Optimal decision for every gridpoint
        # given the interest rate
        V_un_new, a_pol_un, c_pol_un, omega_policy = update_value_un(V_fun, V_un_fun, jf_vals, r, para)  
        V_new, a_pol, c_pol = update_value(V_fun, V_un_fun, r, para) 
        # update value function for fixed policies

        for n in 1:N
            V_un_fun(x) = LinearInterpolation(a, egam.*V_un_new.^(1/egam), extrapolation_bc=Line())(x)
            #V_fun(x, is, iom) = LinearInterpolation(a, egam.*V_new[:, is, iom].^(1/egam), extrapolation_bc=Line())(x)
            for ia in 1:NA
                x_u = a_pol_un[ia]
                iom_max = omega_policy[ia]
                V_un_new[ia] = bellman_un(x_u, ia, iom_max, V_fun, V_un_fun, jf_vals, r, para)
                 for is in 1:NS
                     for iom in Nomega
                         x_e = a_pol[ia, is, iom]
                         V_new[ia, is, iom] = bellman(x_e, ia, is, iom, V_fun, V_un_fun, r, para)
                     end
                end
            end
        end
        err_un = maximum(abs.(V_un_new - V_un)./maximum(abs.(V_un)))
        @show err_un
        err_e = maximum(abs.(V_new - V)./maximum(abs.(V)))
        @show err_e
        err = err_un + err_e
        # update unemployed value function
        V_un .= V_un_new
        V .= V_new
        # Interpolate value functions (transformed)
        @show iter_num +=1
    end
    return V, V_un, c_pol_un, c_pol, a_pol_un, a_pol, omega_policy
end

# initialize
const para = Para(NA=50)
@unpack a, z, omega_grid = para
r,  k_opt, c_pol, c_pol_un, V, V_un = initialize(para)
# Firm value by value function iteration: (NS, Nomega)
J, profits = bellman_firm(r, para)
# Tightness function: length Nomega
ΞΈ_vals = ΞΈ_fun(J, r, para)
# Job finding rate
jf = matching_functions(para)[1]
jf_vals = jf.(ΞΈ_vals)
# Solve HH problem: value and policy functions
@code_warntype solve_household(V, V_un, jf_vals, r, para)


V, V_un, c_pol_un, c_pol, a_pol_un, a_pol, omega_policy = solve_household(V, V_un, jf_vals, r, para)
# Assign bins to savings policies


1 Like

Any chance you have the Fortran available? Being able to see both can make it easier to see where the performance difference is.

In which package do I find this?

iiuc, @code_warntype isn’t recursive, so it doesn’t give the full answer you’d get from

using JET
@report_opt solve_household(V, V_un, jf_vals, r, para)
1 Like

Hi, that’s my omission. It’s the QuantEcon package. The rouwenhorst method is an algorithm for approximating an AR(1) process as a Markov chain.

1 Like

Thanks a lot for bringing this to my attention. I could not get report_opt to work, but report_call does work. The output does suggest there are some issues with type stability. I post the output below, but I am not sure how to maintain the red highlights on the key warnings.

Main./(1, Core.getfield(#self#, :egam))))))
β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”Œ @ C:\Users\Administrator\.julia\packages\Interpolations\3gTQB\src\convenience-constructors.jl:9 Interpolations.#LinearInterpolation#52(extrapolation_bc, _3, range, vs)
β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”Œ @ C:\Users\Administrator\.julia\packages\Interpolations\3gTQB\src\convenience-constructors.jl:9 Interpolations.interpolate(Core.tuple(range), vs, Interpolations.Gridded(Interpolations.Linear()))      
β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”Œ @ C:\Users\Administrator\.julia\packages\Interpolations\3gTQB\src\gridded\gridded.jl:152 Interpolations.interpolate(Interpolations.tweight(A), Interpolations.tcoef(A), knots, A, it)
β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”Œ @ C:\Users\Administrator\.julia\packages\Interpolations\3gTQB\src\gridded\gridded.jl:135 Interpolations.GriddedInterpolation(_, knots, A, it)
β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”Œ @ C:\Users\Administrator\.julia\packages\Interpolations\3gTQB\src\gridded\gridded.jl:40 Interpolations.check_gridded(it, knots, Interpolations.axes(A))
β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”Œ @ C:\Users\Administrator\.julia\packages\Interpolations\3gTQB\src\gridded\gridded.jl:76 Interpolations.allunique(k1)
β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”Œ @ set.jl:389 Base.indexed_iterate(x, 2, _6)
β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”Œ @ tuple.jl:94 Base.iterate(I, state)
β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚ no matching method found for call signature (Tuple{typeof(iterate), Nothing, Int64}): Base.iterate(I::Nothing, state::Int64)
│││││││││││││││││││││││└───────────────
β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”Œ @ C:\Users\Administrator\.julia\packages\Interpolations\3gTQB\src\gridded\gridded.jl:83 Interpolations.check_gridded(Interpolations.getrest(itpflag), Base.tail(knots), Base.tail(axs))
β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”Œ @ C:\Users\Administrator\.julia\packages\Interpolations\3gTQB\src\gridded\gridded.jl:67 Base.getindex(knots, 1)
β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”Œ @ tuple.jl:29 Base.getfield(t, i, $(Expr(:boundscheck)))
β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚ invalid builtin function call: Base.getfield(t::Tuple{}, i::Int64, $(Expr(:boundscheck)))
│││││││││││││││││││││││└───────────────
β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”Œ @ set.jl:389 Base.indexed_iterate(x, 1)
β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”Œ @ tuple.jl:89 Base.iterate(I)
β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚ no matching method found for call signature (Tuple{typeof(iterate), Nothing}): Base.iterate(I::Nothing)
│││││││││││││││││││││││└───────────────
β”‚β”Œ @ c:\Users\TJSEM\Dropbox\Documents - Copy\Research\LUS heterogeneous agents\Julia\MWE.jl:274 Main.-(V_new, V)
β”‚β”‚β”Œ @ arraymath.jl:38 Base.promote_shape(A, B)
β”‚β”‚β”‚β”Œ @ indices.jl:169 Base.promote_shape(Base.axes(a), Base.axes(b))
β”‚β”‚β”‚β”‚β”Œ @ indices.jl:178 Base.string("dimensions must match: a has dims ", a, ", b has dims ", b, ", mismatch at ", i)
β”‚β”‚β”‚β”‚β”‚β”Œ @ strings/io.jl:174 Base.print_to_string(xs...)
β”‚β”‚β”‚β”‚β”‚β”‚β”Œ @ strings/io.jl:135 Base.print(s, x)
β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”Œ @ strings/io.jl:35 Base.show(io, x)
β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”Œ @ show.jl:1142 Base.show_delim_array(io, t, '(', ',', ')', true)
β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”Œ @ show.jl:1109 #self#(io, itr, op, delim, cl, delim_one, 1, Base.typemax(Base.Int))
β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”Œ @ show.jl:1122 Base.getindex(y, 1)
β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚ for 1 of union split cases, no matching method found for call signatures (Tuple{typeof(getindex), Nothing, Int64})): Base.getindex(y::Union{Nothing, Tuple{Base.OneTo{Int64}, Int64}}, 1)
││││││││││└────────────────
β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”Œ @ show.jl:1123 Base.getindex(y, 2)
β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚ for 1 of union split cases, no matching method found for call signatures (Tuple{typeof(getindex), Nothing, Int64})): Base.getindex(y::Union{Nothing, Tuple{Base.OneTo{Int64}, Int64}}, 2)
││││││││││└────────────────
(Tuple{Array{Float64, 3}, Vector{Float64}, Vector{Float64}, Array{Float64, 3}, Vector{Float64}, Array{Float64, 3}, Vector{Int64}}, 5)

Hello. My collaborator wrote the Fortran code, which is slightly complicated and involves a built-in toolbox from the site used by the authors at https://www.ce-fortran.com/. If truly necessary, I can ask her for permission to post it, but it seemed plausible that I was missing on some idiomatic constructions to make the code performant.

1 Like

What went wrong?

I ran the code to compile the function once, and then used @report_opt. It reports 218 problems, which is too big to fit in Discourse. Here are the last few lines. There are many cases of runtime dispatch, which can be very slow.

β”‚β”‚β”‚β”‚β”‚β”‚β”Œ @ ~/tmp/quantecon/example1.jl:221 %218(%217)
β”‚β”‚β”‚β”‚β”‚β”‚β”‚ runtime dispatch detected: %218::typeof(Optim.maximizer)(%217::Optim.MaximizationWrapper)
││││││└────────────────────────────────────────────
β”‚β”‚β”‚β”‚β”‚β”‚β”Œ @ ~/tmp/quantecon/example1.jl:222 Main.max(%219, 0.0)
β”‚β”‚β”‚β”‚β”‚β”‚β”‚ runtime dispatch detected: Main.max(%219::Any, 0.0)
││││││└────────────────────────────────────────────
β”‚β”‚β”‚β”‚β”‚β”‚β”Œ @ array.jl:905 Base.convert(Float64, %220)
β”‚β”‚β”‚β”‚β”‚β”‚β”‚ runtime dispatch detected: Base.convert(Float64, %220::Any)
││││││└────────────────
β”‚β”‚β”‚β”‚β”‚β”‚β”Œ @ ~/tmp/quantecon/example1.jl:225 %252(%217)
β”‚β”‚β”‚β”‚β”‚β”‚β”‚ runtime dispatch detected: %252::typeof(maximum)(%217::Optim.MaximizationWrapper)
││││││└────────────────────────────────────────────
β”‚β”‚β”‚β”‚β”‚β”‚β”Œ @ array.jl:905 Base.convert(Float64, %253)
β”‚β”‚β”‚β”‚β”‚β”‚β”‚ runtime dispatch detected: Base.convert(Float64, %253::Any)
││││││└────────────────
β”‚β”Œ @ ~/tmp/quantecon/example1.jl:271 Base.broadcasted(Main.abs, Main.-(Core.getfield(V_un_new, :contents), V_un))
β”‚β”‚β”Œ @ ~/.julia/packages/SentinelArrays/p1IoM/src/chainedvector.jl:880 SentinelArrays.map(f, A)
β”‚β”‚β”‚β”Œ @ ~/.julia/packages/SentinelArrays/p1IoM/src/chainedvector.jl:652 Base.collect(Base.Generator(#20, Base.getproperty(x, :arrays)))
β”‚β”‚β”‚β”‚β”Œ @ array.jl:701 et = Base.promote_typejoin_union(T)
β”‚β”‚β”‚β”‚β”‚β”Œ @ promotion.jl:170 Base.promote_typejoin_union(Base.getproperty(_, :a))
β”‚β”‚β”‚β”‚β”‚β”‚β”Œ @ promotion.jl:190 unwrapva(%35)
β”‚β”‚β”‚β”‚β”‚β”‚β”‚ runtime dispatch detected: unwrapva(%35::Any)
││││││└────────────────────
β”‚β”‚β”‚β”‚β”‚β”‚β”Œ @ promotion.jl:194 Base.typejoin(%57, %59)
β”‚β”‚β”‚β”‚β”‚β”‚β”‚ runtime dispatch detected: Base.typejoin(%57::Any, %59::Any)
││││││└────────────────────
β”‚β”‚β”‚β”‚β”‚β”‚β”Œ @ promotion.jl:198 Base.promote_typejoin_union(%50)
β”‚β”‚β”‚β”‚β”‚β”‚β”‚ runtime dispatch detected: Base.promote_typejoin_union(%50::Any)
││││││└────────────────────
β”‚β”Œ @ ~/tmp/quantecon/example1.jl:272 Base.println("err_un = ", Base.repr(err_un))
β”‚β”‚β”Œ @ coreio.jl:4 Base.println(Core.tuple(Core.typeassert(Base.stdout, Base.IO)), xs...)
β”‚β”‚β”‚β”Œ @ strings/io.jl:75 Base.print(Core.tuple(io), xs, Core.tuple("\n")...)
β”‚β”‚β”‚β”‚β”Œ @ strings/io.jl:43 Base.lock(io)
β”‚β”‚β”‚β”‚β”‚β”Œ @ stream.jl:282 Base.lock(Base.getproperty(s, :lock))
β”‚β”‚β”‚β”‚β”‚β”‚β”Œ @ lock.jl:100 Base.wait(Base.getproperty(rl, :cond_wait))
β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”Œ @ condition.jl:123  = Base.wait()
β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”Œ @ task.jl:837 result = Base.try_yieldto(Base.ensure_rescheduled)
β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”Œ @ task.jl:774 Base.getproperty(%7, :result)
β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚ runtime dispatch detected: Base.getproperty(%7::Task, :result::Symbol)
│││││││││└───────────────
β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”Œ @ task.jl:775 Base.setproperty!(%7, :result, Base.nothing)
β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚ runtime dispatch detected: Base.setproperty!(%7::Task, :result::Symbol, Base.nothing)
│││││││││└───────────────
β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”Œ @ task.jl:776 Base.setproperty!(%7, :_isexception, false)
β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚ runtime dispatch detected: Base.setproperty!(%7::Task, :_isexception::Symbol, false)
│││││││││└───────────────
β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”Œ @ condition.jl:125 Base.list_deletefirst!(%25, %19)
β”‚β”‚β”‚β”‚β”‚β”‚β”‚β”‚ runtime dispatch detected: Base.list_deletefirst!(%25::Any, %19::Task)
│││││││└────────────────────
β”‚β”‚β”Œ @ coreio.jl:4 Base.println(%3, %4, %5)
β”‚β”‚β”‚ runtime dispatch detected: Base.println(%3::IO, %4::String, %5::String)
││└───────────────
β”‚β”Œ @ ~/tmp/quantecon/example1.jl:233 V_un_new = Core.Box()
β”‚β”‚ captured variable `V_un_new` detected
│└────────────────────────────────────────────
β”‚β”Œ @ ~/tmp/quantecon/example1.jl:247 Main.>(%186, error_tol)
β”‚β”‚ runtime dispatch detected: Main.>(%186::Any, error_tol::Float64)
│└────────────────────────────────────────────
β”‚β”Œ @ ~/tmp/quantecon/example1.jl:262 Base.setindex!(%244, %239, %235)
β”‚β”‚ runtime dispatch detected: Base.setindex!(%244::Any, %239::Any, %235::Int64)
│└────────────────────────────────────────────
β”‚β”Œ @ array.jl:905 Base.convert(Float64, %262)
β”‚β”‚ runtime dispatch detected: Base.convert(Float64, %262::Any)
│└────────────────
β”‚β”Œ @ ~/tmp/quantecon/example1.jl:271 Main.-(%307, V_un)
β”‚β”‚ runtime dispatch detected: Main.-(%307::Any, V_un::Vector{Float64})
│└────────────────────────────────────────────
β”‚β”Œ @ ~/tmp/quantecon/example1.jl:271 Base.broadcasted(Main.abs, %308)
β”‚β”‚ runtime dispatch detected: Base.broadcasted(Main.abs, %308::Any)
│└────────────────────────────────────────────
β”‚β”Œ @ ~/tmp/quantecon/example1.jl:271 Base.broadcasted(Main./, %309, %440)
β”‚β”‚ runtime dispatch detected: Base.broadcasted(Main./, %309::Any, %440::Float64)
│└────────────────────────────────────────────
β”‚β”Œ @ ~/tmp/quantecon/example1.jl:271 Base.materialize(%441)
β”‚β”‚ runtime dispatch detected: Base.materialize(%441::Any)
│└────────────────────────────────────────────
β”‚β”Œ @ ~/tmp/quantecon/example1.jl:271 Main.maximum(%442)
β”‚β”‚ runtime dispatch detected: Main.maximum(%442::Any)
│└────────────────────────────────────────────
β”‚β”Œ @ ~/tmp/quantecon/example1.jl:272 Base.repr(%443)
β”‚β”‚ runtime dispatch detected: Base.repr(%443::Any)
│└────────────────────────────────────────────
β”‚β”Œ @ ~/tmp/quantecon/example1.jl:275 Main.+(%443, %1009)
β”‚β”‚ runtime dispatch detected: Main.+(%443::Any, %1009::Float64)
│└────────────────────────────────────────────
β”‚β”Œ @ ~/tmp/quantecon/example1.jl:277 Base.broadcasted(Base.identity, %1036)
β”‚β”‚ runtime dispatch detected: Base.broadcasted(Base.identity, %1036::Any)
│└────────────────────────────────────────────
β”‚β”Œ @ ~/tmp/quantecon/example1.jl:277 Base.materialize!(V_un, %1037)
β”‚β”‚ runtime dispatch detected: Base.materialize!(V_un::Vector{Float64}, %1037::Any)
│└────────────────────────────────────────────

I had a look into the profile of a serial version of your program: it looks pretty to good to me. Only observation: you redefine V_un_fun in solve_household. It seems to be slightly better to rename the second implementation.

One thing I don’t understand: profiler indicates that @unpack in solve_household seems to be computationally pretty expensive (although neither allocations nor dynamic dispatched are occurring).

Thanks for looking into this!

Renaming V_un_fun in the second implementation is straightforward. If the @unpack in solve_household is expensive, I presume this is the case whenever the named tuple para is unpacked? It is unpacked at every iteration on the state space, so that suggests an area for improvement. But what is the best way to avoid this issue? I can either directly define the functions bellman and bellman_un in terms of parameters. Do structs rather than named tuples offer a major advantage?

Yes.

I have really no idea. Maybe I’d check unpacking the parameters manually to see if this really influences performance.

Regarding the parallel version of your program: I only see a speedup of 2x but would expect 6x. Does this reflect your experience? And what are your performance expectations? Is this a convergence problem?

The site references GitHub - fabiankindermann/ce-fortran: The Program and Compiler Database that accompanies the book "Introduction to Computational Economcis using Fortran". Is this relevant? If yes, do you have any details for us where to look?

Only looking quickly at your code, if update_value and update_value_un are hot, they seem to have a lot of unnecessary allocations (zeros). The arrays should be pre-allocated once outside of loops, and you can always fill with zeros if necessary (and only ifβ€”there seem to be cases where you overwrite the array contents and so zero is unneeded). Or if the arrays are not too big, you may be better off with much faster StaticArrays. I have no idea whether any of this matters, I’m just going by the general philosophy that memory is slow, calculations are fast.

Also, it looks like you’re doing maximize within loops, which might eat up a lot of time. It’s possible the choice of algorithm is key, and problem structure could be exploited. For example, maybe you can provide a gradient, perhaps with automatic differentiation. Especially if your Fortran code exploits structure or knows gradients.

2 Likes

They are. Regarding allocations the picture is more complicated: profiler shows significant allocations in β€˜bellman’ triggered by β€˜LinearInterpolation’.

1 Like

Oh, and I believe maximize calls optimize which uses Nelder-Mead by default, so it won’t be very good. Would be much better to supply gradients and select an algorithm if you can.

EDIT: Scratch that. It seems that maximize is used on a linearly interpolated grid, so there’s nothing to exploit here. Although I suspect that instead of Nelder-Mead, you could just repeatedly crawl between grid points, always going to the maximum nearest neighbor.

I checked this: it seems to be more of a profiler restriction.

Greetings.

Regarding the optimization, univariate optimization on a bounded interval defaults to Brent’s method.

It does seem that there are a lot of allocations, but I am puzzled as to why this is. The @code_warntype and @report_call macros don’t display any issues on update_value, for instance. There was a redefinition of the bound a_u being flagged, so I adjusted that.

I also define new linear interpolating functions only when the arrays storing the value functions have been updated, which seems standard. But I am likely missing something…

Hello. There was actually a bug in bellman_firm. The value of J is supposed to reset each iteration–we store J_new across iterations. To prevent any confusion, I attach the modified code. I also include plots of policy functions from the output. They are probably irrelevant to investigating performance issues but do aid in interpretation. I would also like to upload the flame graph but am not sure which file format is appropriate. .svg does not seem possible here.

Thanks!

using PyPlot, BenchmarkTools

using PyFormattedStrings

using LaTeXStrings

using Parameters, CSV, Random, QuantEcon

using Dierckx, Distributions, ArgParse

using Optim, Interpolations

using JET, ProfileView, ProfileSVG

function grid_Cons_Grow(n, left, right, g)

    """

    Creates n+1 gridpoints with growing distance on interval [a, b]

    according to the formula

    x_i = a + (b-a)/((1+g)^n-1)*((1+g)^i-1) for i=0,1,...n

    """

    x = zeros(n)

    for i in 0:(n-1)

        x[i+1] = @. left + (right-left)/((1+g)^n-1)*((1+g)^i-1)

    end

    return x

end

Para = @with_kw ( Ξ³=0.5,

    egam= 1.0-1.0/Ξ³,

    Ξ² = 0.96,

    Ξ± = 0.2,

    A_p = 1.0,

    Ξ΄= 0.0,

    ρ = 0.9,

    Οƒ_eps = 0.04*(1.0-ρ^2),

    NS= 9,

    a_l= 0.0,

    a_u= 50.0,

    NA=50,

    a = grid_Cons_Grow(NA, a_l, a_u, 0.01),

    mc= rouwenhorst(NS, ρ, Οƒ_eps^(1/2), 0),

    P = mc.p,

    z = exp.(mc.state_values),

    Nomega = 14,

    omega_grid = range(0.1, stop=0.9, length=Nomega),

    a_bor =-0.0,

    Tr = 0.02,

    h_un = 0.02,

    Pi = stationary_distributions(mc)[1],

    sep = 0.05,

    ΞΊ = 2.0,

    A_m = 0.54,

    Ξ·_L = 0.72

    )

   

    # Create asset grid

   

    #u::Function = (c, Ξ³=Ξ³) -> c^(1-Ξ³)/(1-Ξ³)

    #u_prime::Function = c -> c^(-Ξ³)

    #u_prime_inv::Function = c -> c^(-1/Ξ³)

   

function matching_functions(para)

@unpack A_m, Ξ·_L = para

  # Matching functions

  jf(ΞΈ) = min(A_m*ΞΈ^(1-Ξ·_L), 1.0)

  q(ΞΈ) = min(A_m*ΞΈ^Ξ·_L, 1.0)

  return jf, q

end

function initialize(para)

    @unpack γ, egam, α, β, A_p, δ, ρ, NS, NA, Nomega, z, a, a_bor, Tr, h_un = para

    minrate = -Ξ΄

    maxrate = (1-Ξ²)/Ξ²

    r = 0.8*(minrate+maxrate)

    k_opt = @. ((r+Ξ΄)/(z*A_p*Ξ±))^(Ξ±-1)

    c_pol = zeros(NA, NS, Nomega)

    for is in 1:NS

        for iom in 1:Nomega

            c_pol[:, is, iom] = @. max(r*(a + a_bor),1e-10)

        end

    end

    V = c_pol.^(egam)/egam

    c_pol_un = @. max(r*(a+a_bor), 1e-10)

    V_un = c_pol_un.^(egam)/egam

    # Initial guess of distribution function

    phi = 1.0/(NA*NS*Nomega*2)

    return r, k_opt, c_pol, c_pol_un, V, V_un

end

function bellman_firm(r, para;tol=1e-7)

    #=

    Value of firm at productivity grid (is, iom)

    =#

    @unpack omega_grid, z, NS, Nomega, Ξ΄, Ξ±, A_p, sep, P = para

    J_new = zeros(NS, Nomega)

    diff = 1.0

    profit = zero(J_new)

    while diff > tol

        J = zero(J_new)

        for (iom, Ο‰) in enumerate(omega_grid)

            for (is, z_i) in enumerate(z)

                k_opt = ((r+Ξ΄)/(z_i*A_p*Ξ±))^(Ξ±-1)

                profit[is, iom] = (1-Ο‰)* (z_i*A_p*k_opt^Ξ±-(r+Ξ΄)*k_opt)

                for is_p in 1:NS

                    J[is, iom] += (profit[is, iom] + (1.0-sep)/(1+r)*J_new[is_p, iom])*P[is, is_p]

                end

            end

        end

        diff = maximum(abs.(J-J_new))

        #print(diff)

        J_new .= J

    end

    return J_new, profit

end

function ΞΈ_fun(J, r, para)

    # Returns mapping between Ο‰ and ΞΈ

    @unpack NS, Nomega, A_p, ΞΊ, Ξ·_L, Pi = para

    # Expected firm value from stationary distribution

    EJ = zeros(Nomega)

    for iom in 1:Nomega

        EJ[iom] = sum(J[:, iom].*Pi)

    end

    EJ .= @. max(EJ, 1e-10)

    ΞΈ = @. (A_p*EJ/(ΞΊ*(1+r)))^(1/Ξ·_L)

    return ΞΈ

end

#@code_warntype(ΞΈ_fun(J, 0.02, para))

#@btime ΞΈ_fun(J, 0.02, para)

function bellman(a_prime, ia, is, iom, V_fun, V_un_fun, r, para)

   

    # Bellman equation of employed HH with state (a,z,Ο‰) and guess of

    # savings a_prime

    @unpack Ξ΄, Ξ², A_p, Ξ±, Tr, a, omega_grid, P, sep, egam, z, a_bor, NS =para

    a_i, z_i, Ο‰ = a[ia], z[is], omega_grid[iom]

    k_opt = ((r+Ξ΄)/(z_i*A_p*Ξ±))^(Ξ±-1)

    revenue = z_i*A_p*k_opt^Ξ±-(r+Ξ΄)*k_opt

    wage = Ο‰*revenue

    c = max((a_i + a_bor)*(1+r)+wage-(a_prime+a_bor)-Tr, 1e-10)

    expect = Ξ²*sep*max(V_un_fun(a_prime)^(egam)/egam, 1e-10)

    for is_p in 1:NS

        expect += Ξ²*(1-sep)*max(V_fun(a_prime, is_p, iom), 1e-10)^(egam)/egam*P[is, is_p]

    end

    util = (c^egam/egam + expect)

    return util

end

#@code_warntype(bellman(5.0, 3, 2, 5, V_fun, V_un_fun, 0.02, para))

#@btime (bellman(5.0, 3, 2, 5, $V_fun, $V_un_fun, 0.02, $para))

function bellman_un(a_prime, ia, iom, V_fun, V_un_fun, jf_val, r, para)

    # Bellman equation of unemployed HH with state (a) and guess of

    # savings a_prime

    @unpack Ξ΄, Ξ², A_p, Ξ±, Tr, a, omega_grid, Pi, h_un, a_bor, NS, egam =para

    jf = jf_val[iom]

    a_i = a[ia]

    # consumption

    c = max(((a_i+a_bor)*(1+r)-(a_prime+a_bor)+h_un), 1e-10)

    expect = Ξ²*(1.0-jf)*max(V_un_fun(a_prime), 1e-10)^(egam)/egam

    @inbounds for is_p in 1:NS

        expect += Ξ²*jf*max(V_fun(a_prime, is_p, iom), 1e-10)^(egam)/egam*Pi[is_p]

    end

    util = c^egam/egam + expect

    return util

end

function update_value_un(V_fun, V_un_fun, jf_vals, r, para)

    # Unemployed

    @unpack NA, NS, Nomega, a, omega_grid, a_l, a_bor, A_p, h_un, z, Ξ΄, Ξ±, Tr, egam = para

    a_pol_un = zeros(NA)

    c_pol_un = zeros(NA)

    V_un_new = zeros(NA)

    omega_policy = zeros(Int, NA)

    temp_fun = zeros(NA, Nomega)

    temp_maximizer = zeros(NA, Nomega)

    @inbounds Threads.@threads for ia in 1:NA

        a_i = a[ia]

        for iom = 1:Nomega

            # get a_prime

            #a_u = max((a[ia] + a_bor)*(1+r) - c_pol_un[ia] + h_un, 1e-10)

            a_u = max((a_i + a_bor)*(1+r) + h_un, 1e-10)

            objective = x -> bellman_un(x, ia, iom, V_fun, V_un_fun, jf_vals, r, para)

            sol = maximize(objective, a_l, a_u)

            temp_fun[ia, iom] = Optim.maximum(sol)

            temp_maximizer[ia, iom] = Optim.maximizer(sol)

        end

        # Ο‰ which maximizes value

        iom_max = argmax(@view(temp_fun[ia, :]))

        # policies and value function

        a_pol_un[ia] = temp_maximizer[ia, iom_max]

        c_pol_un[ia]     =max((a_i+a_bor)*(1+r) +h_un - (a_pol_un[ia] +a_bor)  , (1e-10))

        omega_policy[ia] =iom_max

        V_un_new[ia] = temp_fun[ia, iom_max]

    end

    return V_un_new, a_pol_un, c_pol_un, omega_policy

end

function update_value(V_fun, V_un_fun, r, para)

    @unpack NA, NS, Nomega, a, omega_grid, a_l, a_bor, A_p, h_un, z, Ξ΄, Ξ±, Tr, egam = para

    a_pol = zeros(NA, NS, Nomega)

    c_pol = zeros(NA, NS, Nomega)

    V_new = zeros(NA, NS, Nomega)

    k_opt = @. ((r+Ξ΄)/(z*A_p*Ξ±))^(Ξ±-1)

    # Interpolate employed value function

    @inbounds Threads.@threads for ia in 1:NA

        a_i = a[ia]

        for iom in 1:Nomega

            for is in 1:NS

                z_i = z[is]

                wage = omega_grid[iom]*(z_i*A_p*k_opt[is]^Ξ±-(r+Ξ΄)*k_opt[is])

                #x_in = max((a[ia] + a_bor)*(1+r) + wage - c_pol[ia, is, iom] - Tr, 1e-10)

                a_u = max((a_i + a_bor)*(1+r) + wage - Tr, 1e-10)

                objective = x -> bellman(x, ia, is, iom, V_fun, V_un_fun, r, para)

                #sol = maximize(objective, a_l, a_u)

                sol = maximize(objective, a_l, a_u)

                x_in = Optim.maximizer(sol)

                a_pol[ia, is, iom] = max(x_in, 0.0)

                c_pol[ia, is, iom] = max((a_i+a_bor)*(1+r)+wage-(

                    a_pol[ia, is, iom] + a_bor) - Tr, 1e-10)

                V_new[ia, is, iom] = Optim.maximum(sol)

            end

        end

    end

    return  V_new, a_pol, c_pol

end

function solve_household(V, V_un, jf_vals, r, para; itermax=5000, error_tol=1e-9, N=10)

    @unpack NA, NS, Nomega, a, omega_grid, a_l, a_u, a_bor, A_p, h_un, z, Ξ΄, Ξ±, Tr, egam = para

    #initialize policies

    a_pol_un = zeros(NA)

    c_pol_un = zeros(NA)

    V_un_new = similar(V_un)

    omega_policy = zeros(Int, NA)

    a_pol = zeros(NA, NS, Nomega)

    c_pol = zeros(NA, NS, Nomega)

    V_new = similar(V)

    # Outer loop for value function iteration

    err = 1.0

    iter_num = 1

    #V_fun(x, is, iom) = LinearInterpolation(a, egam.*V[:, is, iom].^(1/egam), extrapolation_bc=Line())(x)

    #V_un_fun(x) = LinearInterpolation(a, egam.*V_un.^(1/egam), extrapolation_bc=Line())(x)

    while err > error_tol && iter_num < itermax

        V_un_fun(x) = LinearInterpolation(a, egam.*V_un.^(1/egam), extrapolation_bc=Interpolations.Flat())(x)

        V_fun(x, is, iom) = LinearInterpolation(a, egam.*V[:, is, iom].^(1/egam), extrapolation_bc=Interpolations.Flat())(x)

        # Optimal decision for every gridpoint

        # given the interest rate

        V_un_new, a_pol_un, c_pol_un, omega_policy = update_value_un(V_fun, V_un_fun, jf_vals, r, para)  

        V_new, a_pol, c_pol = update_value(V_fun, V_un_fun, r, para)

        # update value function for fixed policies

        for n in 1:N

            V_un_fun_temp(x) = LinearInterpolation(a, egam.*V_un_new.^(1/egam), extrapolation_bc=Interpolations.Flat())(x)

            #V_fun(x, is, iom) = LinearInterpolation(a, egam.*V_new[:, is, iom].^(1/egam), extrapolation_bc=Line())(x)

            for ia in 1:NA

                x_u = a_pol_un[ia]

                iom_max = omega_policy[ia]

                V_un_new[ia] = bellman_un(x_u, ia, iom_max, V_fun, V_un_fun, jf_vals, r, para)

                 for is in 1:NS

                     for iom in Nomega

                         x_e = a_pol[ia, is, iom]

                         V_new[ia, is, iom] = bellman(x_e, ia, is, iom, V_fun, V_un_fun, r, para)

                     end

                end

            end

        end

        err_un = maximum(abs.(V_un_new - V_un)./maximum(abs.(V_un)))

        @show err_un

        err_e = maximum(abs.(V_new - V)./maximum(abs.(V)))

        @show err_e

        err = err_un + err_e

        # update unemployed value function

        V_un .= V_un_new

        V .= V_new

        # Interpolate value functions (transformed)

        @show iter_num +=1

    end

    return V, V_un, c_pol_un, c_pol, a_pol_un, a_pol, omega_policy

end

# initialize

const para = Para(NA=50)

@unpack a, z, omega_grid, egam = para

r,  k_opt, c_pol, c_pol_un, V, V_un = initialize(para)

# Firm value by value function iteration: (NS, Nomega)

J, profits = bellman_firm(r, para)

# Tightness function: length Nomega

ΞΈ_vals = ΞΈ_fun(J, r, para)

# Job finding rate

jf = matching_functions(para)[1]

@show jf_vals = jf.(ΞΈ_vals)

# Solve HH problem: value and policy functions

# @code_warntype update_value(V_fun, V_un_fun, r, para)

# @btime update_value($V_fun, $V_un_fun, $r, $para)

# @code_warntype solve_household(V, V_un, jf_vals, r, para)

# @report_call solve_household(V, V_un, jf_vals, r, para)

# @report_call update_value(V_fun, V_un_fun, r, para)

# @report_call update_value_un(V_fun, V_un_fun, jf_vals, r, para)

V, V_un, c_pol_un, c_pol, a_pol_un, a_pol, omega_policy = solve_household(V, V_un, jf_vals, r, para)

# Assign bins to savings policies

# Run the profile using new setting

@profile solve_household(V, V_un, jf_vals, r, para)

ProfileView.view()

#ProfileSVG.save("solve_householdprofile_results.svg")

#Profile.clear()

function plot_policies()

    # J value function

    fig, ax = plt.subplots()

    ax.plot(omega_grid, J[1, :], label="Productivity level=1")

    ax.plot(omega_grid, J[5, :], label="Productivity level=5")

    ax.legend()

    tight_layout()

    display(fig)

    # Tightness function

    fig, ax = plt.subplots()

    ax.plot(ΞΈ_vals, label="Tightness function ΞΈ(Ο‰)")

    ax.legend()

    tight_layout()

    display(fig)

    # Submarket policy

    fig, ax = plt.subplots()

    ax.plot(a, omega_grid[omega_policy], label="Wage submarket choice")

    ax.legend()

    tight_layout()

    display(fig)

    # Value functions

    fig, ax = plt.subplots()

    ax.plot(a, V_un, label="Value function of unemployed")

    #ax.plot(a, V[:, 1, 7] )

    ax.legend()

    tight_layout()

    display(fig)

    indices = [a.>2.0]

    fig, ax = plt.subplots()

    for Ο‰_i in eachindex(omega_grid)

        ax.plot(a[10:end], V[10:end, 5, Ο‰_i], label="Value function of employed:Ο‰=$Ο‰_i")

        #ax.plot(a, V[:, 1, 7] )

        ax.legend()

    end

    tight_layout()

    display(fig)

    fig, ax = plt.subplots()

    for z_i in eachindex(z)

        ax.plot(a[10:end], V[10:end, z_i, 7], label="Value function of employed:z=$z_i")

        #ax.plot(a, V[:, 1, 7] )

        ax.legend()

    end

    tight_layout()

    display(fig)

    # Consumption policies

    fig, ax = plt.subplots()

    ax.plot(a, c_pol_un, label="Consumption of unemployed")

    ax.legend()

    tight_layout()

    display(fig)

    fig, ax = plt.subplots()

    for z_i in [1, 9]

        for Ο‰_i in [1, 14]

            ax.plot(a[10:end], c_pol[10:end, z_i, Ο‰_i], label="Consumption of employed: z=$z_i, Ο‰=$Ο‰_i")

            ax.legend()

        end

    end

    tight_layout()

    display(fig)

    # Savings policies

    fig, ax = plt.subplots()

    ax.plot(a, a_pol_un, label="Assets of unemployed")

    ax.plot(a, a, label="45-degree line")

    ax.legend()

    tight_layout()

    display(fig)

    fig, ax = plt.subplots()

    ax.plot(a, a, label="45-degree line")

    for z_i in [1, 9]

        for Ο‰_i in [1, 14]

            ax.plot(a, a_pol[:, z_i, Ο‰_i], label="Assets of employed: z=$z_i, Ο‰=$Ο‰_i")

            ax.legend()

        end

    end

    tight_layout()

    display(fig)

end

plot_policies()

It looks like allocations are happening explicitly and implicitly. Explicitly there are some zeros which can be quite costly in a hot loop. It looks like the arrays all stay the same size, so there’s no sense allocating new ones with each update_value. The solution is to allocate these arrays once, and then have your hot functions act on them in place. This might make the code a bit uglier, since you’ll have to pass the in-place arrays into the functions.

I also think there are some implicit allocations, e.g. egam.*V[:, is, iom] first allocates memory for the slice of V. This could potentially be reduced by using @view V[:, is, iom], or a for loop. I didn’t look too closely, but usually these kinds of allocations can be eliminated entirely, you just have to watch out for them.

Also implicit to the user is the allocation within LinearInterpolation, pointed out by @goerch. My impression is that Interpolations is designed for high performance in production, but probably that comes at a cost during set-up. So setting up a new interpolation within a hot loop costs unnecessary allocations. Since the arrays stay the same size, It seems like you only need a very lightweight non-allocating interpolator, which might be as simple as a one-liner.

Also, stuff like maximize also allocates a little, because it returns a sol struct. Again, there might be a lightweight optimizer that doesn’t allocate. I don’t know anything about your problem, but is the algorithm linearly interpolating a discrete grid, and then doing continuous optimization on that? Shouldn’t the optimum always be on a discrete grid point (not an interpolant), and are there no better ways to do this, say in discrete optimization?

If you are just iterating corrections and re-optimizing, then it’s probably a good idea to supply a starting guess to maximize based on the previous solution.

I suspect that you could get rid of allocations everywhere except set-up, and do a lightweight interpolator (and maybe optimizer), and end up saving significant time. In the original Fortran code, do you know how these operations were done, by what packages, and whether the set-up time was outside the loop?

Greetings. I was working on a few other things and thus could not reply right away.

Thanks a lot @artkuo for these points. I reformulated update_value and update_value_un to mutate arguments in place and preallocated the necessary arrays once. I also rearranged it so that LinearInterpolation functions are defined only once by letting the array V or V_un enter as an argument.

Even so, there are may allocations within update_value in particular. I strongly suspect the sol structure for the optimization in each point of the state space. I just don’t how to reformulate the problem in a way that does not allocate as much.

The optimization that enters maximize represents a savings decision, which is typically modeled as continuous, as we generally think of the underlying asset as varying continuously. Thus, the optimum is not necessarily on a discrete point.

In the Fortran code, essentially all arrays were preallocated in a file Globals.f90. There was heavy use of subroutines, which modify arguments in place and thus avoid many allocations. Additionally, the optimizer is fminsearch from toolbox.f90, which works as a subroutine and seems to avoid the overhead associated with the sol struct.

If it is of any use, I copy below the modified program.

Thanks so much for all the helpful input!

using PyPlot, BenchmarkTools

using PyFormattedStrings

using LaTeXStrings

using Parameters, CSV, Random, QuantEcon

using Dierckx, Distributions, ArgParse

using Optim, Interpolations

using JET, ProfileView, ProfileSVG

function grid_Cons_Grow(n, left, right, g)

    """

    Creates n+1 gridpoints with growing distance on interval [a, b]

    according to the formula

    x_i = a + (b-a)/((1+g)^n-1)*((1+g)^i-1) for i=0,1,...n

    """

    x = zeros(n)

    for i in 0:(n-1)

        x[i+1] = @. left + (right-left)/((1+g)^n-1)*((1+g)^i-1)

    end

    return x

end

Para = @with_kw ( Ξ³=0.5,

    egam= 1.0-1.0/Ξ³,

    Ξ² = 0.96,

    Ξ± = 0.2,

    A_p = 1.0,

    Ξ΄= 0.0,

    ρ = 0.9,

    Οƒ_eps = 0.04*(1.0-ρ^2),

    NS= 9,

    a_l= 0.0,

    a_u= 50.0,

    NA=50,

    a = grid_Cons_Grow(NA, a_l, a_u, 0.01),

    mc= rouwenhorst(NS, ρ, Οƒ_eps^(1/2), 0),

    P = mc.p,

    z = exp.(mc.state_values),

    Nomega = 14,

    omega_grid = range(0.1, stop=0.9, length=Nomega),

    a_bor =-0.0,

    Tr = 0.02,

    h_un = 0.02,

    Pi = stationary_distributions(mc)[1],

    sep = 0.05,

    ΞΊ = 2.0,

    A_m = 0.54,

    Ξ·_L = 0.72

    )

   

    # Create asset grid

   

    #u::Function = (c, Ξ³=Ξ³) -> c^(1-Ξ³)/(1-Ξ³)

    #u_prime::Function = c -> c^(-Ξ³)

    #u_prime_inv::Function = c -> c^(-1/Ξ³)

   

function matching_functions(para)

@unpack A_m, Ξ·_L = para

  # Matching functions

  jf(ΞΈ) = min(A_m*ΞΈ^(1-Ξ·_L), 1.0)

  q(ΞΈ) = min(A_m*ΞΈ^Ξ·_L, 1.0)

  return jf, q

end

function initialize(para)

    @unpack γ, egam, α, β, A_p, δ, ρ, NS, NA, Nomega, z, a, a_bor, Tr, h_un = para

    minrate = -Ξ΄

    maxrate = (1-Ξ²)/Ξ²

    r = 0.8*(minrate+maxrate)

    k_opt = @. ((r+Ξ΄)/(z*A_p*Ξ±))^(Ξ±-1)

    c_pol = zeros(NA, NS, Nomega)

    for is in 1:NS

        for iom in 1:Nomega

            c_pol[:, is, iom] = @. max(r*(a + a_bor),1e-10)

        end

    end

 

    V = c_pol.^(egam)/egam

    c_pol_un = @. max(r*(a+a_bor), 1e-10)

    V_un = c_pol_un.^(egam)/egam

    # Initial guess of distribution function

    phi = 1.0/(NA*NS*Nomega*2)

    return r, k_opt, c_pol, c_pol_un, V, V_un

end

function bellman_firm(r, para;tol=1e-7)

    #=

    Value of firm at productivity grid (is, iom)

    =#

    @unpack omega_grid, z, NS, Nomega, Ξ΄, Ξ±, A_p, sep, P = para

    J_new = zeros(NS, Nomega)

    diff = 1.0

    profit = zero(J_new)

    while diff > tol

        J = zero(J_new)

        for (iom, Ο‰) in enumerate(omega_grid)

            for (is, z_i) in enumerate(z)

                k_opt = ((r+Ξ΄)/(z_i*A_p*Ξ±))^(Ξ±-1)

                profit[is, iom] = (1-Ο‰)* (z_i*A_p*k_opt^Ξ±-(r+Ξ΄)*k_opt)

                for is_p in 1:NS

                    J[is, iom] += (profit[is, iom] + (1.0-sep)/(1+r)*J_new[is_p, iom])*P[is, is_p]

                end

            end

        end

        diff = maximum(abs.(J-J_new))

        #print(diff)

        J_new .= J

    end

    return J_new, profit

end

function ΞΈ_fun(J, r, para)

    # Returns mapping between Ο‰ and ΞΈ

    @unpack NS, Nomega, A_p, ΞΊ, Ξ·_L, Pi = para

    # Expected firm value from stationary distribution

    EJ = zeros(Nomega)

    for iom in 1:Nomega

        EJ[iom] = sum(J[:, iom].*Pi)

    end

    EJ .= @. max(EJ, 1e-10)

    ΞΈ = @. (A_p*EJ/(ΞΊ*(1+r)))^(1/Ξ·_L)

    return ΞΈ

end

#@code_warntype(ΞΈ_fun(J, 0.02, para))

#@btime ΞΈ_fun(J, 0.02, para)

function bellman(a_prime, ia, is, iom, V_fun, V_un_fun, V, V_un, r, para)

   

    # Bellman equation of employed HH with state (a,z,Ο‰) and guess of

    # savings a_prime

    @unpack Ξ΄, Ξ², A_p, Ξ±, Tr, a, omega_grid, P, sep, egam, z, a_bor, NS =para

    a_i, z_i, Ο‰ = a[ia], z[is], omega_grid[iom]

    k_opt = ((r+Ξ΄)/(z_i*A_p*Ξ±))^(Ξ±-1)

    revenue = z_i*A_p*k_opt^Ξ±-(r+Ξ΄)*k_opt

    wage = Ο‰*revenue

    c = max((a_i + a_bor)*(1+r)+wage-(a_prime+a_bor)-Tr, 1e-10)

    expect = Ξ²*sep*max(V_un_fun(a_prime, V_un)^(egam)/egam, 1e-10)

    for is_p in 1:NS

        expect += Ξ²*(1-sep)*max(V_fun(a_prime, is_p, iom, V), 1e-10)^(egam)/egam*P[is, is_p]

    end

    util = (c^egam/egam + expect)

    return util

end

#@code_warntype(bellman(5.0, 3, 2, 5, V_fun, V_un_fun, 0.02, para))

#@btime (bellman(5.0, 3, 2, 5, $V_fun, $V_un_fun, 0.02, $para))

function bellman_un(a_prime, ia, iom, V_fun, V_un_fun, V, V_un, jf_val, r, para)

    # Bellman equation of unemployed HH with state (a) and guess of

    # savings a_prime

    @unpack Ξ΄, Ξ², A_p, Ξ±, Tr, a, omega_grid, Pi, h_un, a_bor, NS, egam =para

    jf = jf_val[iom]

    a_i = a[ia]

    # consumption

    c = max(((a_i+a_bor)*(1+r)-(a_prime+a_bor)+h_un), 1e-10)

    expect = Ξ²*(1.0-jf)*max(V_un_fun(a_prime, V_un), 1e-10)^(egam)/egam

    @inbounds for is_p in 1:NS

        expect += Ξ²*jf*max(V_fun(a_prime, is_p, iom, V), 1e-10)^(egam)/egam*Pi[is_p]

    end

    util = c^egam/egam + expect

    return util

end

function update_value_un!(a_pol_un, c_pol_un, omega_policy, V_un_new, V_fun, V_un_fun, jf_vals, r, para)

    # Unemployed

    @unpack NA, NS, Nomega, a, omega_grid, a_l, a_bor, A_p, h_un, z, Ξ΄, Ξ±, Tr, egam = para

    temp_fun = zeros(NA, Nomega)

    temp_maximizer = zeros(NA, Nomega)

    @inbounds Threads.@threads for ia in 1:NA

        a_i = a[ia]

        for iom = 1:Nomega

            # get a_prime

            #a_u = max((a[ia] + a_bor)*(1+r) - c_pol_un[ia] + h_un, 1e-10)

            a_u = max((a_i + a_bor)*(1+r) + h_un, 1e-10)

            objective = x -> bellman_un(x, ia, iom, V_fun, V_un_fun, V, V_un, jf_vals, r, para)

            sol = maximize(objective, a_l, a_u)

            temp_fun[ia, iom] = Optim.maximum(sol)

            temp_maximizer[ia, iom] = Optim.maximizer(sol)

        end

        # Ο‰ which maximizes value

        iom_max = argmax(@view(temp_fun[ia, :]))

        # policies and value function

        a_pol_un[ia] = temp_maximizer[ia, iom_max]

        c_pol_un[ia]     =max((a_i+a_bor)*(1+r) +h_un - (a_pol_un[ia] +a_bor)  , (1e-10))

        omega_policy[ia] =iom_max

        V_un_new[ia] = temp_fun[ia, iom_max]

    end

end

function update_value!(a_pol, c_pol, V_new, V_fun, V_un_fun, r, para)

    @unpack NA, NS, Nomega, a, omega_grid, a_l, a_bor, A_p, h_un, z, Ξ΄, Ξ±, Tr, egam = para

    k_opt = @. ((r+Ξ΄)/(z*A_p*Ξ±))^(Ξ±-1)

    # Interpolate employed value function

    @inbounds Threads.@threads for ia in 1:NA

        a_i = a[ia]

        for iom in 1:Nomega

            for is in 1:NS

                z_i = z[is]

                wage = omega_grid[iom]*(z_i*A_p*k_opt[is]^Ξ±-(r+Ξ΄)*k_opt[is])

                #x_in = max((a[ia] + a_bor)*(1+r) + wage - c_pol[ia, is, iom] - Tr, 1e-10)

                a_u = max((a_i + a_bor)*(1+r) + wage - Tr, 1e-10)

                objective = x -> bellman(x, ia, is, iom, V_fun, V_un_fun, V, V_un, r, para)

                sol = maximize(objective, a_l, a_u)

                x_in = Optim.maximizer(sol)

                a_pol[ia, is, iom] = max(x_in, 0.0)

                c_pol[ia, is, iom] = max((a_i+a_bor)*(1+r)+wage-(

                    a_pol[ia, is, iom] + a_bor) - Tr, 1e-10)

                V_new[ia, is, iom] = Optim.maximum(sol)

            end

        end

    end

end

function solve_household(V, V_un, jf_vals, r, para; itermax=5000, error_tol=1e-9, N=10)

    @unpack NA, NS, Nomega, a, omega_grid, a_l, a_u, a_bor, A_p, h_un, z, Ξ΄, Ξ±, Tr, egam = para

    #initialize policies

    a_pol_un = zeros(NA)

    c_pol_un = zeros(NA)

    V_un_new = similar(V_un)

    omega_policy = zeros(Int, NA)

    a_pol = zeros(NA, NS, Nomega)

    c_pol = zeros(NA, NS, Nomega)

    V_new = similar(V)

    # Outer loop for value function iteration

    err = 1.0

    iter_num = 1

    V_un_fun(x, V_un) = LinearInterpolation(a, egam.*V_un.^(1/egam), extrapolation_bc=Interpolations.Flat())(x)

    V_fun(x, is, iom, V) = LinearInterpolation(a, egam.*@view(V[:, is, iom]).^(1/egam), extrapolation_bc=Interpolations.Flat())(x)

    while err > error_tol && iter_num < itermax

        # Optimal decision for every gridpoint

        # given the interest rate

        update_value_un!(a_pol_un, c_pol_un, omega_policy, V_un_new, V_fun, V_un_fun, jf_vals, r, para)  

        update_value!(a_pol, c_pol, V_new, V_fun, V_un_fun, r, para)

        # update value function for fixed policies

        for n in 1:N

            #V_un_fun_temp(x) = LinearInterpolation(a, egam.*V_un_new.^(1/egam), extrapolation_bc=Interpolations.Flat())(x)

            #V_fun(x, is, iom) = LinearInterpolation(a, egam.*V_new[:, is, iom].^(1/egam), extrapolation_bc=Line())(x)

            for ia in 1:NA

                x_u = a_pol_un[ia]

                iom_max = omega_policy[ia]

                V_un_new[ia] = bellman_un(x_u, ia, iom_max, V_fun, V_un_fun, V, V_un, jf_vals, r, para)

                 for is in 1:NS

                     for iom in Nomega

                         x_e = a_pol[ia, is, iom]

                         V_new[ia, is, iom] = bellman(x_e, ia, is, iom, V_fun, V_un_fun, V, V_un, r, para)

                     end

                end

            end

        end

        err_un = maximum(abs.(V_un_new - V_un)./maximum(abs.(V_un)))

        @show err_un

        err_e = maximum(abs.(V_new - V)./maximum(abs.(V)))

        @show err_e

        err = err_un + err_e

        # update unemployed value function

        V_un .= V_un_new

        V .= V_new

        # Interpolate value functions (transformed)

        @show iter_num +=1

    end

    return V, V_un, c_pol_un, c_pol, a_pol_un, a_pol, omega_policy

end

# initialize

const para = Para(NA=50)

@unpack a, z, omega_grid, egam = para

r,  k_opt, c_pol, c_pol_un, V, V_un = initialize(para)

# Firm value by value function iteration: (NS, Nomega)

J, profits = bellman_firm(r, para)

# Tightness function: length Nomega

ΞΈ_vals = ΞΈ_fun(J, r, para)

# Job finding rate

jf = matching_functions(para)[1]

@show jf_vals = jf.(ΞΈ_vals)

# Solve HH problem: value and policy functions

#@code_warntype update_value!(a_pol, c_pol, V_new, V_fun, V_un_fun, r, para)

#@btime update_value!($a_pol, $c_pol, $V_new, $V_fun, $V_un_fun, $r, $para)

#@btime update_value_un!($a_pol_un, $c_pol_un, $omega_policy, $V_un_new, $V_fun, $V_un_fun, $jf_vals, $r, $para)

#@code_warntype solve_household(V, V_un, jf_vals, r, para)

#@report_call solve_household(V, V_un, jf_vals, r, para)

# @report_call update_value(V_fun, V_un_fun, r, para)

# @report_call update_value_un(V_fun, V_un_fun, jf_vals, r, para)

V, V_un, c_pol_un, c_pol, a_pol_un, a_pol, omega_policy = solve_household(V, V_un, jf_vals, r, para, N=25)

# Assign bins to savings policies

# Run the profile using new setting

using Profile

@profile solve_household(V, V_un, jf_vals, r, para)

ProfileView.view()

#ProfileSVG.save("solve_householdprofile_results.svg")

#Profile.clear()

Just looking through your code quickly without understanding it, here are a couple impressions. First, I think your new pre-allocation should help. Second, you’re right that maximize is allocating, but that should be pretty minimal. I just tried a maximize and it allocated 160 bytes, which isn’t necessarily that bad, although it could be zero.

Third, I believe your interpolation may still be costing. You moved V_un_fun(x, V_un) = LinearInterpolation(...) up, which means you saved redefining the function, but that doesn’t save on setting up interpolation, which still happens each time V_un_fun is called in the hot loop. So time and allocations may still be spent. I suspect that you need a lightweight linear interpolator, which could probably be pretty simple (is it 1D?). Interpolations is not very complicated, but I don’t think they optimized for doing the same interpolation over and over, just with a different set of data.

What does your profiler show? Can you tell whether maximize is costing you, or interpolations? If optimize, you could make a new non-allocating version pretty easily. Looking at the code, there are two small allocations for tr and return that could be removed.