Help speeding up finite horizon dynamic programming problem

I am struggling to speed up the computation of solutions for a finite horizon dynamic programming problem. I am using interpolation to avoid computing the value function outside of points in the grid. Right now it takes my computer around 7 minutes to solve this 5-period model.

Please have a look and let me know what I can improve. Any suggestions you may have will help me!

Thanks!

# VFI finite horizon
using Optim
using Distributions
using Random
using Interpolations

# transformation functions
# open interval (a, b)
function logit(x; a = 0, b = 1)
    (b - a) * (exp(x)/(1 + exp(x))) + a
end

function utility(c::R, L::R) where R <: Real
    if c <= 0 || L <= 0
        return -Inf
    else
        return log(c) + log(1 - L)
    end
end

const T = 5 # terminal period
const β = 0.95 # discount factor
const r = 0.05 # interest rate
const dist = LogNormal(0, 1) # distribution of Îľ
const n = 1 # number of points of support of Îľ

Random.seed!(1234)
const w = Vector{Float64}(undef, T) # exogenous wages

w .= (900 .+ 20.0 .* (1:T) .- 0.5 .* (1:T).^2)

const grid_A = -1000:10:8_000

const Îľ = quantile.(dist, 0:1/n:prevfloat(1.0))

const Vdict = Array{Float64}(undef, (n, length(grid_A), T))
const Cdict = Array{Float64}(undef, (n, length(grid_A), T))
const Ldict = Array{Float64}(undef, (n, length(grid_A), T))
const A1dict = Array{Float64}(undef, (n, length(grid_A), T))
const convdict = Array{Bool}(undef, (n, length(grid_A), T))

# solve last period
for s in 1:length(grid_A)
    for i in 1:n
        opt = optimize( x -> -utility(w[T]*x + grid_A[s]*(1+r), x), 0.0, 1.0 )
        Ldict[i, s, T] = Optim.minimizer(opt)
        Cdict[i, s, T] = (w[T] + Îľ[i])*Ldict[i, s, T] + grid_A[s]*(1+r)
        A1dict[i, s, T] = 0.0
        Vdict[i, s, T] = -Optim.minimum(opt)
    end
end

interp_func(t) = LinearInterpolation(grid_A, sum(Vdict[i, :, t] for i in 1:n) ./ n, extrapolation_bc = Line())

# solve rest of periods
@time for t in T-1:-1:1
    for s in 1:length(grid_A)
        for i in 1:n
            initial_x = [A1dict[i, s, t+1], 0.0]
            # x[1] is assets to carry forward, x[2] is labor supply
            opt = optimize( x -> -(utility(logit(x[2])*(w[t] + Îľ[i])+ grid_A[s]*(1+r) - x[1], logit(x[2])) +
                        β*interp_func(t+1)(x[1]) ),
                  initial_x, method = NewtonTrustRegion(), autodiff = :forward, iterations = 100_000)
            Vdict[i, s, t] = -Optim.minimum(opt)
            xstar = Optim.minimizer(opt)
            Ldict[i, s, t] = logit(xstar[2])
            A1dict[i, s, t] = xstar[1]
            Cdict[i, s, t] = Ldict[i, s, t] * (w[t] + Îľ[i]) + grid_A[s] * (1+r) - A1dict[i, s, t]
            convdict[i, s, t] = Optim.converged(opt)
        end
    end
    println("period ", t, " finished")
end

The first thing I would suggest is also the very first thing on the list of performance tips: put all of your code into functions rather than relying on global variables. Having your code in functions will also make it easier to read and optimize further.

1 Like

Does it really matter? All those variables are consts and the rest are local to for loops.

It probably won’t matter here since most of your time is likely inside some library function like optimize. But it could. If you don’t get rid of the most simple performance problems that people encounter, it will get pointed out to you here. So to get the most relevant responses you should try to follow Performance Tips · The Julia Language first and then, if possible, do some initial profiling to see if you can give some info about what is slow.

In addition to what @kristoffer.carlsson said , I’d be concerned about performance of captured variables in closures · Issue #15276 · JuliaLang/julia · GitHub and the closures created in the above code, and it will be much easier to check for that issue (and fix it) when dealing with functions rather than dealing with global state.

1 Like

Ok, here is my naive attempt at addressing your comments. Is this better?

using Optim
using Interpolations

struct DP{Ty <: Real}
    u::Function
    T::Int
    β::Ty
    r::Ty
    n::Int
    w::Vector{Ty}
    grid_A::Vector{Ty}
    Îľ::Vector{Ty}
end

function solvelast!(dp::DP{Ty}, Ldict::Array{Ty}, Cdict::Array{Ty}, A1dict::Array{Ty}, Vdict::Array{Ty}) where Ty <: Real
    utility = dp.u
    grid_A = dp.grid_A
    n = dp.n
    w = dp.w
    Îľ = dp.Îľ
    r = dp.r
    T = dp.T
    for s in 1:length(grid_A)
        for i in 1:n
            opt = optimize( x -> -utility(w[T]*x + grid_A[s]*(1+r), x), 0.0, 1.0 )
            Ldict[i, s, T] = Optim.minimizer(opt)
            Cdict[i, s, T] = (w[T] + Îľ[i])*Ldict[i, s, T] + grid_A[s]*(1+r)
            A1dict[i, s, T] = 0.0
            Vdict[i, s, T] = -Optim.minimum(opt)
        end
    end
    return Ldict, Cdict, A1dict, Vdict
end

# transformation functions
# open interval (a, b)
function logit(x; a = 0, b = 1)
    (b - a) * (exp(x)/(1 + exp(x))) + a
end

# interpolation
interp_func(t) = LinearInterpolation(grid_A, sum(Vdict[i, :, t] for i in 1:n) ./ n, extrapolation_bc = Line())

function solverest!(dp::DP{Ty}, Ldict::Array{Ty}, Cdict::Array{Ty}, A1dict::Array{Ty}, Vdict::Array{Ty}, convdict::Array{Bool}; t0::Int=1) where Ty <: Real
    utility = dp.u
    grid_A = dp.grid_A
    n = dp.n
    w = dp.w
    Îľ = dp.Îľ
    r = dp.r
    T = dp.T

    for t in T-1:-1:t0
        for s in 1:length(grid_A)
            for i in 1:n
                initial_x = [A1dict[i, s, t+1], 0.0]
                # x[1] is assets to carry forward, x[2] is labor supply
                opt = optimize( x -> -(utility(logit(x[2])*(w[t] + Îľ[i])+ grid_A[s]*(1+r) - x[1], logit(x[2])) +
                            β*interp_func(t+1)(x[1]) ),
                            initial_x, method = NewtonTrustRegion(), autodiff = :forward, iterations = 100_000)
                Vdict[i, s, t] = -Optim.minimum(opt)
                xstar = Optim.minimizer(opt)
                Ldict[i, s, t] = logit(xstar[2])
                A1dict[i, s, t] = xstar[1]
                Cdict[i, s, t] = Ldict[i, s, t] * (w[t] + Îľ[i]) + grid_A[s] * (1+r) - A1dict[i, s, t]
                convdict[i, s, t] = Optim.converged(opt)
            end
        end
        println("period ", t, " finished")
    end
    return Ldict, Cdict, A1dict, Vdict, convdict
end

I then instantiate the problem and solve it:

using Distributions

function u(c::R, L::R) where R <: Real
    if c <= 0 || L <= 0
        return -Inf
    else
        return log(c) + log(1 - L)
    end
end

T = 5 # terminal period
β = 0.95 # discount factor
r = 0.05 # interest rate
dist = LogNormal(0, 1) # distribution of Îľ
n = 1 # number of points of support of Îľ

w = (900 .+ 20.0 .* (1:T) .- 0.5 .* (1:T).^2)

grid_A = collect(-1000:10.0:8_000)

Îľ = quantile.(dist, 0:1/n:prevfloat(1.0))

Vdict = Array{Float64}(undef, (n, length(grid_A), T))
Cdict = Array{Float64}(undef, (n, length(grid_A), T))
Ldict = Array{Float64}(undef, (n, length(grid_A), T))
A1dict = Array{Float64}(undef, (n, length(grid_A), T))
convdict = Array{Bool}(undef, (n, length(grid_A), T))

dp = DP(u, T, β, r, n, w, grid_A, ξ)

solvelast!(dp, Ldict, Cdict, A1dict, Vdict)
solverest!(dp, Ldict, Cdict, A1dict, Vdict, convdict)

Definitely an improvement :slightly_smiling_face:

Have you tried benchmarking any of the parts of your process independently? Or have you tried profiling with Profile and GitHub - timholy/ProfileView.jl: Visualization of Julia profiling data ? And have you ensured that everything is type-stable by running @code_warntype ? For example, your DP struct has a field with type ::Function, but Function is an abstract type, so accessing that field inside the loop will likely perform suboptimally. You can instead make your type something like :

struct DP{Ty <: Real, F <: Function}
  u::F 
  ...

Running @code_warntype should show the instability caused by that abstract type annotation, and it should also confirm that the F <: Function change fixes it.

1 Like

I find that @code_warntype is not very informative here:

julia> @code_warntype solverest!(dp, Ldict, Cdict, A1dict, Vdict, convdict)
Body::Tuple{Array{Float64,3},Array{Float64,3},Array{Float64,3},Array{Float64,3},Array{Bool,3}}
1 ─ %1 = invoke Main.:(#solverest!#14)(1::Int64, _1::Function, _2::DP{Float64,typeof(u)}, _3::Array{Float64,3}, _4::Array{Float64,3}, _5::Array{Float64,3}, _6::Array{Float64,3}, _7::Array{Bool,3})::Tuple{Array{Float64,3},Array{Float64,3},Array{Float64,3},Array{Float64,3},Array{Bool,3}}
└──      return %1

I will try ProfileView, but I’ve had a hard time with that tool in the past.

There are several abstract types here that would kill performance. This is a first-order concern.

I find it much easier to work with named tuples as collecting sets of parameters It is less verbose and there is no chance of messing up abstract parameters. It is also easy create default values. In practice there is also no need annotate types for functions. For example:

function myalgorithm(params)
    # use params
end
params = (T = 5, dist = LogNormal(0, 1)) # etc.
myalgorithm(params)

That pattern will prevent using incorrect type annotations (because you never annotate).

It won’t specifically help performance, but you can also use the Parameters package:

using Parameters
function myalgorithm(params)
    @unpack T, dist = params # extracts values from named tuples to local binding
end

default_params = @with_wk (T = 5,  LogNormal(0, 1)) # generates with defaults
params = default_params() # all defaults
params_2 = default_params(T = 10) # changes one, etc
myalgorithm(params) # etc.
1 Like

Thanks for the suggestion @jlperla . So instead of defining a struct I should simply use NamedTuples.

I have a question, when using plain NamedTuples how do you extract the values from params, like below?

function f(params)
    fun = params.fun
    x = params.x
    return fun(x)
end

params = (fun = sin, x = pi)
f(params)

Thinking ahead, would a framework using NamedTuples be harder to debug? I think if an error arises, it would be hard to track, no?

I am not sure I understand — the only abstract type I see is Function. Fix that, or introduce a function barrier, and it should be fine.

NamedTuples are fine for prototyping, but starting with struct is OK to. Introducing a type parameter for the utility function u should fix everything above very easily.

1 Like

Constructing Linear interpolation objects is costly, and you’re doing it in every objective function call. Try constructing interp_func(t+1) once outside the objective and then call it within.

2 Likes

Also (should have caught this first): interp_func is referring to your value function in global scope. You need to change it so that you pass it various matrices it needs as arguments, otherwise everything will be slow.

I’m not sure I follow. I construct the interpolation outside the loops over i and s. I need to construct a new interpolation for each t, I thought the function interp_func(t) did that.

Ok, that is easily fixable.

Here, you’re calling interp_func, which does a lot of work to precalculate the slopes (I think) inside the hot loop of your optimization problem.

2 Likes

I think I understand what you mean. So, instead of constructing the function outside the main loop, I can construct it inside the loop for t, but outside the other 2 loops. So I would end up with something like this:

for t in T-1:-1:1
    interp_func = ... # create interpolation
    for s in 1:length(grid_A)
        for i in 1:n
            opt = ... #<optimization calling interp_func(x[1])>
           .....

That way the interp_func is constructed once for each t. Is that what you mean?

Yes exactly. But this isn’t the big problem here.

After looking at this for a bit, I think the bigger problem (and what it looks like is taking all of the time in “solving” this) is that you’re not solving it. The problem is not defined at the low end of your asset grid, since there are grid points for which there is no labor supply that ensures positive consumption. It looks like it’s taking forever because Optim is trying to be really sure that it has searched before it returns “I guess you get -Inf utils.” But the problem is just undefined at those points…

I bet the problem will get solved very quickly if you restrict the asset grid to just positive values. The real trick will be allowing for negative assets up to the worker’s natural borrowing limit.

I think I let the full model (with T = 65) run for a couple of hours and the solutions involved some borrowing in early life. I will have to check that more carefully.

The good news is that with constructing the interpolating function inside the loop for t, I can now solve it within 50 seconds, which is a huge improvement! Thanks!

I was starting to think of constructing my own optimizer, since I read elsewhere that Optim is not completely type-stable, or using another parameterization to ensure the labor supply remains between 0 and 1. I thought that the logit function had tails that were too flat.

What are your grid sizes? This really shouldn’t take hours? Also this is really parallelisable, if you check my github there’s an implementation of a parallel finite horizon vfi thing from my thesis (Learning Models)

2 Likes