# Help me optimize dynamic programming code

I am working on a dynamic programming assignment for macro-economics that has code in Matlab which I tried to translate to Julia.

However I cannot get the performance to be what it should be.

Here is the loop solving the Bellman equation in each iteration

function perform_loop(
size_state_space::Int64,
utilities::Matrix{Float64},
optimal_choices::Vector{Int64},
current_utilities::Vector{Float64},
max_utilities::Vector{Float64},
absdiff::Vector{Float64})

for vfit = 1:VFMAXIT
#form Bellman equation
@inbounds for i in 1:size_state_space
u, ind = findmax(view(utilities, :, i) .+ β .* current_utilities)
optimal_choices[i] = ind
max_utilities[i] = u
end

# Vector of errors
absdiff .= abs.(max_utilities .- current_utilities)

# Maximum absolute error
vferr = maximum(absdiff)

#if (mod(vfit, 50) == 1)
#	println("VF error = $vferr on VF iteration$vfit.")
#end

# exit if converged
if (vferr < VFTOL)
println("Number of iterations: $vfit") println("Error achieved:$vferr")
break
end

# if not converged, then redo
current_utilities .= max_utilities
end

return optimal_choices
end


Here size_state_space is 1000, which means that the length for everything is 1000. Utilities is a 1000 by 1000 matrix indicating the utility for every combination of elements in the state space, which is equal to range(.001, stop = 1, length = 1000). It has many values equal to -Inf because some combinations are not feasible.

I feel like I have done a good job pre-allocating arrays and only using broadcasted assignment. I have also made sure to iterate column-wise over utilities for fastest iteration. As you can see, the bulk of my computation, done in the loop, has been placed in it’s own function. Nonetheless my code still takes twice as long as similar code in matlab.

I have made use of Profile and ProfileViews to better understand what is taking up so much time, but I have yet to have a breakthrough.

I am sure the problem will be obvious as I have not had much experience writing code that needs to be fast. Any help is appreciated.

3 Likes

here is a more complete MWE that can be copied and pasted

module SimpleBellman
using Statistics, LinearAlgebra, BenchmarkTools

# Definition of constants

# Max number of iterations
global const VFMAXIT = 1000

# Tolerance on Bellman Equation, in percentages
global const VFTOL = 1e-7

# Define the utility matrix for each pair
function _consumption(z_t, z_t1, r)
(1+r) * z_t - z_t1
end

function _utility(z_t, z_t1, γ, r)
((_consumption(z_t, z_t1, r) ^ (1 - γ))  - 1 ) / (1 - γ)
end

function _satisfies_budget_constraint(z_t, z_t1, r)
if z_t <= 0
return false
elseif z_t1 > (1 + r) * z_t
return false
else
return true
end
end

function perform_loop(
size_state_space::Int64,
utilities::Vector{Vector{Float64}},
optimal_choices::Vector{Int64},
current_utilities::Vector{Float64},
max_utilities::Vector{Float64},
absdiff::Vector{Float64},
β::Float64)

temp_bellman = similar(current_utilities)

for vfit = 1:VFMAXIT
#form Bellman equation
@inbounds for i in 1:size_state_space
u, ind = findmax(utilities[i] .+ β .* current_utilities)
optimal_choices[i] = ind
max_utilities[i] = u
end

# Vector of errors
absdiff .= abs.(max_utilities .- current_utilities)

# Maximum absolute error
vferr = maximum(absdiff)

#if (mod(vfit, 50) == 1)
#	println("VF error = $vferr on VF iteration$vfit.")
#end

# exit if converged
if (vferr < VFTOL)
println("Number of iterations: $vfit") println("Error achieved:$vferr")
break
end

# if not converged, then Vold <- V, and redo
current_utilities .= max_utilities
end

return optimal_choices
end

function get_policy_function(
γ,
β,
r,
size_state_space,
z_0,
z_min,
z_max,
)

λ = (β * (1+r)) ^ (1 / γ)

# Determining state space
if λ > 1
z_state_space = range(z_0, z_max, length = size_state_space)
else
z_state_space = range(z_min, z_0, length = size_state_space)
end

println("Beginning optimization")

absdiff = Vector{Float64}(undef, size_state_space)
optimal_choices_init = Vector{Int64}(undef, size_state_space)
max_utilities = Vector{Float64}(undef, size_state_space)

current_utilities = zeros(size_state_space)

utilities = [fill(-Inf, size_state_space) for i in 1:size_state_space]

for (i, z_t) in enumerate(z_state_space)
for (j, z_t1) in enumerate(z_state_space)
if _satisfies_budget_constraint(z_t, z_t1, r)
utilities[i][j] = _utility(z_t, z_t1, γ, r)
end
end
end

optimal_choices = perform_loop(
size_state_space,
utilities,
optimal_choices_init,
current_utilities,
max_utilities,
absdiff,
β)

println("Completed optimization")

function findnearest(A,t)
return findmin( abs.(A .- t))
end

function policy_function(z)
ind = findnearest(z_state_space, z)
return z_state_space[optimal_choices[ind]]
end

function utility_function(z)
return _utility(z, policy_function(z), γ, r)
end

function analytic_policy_function(z)
λ * z
end

function analytic_utility_function(z)
return _utility(z, analytic_policy_function(z), γ, r)
end

return z_state_space, policy_function, utility_function, analytic_policy_function, analytic_utility_function
end

function wrapper(
γ,
β,
r,
size_state_space,
z_0,
z_min,
z_max)

@time state_space, π, V, π_analytic, V_analytic = get_policy_function(
γ,
β,
r,
size_state_space,
z_0,
z_min,
z_max)
end

function main()
# Definition of parameter values
γ = 1.4
β = 0.9
r_low = .02
r_high = .2
λ = (β * (1 + r_low)) ^ (1 / γ)
# Time span
T = 100

# Determining state space
size_state_space = 1000
z_0 = 1
z_min = .001
z_max = 100

wrapper(
γ,
β,
r_low,
size_state_space,
z_0,
z_min,
z_max)
end

end #module


One important question in cases like this is whether you are actually performing the exact same calculation in both cases. Subtle errors or differences in the code might be leading to different number of iterations before convergence, in which case the performance of the language might not be as important as the number of iters. Have you verified that your Julia code produces the same answer as your old code in the same number of iterations?

I can confirm that I have the same results, same parameter values, tolerance and max iterations.

The matlab code is vectorized, however, in a way that was unintuitive to write in Julia. Rather than iterate through the size of the state space, it performs vectorized operation as follows:

    % Makes a big matrix, adding current utilities to each row
mat_to_find_max = utility + beta*repmat(current_utilities', size_state_space, 1);
% Finds the minimum along the second dimension
[max_utilities, optimal_choices] = max(mat_to_find_max, [], 2);


I assumed that because julia has fast for loops, this method would be just as fast as using vectorized operation.

Edit: and as far as I can tell, I’m right. the vectorized operation in julia makes things quite slow

I wonder if this part of the issue. You’re constructing a new vector to hold the result of utilities[i] .+ beta .* current_utilities for every iteration i, but that vector is just thrown away once you find its max element. That’s rather wasteful, since you’re repeatedly allocating and then destroying those vectors. Instead, you could write out a loop to find the maximum element by indexing directly into utilities[i] and current_utilities without ever computing the vector utilities[i] .+ β .* current_utilities.

For example, given vectors x and y, we can find the maximum of their sum without actually computing x .+ y. This is easier if we define a helper function:

julia> """
Find the maximum index and maximum value of f.(x)
"""
function mapped_max(f, x)
best_index = firstindex(x)
best_value = f(x[best_index])
@inbounds for i in (firstindex(x) + 1):lastindex(x)
y = f(x[i])
if y > best_value
best_value = y
best_index = i
end
end
(best_index, best_value)
end
mapped_max

julia> x = [.1, .2, .3]; y = [.1, .2, -.1]
3-element Array{Float64,1}:
0.1
0.2
-0.1

julia> mapped_max(i -> (x[i] + y[i]), 1:3)
(2, 0.4)


Or use the allready allocated absdiff to store the vector

absdiff .= utilities[i] .+ β .* current_utilities
findmax(absdiff)

1 Like

Also, the layout of a Vector{Vector{Float64}} in memory necessarily is not the same as a Matrix{Float64}, which is one reason the matrix approach might have some potential benefits w.r.t. cache access.

This does it. Performance goes from .9 seconds to .3 seconds with this change. Thanks for the help!

I also took the beta .* current_utilities out of the inner loop and allocated it so i wasn’t re-calculating every inner loop.

It would be nice if findmax had a built in lazy method. A findmax that took in a zipped iterator and had a do method would also have done the trick I think.

Do you