Help me optimize dynamic programming code

I am working on a dynamic programming assignment for macroeconomics that comes with Matlab code, which I have tried to translate to Julia.

However, I cannot get the performance to where it should be.

Here is the loop that solves the Bellman equation in each iteration:

function perform_loop(
	size_state_space::Int64, 
	utilities::Matrix{Float64}, 
	optimal_choices::Vector{Int64}, 
	current_utilities::Vector{Float64}, 
	max_utilities::Vector{Float64}, 
	absdiff::Vector{Float64})

	for vfit = 1:VFMAXIT
		#form Bellman equation
		@inbounds for i in 1:size_state_space
			u, ind = findmax(view(utilities, :, i) .+ β .* current_utilities)
			optimal_choices[i] = ind
			max_utilities[i] = u
		end

		# Vector of errors
		absdiff .= abs.(max_utilities .- current_utilities)

		 # Maximum absolute error
		vferr = maximum(absdiff)

		#if (mod(vfit, 50) == 1)
		#	println("VF error = $vferr on VF iteration $vfit.")
		#end

		# exit if converged
		if (vferr < VFTOL)
			println("Number of iterations: $vfit")
			println("Error achieved: $vferr")
		    break 
		end

		# if not converged, then redo
		current_utilities .= max_utilities
	end

	return optimal_choices
end

Here size_state_space is 1000, so every vector has length 1000. utilities is a 1000-by-1000 matrix giving the utility for every combination of elements in the state space, which is range(.001, stop = 1, length = 1000). Many of its entries are -Inf because some combinations are not feasible.

I feel like I have done a good job pre-allocating arrays and only using broadcasted assignment. I have also made sure to iterate column-wise over utilities for the fastest access pattern. As you can see, the bulk of the computation, done in the loop, has been placed in its own function. Nonetheless, my code still takes twice as long as similar code in Matlab.

I have made use of Profile and ProfileView to better understand what is taking up so much time, but I have not yet had a breakthrough.

I am sure the problem will be obvious as I have not had much experience writing code that needs to be fast. Any help is appreciated.


Here is a more complete MWE that can be copied and pasted:

module SimpleBellman 
using Statistics, LinearAlgebra, BenchmarkTools

# Definition of constants

# Max number of iterations
global const VFMAXIT = 1000

# Tolerance on Bellman Equation, in percentages
global const VFTOL = 1e-7

# Define the utility matrix for each pair
function _consumption(z_t, z_t1, r)
	(1+r) * z_t - z_t1
end

function _utility(z_t, z_t1, γ, r)
	((_consumption(z_t, z_t1, r) ^ (1 - γ))  - 1 ) / (1 - γ)
end

function _satisfies_budget_constraint(z_t, z_t1, r)
	if z_t <= 0
		return false 
	elseif z_t1 > (1 + r) * z_t
		return false 
	else 
		return true
	end
end


function perform_loop(
	size_state_space::Int64, 
	utilities::Vector{Vector{Float64}}, 
	optimal_choices::Vector{Int64}, 
	current_utilities::Vector{Float64}, 
	max_utilities::Vector{Float64}, 
	absdiff::Vector{Float64}, 
	β::Float64)

	temp_bellman = similar(current_utilities)

	for vfit = 1:VFMAXIT
		#form Bellman equation
		@inbounds for i in 1:size_state_space
			u, ind = findmax(utilities[i] .+ β .* current_utilities)
			optimal_choices[i] = ind
			max_utilities[i] = u
		end

		# Vector of errors
		absdiff .= abs.(max_utilities .- current_utilities)

		 # Maximum absolute error
		vferr = maximum(absdiff)

		#if (mod(vfit, 50) == 1)
		#	println("VF error = $vferr on VF iteration $vfit.")
		#end

		# exit if converged
		if (vferr < VFTOL)
			println("Number of iterations: $vfit")
			println("Error achieved: $vferr")
		    break 
		end

		# if not converged, then Vold <- V, and redo
		current_utilities .= max_utilities
	end

	return optimal_choices
end


function get_policy_function(
	γ, 
	β, 
	r,
	size_state_space, 
	z_0, 
	z_min,
	z_max,
	)
	
	λ = (β * (1+r)) ^ (1 / γ)

	# Determining state space
	if λ > 1
		z_state_space = range(z_0, z_max, length = size_state_space)
	else
		z_state_space = range(z_min, z_0, length = size_state_space)
	end
	
	println("Beginning optimization")

	absdiff = Vector{Float64}(undef, size_state_space)
	optimal_choices_init = Vector{Int64}(undef, size_state_space)
	max_utilities = Vector{Float64}(undef, size_state_space)

	current_utilities = zeros(size_state_space)

	utilities = [fill(-Inf, size_state_space) for i in 1:size_state_space]

	for (i, z_t) in enumerate(z_state_space)
		for (j, z_t1) in enumerate(z_state_space)
			if _satisfies_budget_constraint(z_t, z_t1, r)
				utilities[i][j] = _utility(z_t, z_t1, γ, r)
			end
		end
	end

	optimal_choices = perform_loop(
		size_state_space, 
		utilities, 
		optimal_choices_init, 
		current_utilities, 
		max_utilities, 
		absdiff, 
		β)
	
	println("Completed optimization")

	function findnearest(A,t) 
		return findmin( abs.(A .- t))[2]
	end

	function policy_function(z)
		ind = findnearest(z_state_space, z)
		return z_state_space[optimal_choices[ind]]
	end

	function utility_function(z)
		return _utility(z, policy_function(z), γ, r)
	end

	function analytic_policy_function(z)
		λ * z
	end

	function analytic_utility_function(z)
		return _utility(z, analytic_policy_function(z), γ, r)
	end

	return z_state_space, policy_function, utility_function, analytic_policy_function, analytic_utility_function
end

function wrapper(
	γ, 
	β, 
	r,
	size_state_space, 
	z_0, 
	z_min,
	z_max)

	@time state_space, π, V, π_analytic, V_analytic = get_policy_function(
		γ, 
		β, 
		r,
		size_state_space, 
		z_0, 
		z_min,
		z_max)
end

function main()
	# Definition of parameter values
	γ = 1.4
	β = 0.9
	r_low = .02
	r_high = .2
	λ = (β * (1 + r_low)) ^ (1 / γ)
	# Time span 
	T = 100
	
	# Determining state space
	size_state_space = 1000
	z_0 = 1
	z_min = .001
	z_max = 100

	wrapper(
		γ, 
		β, 
		r_low,
		size_state_space, 
		z_0, 
		z_min,
		z_max)
end


end #module
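
To run it, save the module to a file (I'm calling it SimpleBellman.jl here just for illustration) and do:

include("SimpleBellman.jl")
SimpleBellman.main()   # the first call includes compilation; run it twice for a cleaner timing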

One important question in cases like this is whether you are actually performing the exact same calculation in both cases. Subtle errors or differences in the code might lead to a different number of iterations before convergence, in which case the performance of the language might matter less than the iteration count. Have you verified that your Julia code produces the same answer as your old code in the same number of iterations?

I can confirm that I have the same results, same parameter values, tolerance and max iterations.

The Matlab code is vectorized, however, in a way that was unintuitive to write in Julia. Rather than iterating over the state space, it performs a vectorized operation as follows:

    % Makes a big matrix, adding current utilities to each row
    mat_to_find_max = utility + beta*repmat(current_utilities', size_state_space, 1);
    % Finds the maximum along the second dimension
    [max_utilities, optimal_choices] = max(mat_to_find_max, [], 2);

I assumed that because Julia has fast for loops, this method would be just as fast as the vectorized operation.

Edit: and as far as I can tell, I'm right. The vectorized operation in Julia makes things quite slow.
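
For reference, a direct Julia analogue of that vectorized step, written against the Matrix layout from my first post (columns indexed by the current state, rows by the choice), would be something like:

mat_to_find_max = utilities .+ β .* current_utilities   # allocates a full 1000×1000 matrix every iteration
vals, inds = findmax(mat_to_find_max, dims = 1)          # maximize over choices, i.e. down each column
max_utilities .= vec(vals)
optimal_choices .= getindex.(vec(inds), 1)               # keep only the row index of each CartesianIndex

That full temporary matrix gets allocated and thrown away on every iteration, which is presumably part of why it is slow here.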

I wonder if this is part of the issue. You're constructing a new vector to hold the result of utilities[i] .+ β .* current_utilities for every iteration i, but that vector is thrown away as soon as you find its maximum element. That's rather wasteful, since you're repeatedly allocating and then discarding those vectors. Instead, you could write out a loop that finds the maximum by indexing directly into utilities[i] and current_utilities, without ever materializing the vector utilities[i] .+ β .* current_utilities.

For example, given vectors x and y, we can find the maximum of their sum without actually computing x .+ y. This is easier if we define a helper function:

julia> """
       Find the index and value of the maximum element of f.(x)
       """
       function mapped_max(f, x)
         best_index = firstindex(x)
         best_value = f(x[best_index])
         @inbounds for i in (firstindex(x) + 1):lastindex(x)
           y = f(x[i])
           if y > best_value
             best_value = y
             best_index = i
           end
         end
         (best_index, best_value)
       end
mapped_max

julia> x = [.1, .2, .3]; y = [.1, .2, -.1]
3-element Array{Float64,1}:
  0.1
  0.2
 -0.1

julia> mapped_max(i -> (x[i] + y[i]), 1:3)
(2, 0.4)
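
In your inner loop that would look something like this (just a sketch; note that mapped_max returns (index, value), the reverse order of findmax):

@inbounds for i in 1:size_state_space
	ind, u = mapped_max(j -> utilities[i][j] + β * current_utilities[j], 1:size_state_space)
	optimal_choices[i] = ind
	max_utilities[i] = u
end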

Or use the already allocated absdiff to store the vector:

absdiff .= utilities[i] .+ β .* current_utilities
findmax(absdiff)
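
In context, that could also reuse the temp_bellman buffer you already allocate (but never use) in your MWE, rather than absdiff:

@inbounds for i in 1:size_state_space
	temp_bellman .= utilities[i] .+ β .* current_utilities   # in-place, no new allocation per i
	u, ind = findmax(temp_bellman)
	optimal_choices[i] = ind
	max_utilities[i] = u
end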

Also, a Vector{Vector{Float64}} necessarily has a different memory layout than a Matrix{Float64} (each inner vector is a separate allocation), which is one reason the matrix approach might have some potential benefits w.r.t. cache access.
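
For example (a sketch), reduce(hcat, utilities) stacks the inner vectors as the columns of one contiguous matrix, which can then be sliced with cheap views exactly as in your first post:

utilities_mat = reduce(hcat, utilities)   # size_state_space × size_state_space, one contiguous block
# inside the loop, column i is then a contiguous view:
temp_bellman .= view(utilities_mat, :, i) .+ β .* current_utilities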

This does it. Performance goes from .9 seconds to .3 seconds with this change. Thanks for the help!

I also took the β .* current_utilities out of the inner loop and pre-allocated a buffer for it, so I wasn't recalculating it on every inner iteration.
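
Roughly like this (discounted_utilities is just my name for the hoisted buffer, and temp_bellman is the scratch vector from the MWE):

discounted_utilities = similar(current_utilities)

for vfit = 1:VFMAXIT
	discounted_utilities .= β .* current_utilities   # computed once per outer iteration
	@inbounds for i in 1:size_state_space
		temp_bellman .= utilities[i] .+ discounted_utilities
		u, ind = findmax(temp_bellman)
		optimal_choices[i] = ind
		max_utilities[i] = u
	end
	# convergence check and current_utilities .= max_utilities as before
end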

It would be nice if findmax had a built-in lazy method. A findmax that took a zipped iterator, or one with a do-block method, would also have done the trick, I think.
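
As an aside, if I read the docs correctly, newer Julia versions (1.7 and later) have a findmax(f, domain) method that does this lazily, returning (f(x), index). A sketch of how it would slot into the inner loop:

# requires Julia ≥ 1.7; avoids materializing the temporary vector
u, ind = findmax(j -> utilities[i][j] + β * current_utilities[j], 1:size_state_space)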

Do you