# How to speed up my code

Hi there,

I am solving a dynamic programming problem (first try in Julia). Previously I solved the same problem in Python. I was expecting Julia solving it much faster than Python, and I don’t get that much improvement. I would like to know if this is normal or otherwise, what I am doing wrong. Below you have the function `bellman` which I am interested to speed up (It is partially taken from QuantEcon lectures notes).

``````mutable struct AgentStruct{T <: Float64}
r::T
R::T
y::T
β::T
b::T
b_lim::T
tau::T
theta::T
pi::T
landa::T
X::Matrix{T}
z_vals::Vector{T}
a_vals::Array{Float64,1}
u::Function
end

function ConsumerProblem(;r=0.00,
y=1.0,
β=0.995,
X=[0.5 0.5; 0.0319 0.9681],
z_vals=[0.0, 1.0],
b=0.0,
b_lim=0.0,
tau=0.02,
theta=0.15,
pi=0.2,
landa=0.0,
grid_max=8,
grid_size=100)

R = 1.0 + r
a_vals = collect(range(-b_lim, stop=grid_max, length=grid_size))
u(c::Float64, h::Float64)::Float64 = log(c) + log(1.0-h)

return AgentStruct(r, R, y, β, b, b_lim, tau, theta, pi, landa, X, z_vals, a_vals, u)
end

function bellman(cons::AgentStruct,
V::Array{Float64, 2};
ret_policy::Bool=false)

"""
The approximate Bellman operator, which computes and returns the
updated value function TV (or the V-greedy policy c if
return_policy is True).
Inputs
----------
cons : AgentStruct
An instance of ConsumerProblem that stores primitives
V : Matrix(Float64)
A Matrix with dimension
return_policy : bool, optional(default=False)
Indicates whether to return the greed policy given V or the
updated value function TV.  Default is TV.
Output
-------
TV, g_a, g_d : Array(Float64)
Returns either the greed policy given V or the updated value
function TV.
"""
# Unpack parameters and grid
X, β, y, b, b_lim, tau, theta, pi, u = cons.X, cons.β, cons.y, cons.b, cons.b_lim, cons.tau, cons.theta, cons.pi, cons.u
a_vals, z_vals = cons.a_vals, cons.z_vals

# Initialize intermediate value functions and operator
TV = zeros(size(V))
V_u = ones(length(a_vals))*(-1e9)
V_e = ones(length(a_vals))*(-1e9)
V_f = ones(length(a_vals))*(-1e9)
# Initialize policy functions
g_a = Array{Int64, 2}(undef, length(a_vals), 3)
g_d = Array{Int64, 1}(undef, length(a_vals))

# Fix current a in each iteration and evaluate over the vector of possible a'
for (i_a, a) in enumerate(a_vals)
# Compute consumption for each possible employment status
c_e::Array{Float64,1} = max.(a + (1.0-tau)*(y + b) .- a_vals, 0.0)
c_u::Array{Float64,1} = max.(a + (1.0-tau)*(theta*y + b) .- a_vals, 0.0)
c_f::Array{Float64,1} = max.(a + (1.0-tau)*b .- a_vals, 0.0)

# Get maximums and maximizers for each possible employment status
V_u[i_a], g_a[i_a,1] = findmax(u.(c_u,0.0) + β*(X[1,2]*V[:,2] + X[1,1]*V[:,1]))
V_e[i_a], g_a[i_a,2] = findmax(u.(c_e,0.45) + β*(X[2,2]*V[:,2] + X[2,1]*V[:,1]))
V_f[i_a], g_a[i_a,3] = findmax(u.(c_f,0.0) + β*(X[2,2]*V[:,2] + X[2,1]*V[:,1]))
end

# Update the operator
TV[:,1] = copy(V_u)
TV[:,2] = max.(V_e, pi*V_u+ (1.0-pi)*V_f)

# Save the decision rule of accepting the job offer
g_d[:] = V_e .> pi*V_u + (1.0-pi)*V_f

if ret_policy
return TV, g_a, g_d
else
return TV
end
end
``````

Testing it:

``````V_0 = ones(length(cons.a_vals), length(cons.z_vals))*(-120.0)

using BenchmarkTools

@btime V, g_a, g_d = bellman(cons, V_0, ret_policy=true)

9.679 ms (363922 allocations: 7.85 MiB)
12.801 ms (363922 allocations: 7.85 MiB)
13.033 ms (363922 allocations: 7.85 MiB)
``````

In Python I am getting around 15.5ms-16.3ms

A few quick pointers:

• Use `fill(-1e9, length(a_vals))` instead of `ones(…)*(-1e9)`
• You probably shouldn’t need `::Array{Float64,1}` if your `AgentStruct` is defined with fully-typed fields
• Storing the function `u` inside your `AgentStruct` is somewhat of an anti-pattern in Julia. It’s generally much better to find ways to use external- (and multiple-) dispatch to solve your problem. But as a quick fix, instead of `u::Function`, use a type parameter instead, so that you have an `AgentStruct{T <: Float64, F <: Function}` and then `u::F`. Edit: I just realized what I copy-pasted… on that note, you don’t need to parameterize `T` if it’s just always `T <: Float64`.
• Sometimes returning a tuple and sometimes not based upon a boolean flag is a type instability that you’ll want to avoid.
• You don’t need that `copy`

More thorough tips can be found at https://docs.julialang.org/en/stable/manual/performance-tips/

8 Likes

Welcome!

A couple of points, in addition to what Matt posted above:

• `Float64` is a concrete type, so the only type `T` for which `T <: Float64` is…Float64 itself (edit: also `Union{}` but that’s not something you need to worry about here). So your `T <: Float64` in your struct definition, while not harmful, is also not doing anything useful. I suspect something like `T <: Real` is closer to what you would actually want.
• The `@code_warntype` tool, mentioned in the https://docs.julialang.org/en/stable/manual/performance-tips/ is very useful for the first steps of optimizing your code. Have you run it on your function and checked its results?
• In `max.(a + (1.0-tau)*(y + b) .- a_vals, 0.0)` you’re mixing broadcasted functions (with `.`) with non-broadcasted functions (without `.`). That’s fine, but it’s probably sub-optimal, as it will result in extra temporary arrays being allocated. You can `.` everything or use the `@.` macro to automate it for a given expression. See https://julialang.org/blog/2017/01/moredots for a great writeup of why this is useful
• You are allocating new arrays `c_e` etc. in every iteration of your loop. Instead, you can create those arrays once outside of the loop and then overwrite them:
``````...
c_e = Array{Float64, 1}(undef, whatever_the_size_is)
for (i_a, a) in enumerate(a_vals)
c_e .= max.(....) # note the use of `.=` to store the result in the *existing* c_e rather than make a new array
...
end
``````
6 Likes

Your code is pretty vectorized since you evaluate the vector of possible `a'` all at once, instead of using an explicit loop. If your Python code is similar it’s not obvious Python would be that inefficient, depending on how much time is spent in the vectorized operation.

I think big gains might come by using an explicit loop and not evaluating irrelevant possible `a'`. See the algorithm used here https://www.sas.upenn.edu/~jesusfv/comparison_languages.pdf and the part where they discuss using monotonicity and the envelope condition.

You should try what mbauman said and use `AgentStruct{T <: Float64, F <: Function}` and then `u::F` .

1 Like

Thank you to everyone!

@mbauman It would be better to not store `u` inside `AgentStructure` but define a global `u` before running the functions?

@rdeits I’ve checked with `@code_warntype` but it is quite confusing. I will check it again more carefully since I will need to get use to it. I am applying now your tip about creating the arrays outside the loop (I didn’t know about `.=`)

@aaowens That’s right, my Python code is vectorized and very similar. I want to use monotonicity as a refinement later on, but I wanted to ask first at this stage.

``````  9.280 ms (362421 allocations: 7.56 MiB)
9.110 ms (362421 allocations: 7.56 MiB)
9.134 ms (362421 allocations: 7.56 MiB)
``````

Which is slightly better.

If it’s hard to analyze the code in your function, that might be a sign that your function is trying to do too many things. You may want to try splitting up the behavior into more small functions. That should make the overall behavior easier to understand and it means that you can test each piece in isolation. Functions calls are cheap in Julia (and small functions can be inlined by the compiler, making them essentially free).

3 Likes

Here is a slightly faster version.

``````using Parameters

function bellman(cons::AgentStruct,
V::Array{Float64, 2};
ret_policy::Bool=false)

# Unpack parameters and grid
@unpack X, β, y, b, b_lim, tau, theta, pi, u, a_vals, z_vals = cons

# Initialize intermediate value functions and operator
TV = zeros(size(V))
V_u = fill(-1e9, length(a_vals))
V_e = copy(V_u)
V_f = copy(V_u)
# Initialize policy functions
g_a = zeros(Int, length(a_vals), 3)
g_d = zeros(Int, length(a_vals))
c_e = similar(a_vals)
c_u = similar(a_vals)
c_f = similar(a_vals)

# Fix current a in each iteration and evaluate over the vector of possible a'
for (i_a, a) in enumerate(a_vals)
# Compute consumption for each possible employment status
c_e .= max.((a + (1.0-tau)*(y + b)) .- a_vals, 0.0)
c_u .= max.((a + (1.0-tau)*(theta*y + b)) .- a_vals, 0.0)
c_f .= max.((a + (1.0-tau)*b) .- a_vals, 0.0)

# Get maximums and maximizers for each possible employment status
@views @. c_u = u(c_u,0.0) + β*(X[1,2]*V[:,2] + X[1,1]*V[:,1])
@views @. c_e = u(c_e,0.45) + β*(X[2,2]*V[:,2] + X[2,1]*V[:,1])
@views @. c_f = u(c_f,0.0) + β*(X[2,2]*V[:,2] + X[2,1]*V[:,1])
V_u[i_a], g_a[i_a,1] = findmax(c_u)
V_e[i_a], g_a[i_a,2] = findmax(c_e)
V_f[i_a], g_a[i_a,3] = findmax(c_f)
end

# Update the operator
@. TV[:,1] = V_u
@. TV[:,2] = max(V_e, pi*V_u + (1.0-pi)*V_f)

# Save the decision rule of accepting the job offer
@. g_d[:] = V_e > pi*V_u + (1.0-pi)*V_f

if ret_policy
return TV, g_a, g_d
else
return TV
end
end
``````

Also interpolate the inputs with `\$` when using `@btime`, as in `@btime bellman(\$agent, \$V);`.

2 Likes

Thanks for those tips @mohamed82008! It is not only slightly faster, but much more elegant (`@unpack`).

1 Like

You will probably get an order of magnitude speedup when you take advantage of monotonicity and just look for the relevant crossing points.

1 Like