# Function control flow

Hello guys,

I am a very recent Julia programmer coming from matlab in search of performance. Right now, I am building an economics model which involves solving different problems for different agents in different circumstances. For that reason, and in order to contain the size of the code and improve readability, my strategy is to use a single function which takes the characteristics of the agent in question (his income, savings, etc), sets up his problem conditional on its characteristics, and solves it. In particular, I built a function which calculates consumption, which is used in all problems:

``````function budget(p,b::Float64,bp::Float64,Ļµ::Float64,ht::Float64,h::Float64,hp::Float64,
retired::Bool)
"""
b  : starting quantity of bonds
bp : chosen quantity of bonds for next period
iĻµ : position on the labor productivity grid
ht : size of house rented
h  : size of house at start of period
hp : size of house at end of period

retired     : agent is retired (==1)
"""
if retired == 1 && (h != hp || h == 0.0)
# retired renter. chooses to keep renting
# retired renter. chooses to own house
# retired homeowner. chooses to sell house and rent
# retired homeowner. chooses to sell house and buy another
@unpack w,q_b,p_h,Ī“_h,Ļ_h,Īŗ_h,Ļ = p
return Ļµ*w + b + p_h*h*(1-Ī“_h-Ļ_h-Īŗ_h) - q_b*bp - p_h*hp - Ļ*ht

elseif retired == 0 && (h != hp || h == 0.0)
# active renter. chooses to keep renting
# active renter. chooses to own house
# active homeowner. chooses to sell house and rent
# active homeowner. chooses to sell house and buy another
@unpack w,q_b,p_h,Ī“_h,Ļ_h,Īŗ_h,Ļ = p
return Ļµ*w + b + p_h*h*(1-Ī“_h-Ļ_h-Īŗ_h) - q_b*bp - p_h*hp - Ļ*ht

elseif retired == 1 && h == hp && h > 0.0
# retired homeowner. chooses to keep his house
@unpack w,q_b,p_h,Ī“_h,Ļ_h,Ļ = p
return Ļµ*w + b - p_h*h*(Ī“_h+Ļ_h) - q_b*bp

else
# active homeowner. chooses to keep his house
@unpack w,q_b,p_h,Ī“_h,Ļ_h,Ļ = p
return Ļµ*w + b - p_h*h*(Ī“_h+Ļ_h) - q_b*bp
end

end
``````

The logic of the code is very simple: if agent == characteristic X => consumption for agent type X. Now, compared to a case where I simply make a different function for every different type of agent, this is twice as slow. Also, notice I am passing the parameters through object p (from Parameters.jl), which is also very costly in terms of performance. I do this so I dont have to be setting 24 different parameters inside each function everytime I want to call it, thus avoiding bugs and miscalculations (which is also the reason I prefer to use the same budget function for every problem). So my question is: is what I am doing best practice in terms of performance? Is there a way to increase it?
Thanks

When you say ātwice as slowā are we talking nano seconds? Milliseconds?

If itās nanoseconds then thatās just the cost of doing conditionals. And removing the conditionals here would just probably move the cost of them up the parent function.

Something you could try, but I really have no clue if it will help or hurt is to do all 4 calculations then do the conditions after to decide which one to return. Since none of the calculates appear to be in danger of dividing by zero the CPU pipeline might be able to give a boostā¦or it might not.

Can you show how you measured this? There should be essentially zero cost for using a parameter struct as long as you avoid fields with abstract types.

For example:

``````julia> using Parameters

julia> @with_kw struct Params
x::Int = 1
y::Float64 = 2.5
end
Params

julia> function f(p::Params)
return p.x + p.y
end
f (generic function with 1 method)

julia> using BenchmarkTools

julia> p = Params()
Params
x: Int64 1
y: Float64 2.5

julia> @btime f(\$p)
0.016 ns (0 allocations: 0 bytes)
3.5
``````

Compare with an equivalent function `f` which takes `x` and `y` without the `Params` struct:

``````julia> function f(x, y)
return x + y
end
f (generic function with 2 methods)

julia> x = 1; y = 2.5
2.5

julia> @btime f(\$x, \$y)
0.017 ns (0 allocations: 0 bytes)
3.5
``````

There is no performance cost for the `Params` struct at all. In fact, in both cases the compiler is able to figure out that the answer is 3.5 without actually doing any work at run-time (thatās why the time reported is less than a single clock cycle).

1 Like

So here we are talking about nanoseconds, but since this function is called millions of times (multiple times for every agent type, for which I have to solve a dynamic programming problem), it pushes the total running time of the code from about 20 seconds to a few minutes. This may seem like a small issue but it will get worse as I add complexity to the problems Iām solving.

I will try your suggestion of calculating everything and then apply the conditions.

1 Like

I declared all the parameter types in the parameter structure (they are either Float64 or Integers). I did not declare the f(p::Params) like you did, but using @code_warntype, I saw that that created no type instability. Iāll declare it anyway, just to make sure. The way I measured it was by running the whole code for the two different ways of inputing parameters. Iāll measure it like you did, to see what I get. Thanks for the comment.

`Integer` is an abstract type in Julia (representing any integer type). `Int` is a concrete type.

Note that, while you do need to be sure that fields of data structures are concrete, you donāt need to declare `b::Float64,bp::Float64` etcetera for function arguments. See e.g. this discussion. One of the strengths of Julia is that you can write type-generic code that is still fast, e.g. your function should be able to work with either single or double precision depending on what argument types were passed. (Data structures can be made type-generic without sacrificing performance by using parameterized types.)

1 Like

In general, you will get much better help when looking for performance improvements if you can post a complete working example. It doesnāt have to be your actual data; just the bare minimum to run your function. If we can run the code ourselves, then itās much easier for us to help improve it.

3 Likes

Nanoseconds multiplied with āmillionsā should take milliseconds, not minutes, so thereās definitely something wrong.

Julia code can most of the time run as fast as optimized C code, so if you can find some way of sharing a runnable example, Iām sure you could get help speeding it up dramatically.

1 Like

Thanks for the advice. Iāll try to do that.

Here is the function which sets up the problem for a particular agent:

``````Using Dierckx
function ret_rent_own(p,b::Float64,bp::Float64,iĻµ::Int64,h::Float64,hp::Float64,vf::Array{Float64})
@unpack infm, supm, Ļ, Ī³, Ļ, Ī², Ļ, Ļµ_chain, b_grid = p
Ļµ_grid = Ļµ_chain.state_values
c = budget(p,b,bp,Ļµ_grid[iĻµ],0.0,h,hp,true)

if c < infm
v=-supm
else
vitp = Spline1D(b_grid,vf,k=3,bc="extrapolate")
v=c+ Ļ*hp + Ī²*vitp(bp)
end
return v
end
``````

Then here is a simplified version of the function that solves the agentās problem:

``````function solve_ret_last_rent(p,ib::Int64,iĻµ::Int64)
"""
Function to calculate the asset choice for individuals who begin the last period
of their lives as renters
"""
@unpack Ļµ_chain, b_grid, b_min, q_b, H_num, H_grid, Ht_num, Ht_grid, tol, itermax = p
Ļµ_grid = Ļµ_chain.state_values

# storage for renting and homeownership
v00 = zeros(H_num); b00 = similar(v00)

for i in eachindex(H_grid) # loop over house choices
b1=b_min; b2=max(b_min+0.01,budget(p,b_grid[ib],0.0,Ļµ_grid[iĻµ],0.0,0.0,H_grid[i],true)/q_b)
a,b = gssearch(bp -> ret_last_rent_own(p,b_grid[ib],bp,iĻµ,0.0,H_grid[i]),b1,b2,tol=tol,maxit=itermax)
b00[i]=copy(a); v00[i]=copy(b)
end
vo,ho = findmax(v00) # choose from house size options
bo = b00[ho]          # save bond choice
co = budget(p,b_grid[ib],bo,Ļµ_grid[iĻµ],0.0,0.0,H_grid[ho],true)

return bo, vo, co, ho
end
``````

And here is the function that, for a given set of parameters solves the agentās problem for all points in the state space (so, for any income and wealth combination:

``````function solv_ret(p)
# p is the parameter structure
@unpack q_b, w, Jret, J, Ī², Ļ, H_num, nĻµ, nb, b_grid, Ļµ_chain, b_min, infm = p
Ļµ_grid = Ļµ_chain.state_values   # idiosyncratic prod states
Ī  = Ļµ_chain.p                   # Stochastic matrix
nr = J - Jret + 1               # number of periods in retirement

# Renters - array pre-allocation
vn_ret   = zeros(nĻµ,nb,nr)           # value function
cnf_ret  = similar(vn_ret)           # consumption function
bnf_ret  = similar(vn_ret)           # asset function
hnf_ret  = similar(vn_ret)           # housing decision

# last year of life
for iĻµ = 1:nĻµ
# renter
bn,vn,cn,hn,gn = solve_ret_last_rent(p,ib,iĻµ)
vn_ret[iĻµ,ib,end]   = copy(vn)    # value function
cnf_ret[iĻµ,ib,end]  = copy(cn)    # consumption function
bnf_ret[iĻµ,ib,end]  = copy(bn)    # bond function
hnf_ret[iĻµ,ib,end]  = copy(hn)    # housing decision
end
end
end
``````

The rest of the model simply iterates on this procedure until an equilibrium is found, but this type of function is where the bulk of time is spent.

The parameter structure incluces the default parameters values:

``````Using QuantEcon
@with_kw struct HH
# Prices
q_b::Float64                = 0.97         # price of bonds
w::Float64                  = 0.956        # wage rate
Ļ::Float64                  = 0.1          # rent
p_h::Float64                = 0.5          # house price

# Households
Jret::Int64                 = 23             # retirement age
J::Int64                    = 30             # last period of life
Ī²::Float64                  = 0.95           # discount factor
Ļ::Float64                  = 0.12           # relative taste for housing
Ī³::Float64                  = 0.8            # ES between consumption and housing
Ļ::Float64                  = 2.0            # risk aversion coefficient
Ļ::Float64                  = 1.0            # utility premium from owning a house
Ļ_Ļµ::Float64                = 0.3            # idiosyncratic productivity persistence
Ī¼::Float64                  = 1-Ļ_Ļµ          # idiosyncratic productivity constant
Ļ_Ļµ::Float64                = 0.1            # standard deviation of random component
nĻµ::Int64                   = 3              # number of productivity states
n_std::Int64                = 2              # max number of standard deviations
Ļµ_chain::MarkovChain{Float64,Array{Float64,2},StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}} = tauchen(nĻµ,Ļ_Ļµ,Ļ_Ļµ,Ī¼,n_std)              # transition matrix and z values
b_min::Float64              = 0.0001         # minimum of the asset grid (borrowing constraint)
b_max::Float64              = 8.0            # maximum of the asset grid
nb::Int64                   = 24             # number of points in the asset grid
b_grid::Array{Float64,1}    = make_nlgrid(nb,b_min,b_max,4.0) # asset grid. Convexity is the last element

# Housing market
H_min::Float64              = 0.5           # minimum house size
H_num::Int64                = 2             # number of house sizes
H_gap::Float64              = 1.0           # gap between house sizes
H_max::Float64              = 3.15          # maximum house size
H_grid::Array{Float64,1}    = collect(range(H_min,stop=H_max,length=H_num))
Ī“_h::Float64                = 0.015         # house depreciation
Īŗ_h::Float64                = 0.07          # transaction cost

# Rental market
Ht_min::Float64             = 0.5           # minimum rental size
Ht_num::Int64               = 2             # number of rental sizes
Ht_gap::Float64             = 1.0           # gap between rental sizes
Ht_max::Float64             = 1.92          # maximum rental size
Ht_grid::Array{Float64,1}   = make_nlgrid(Ht_num,Ht_min,Ht_min+Ht_num*Ht_gap,1.0)
Ļ::Float64                  = 0.008         # operating costs

# Government
Ļ_h::Float64                = 0.01          # property tax

# Simulation parameters
supm::Float64               = 1.0e+10    # model infinity
infm::Float64               = 1.0e-10    # model zero
tol_eq::Float64             = 1.0e-3        # tolerance level for equilibrium computation
tol::Float64                = tol_eq*0.01   # tolerance level for the golden search (must be lower than the tolerance for the equilibrium)
itermax::Int64              = 1000          # max number of iterations allowed
end
``````

And the algorithm Iām using to do the maximization (golden section search):

``````function gssearch(f::Function, P1::Float64, P4::Float64; tol::Float64=10*eps(),maxit::Int64=1000)
iter = 1
d = 1.0

V2 = 1.0; P2 = 1.0; # initialize variables to be exported

while d > tol
P2 = P1 + ((3.0-sqrt(5.0))/2.0)*(P4-P1)
P3 = P1 + ((sqrt(5.0)-1.0)/2.0)*(P4-P1)

V2 = f(P2)::Float64
V3 = f(P3)::Float64

if V2 < V3
P1 = copy(P2) # update lower bound
else
P4 = copy(P3) # update upper bound
end
d = P4-P1
iter = iter + 1
iter < maxit || error("golden search evals exceeded")
#println(d)
end

return P2, V2
end
``````

I was imprecise in my statement. I meant upwards of hundreds of millions. Given that a maximization procedure is involved, it is uncertain how many times it is actually called, as it depends on the number of iterations until convergence.

Be careful with constant folding optimisations, or in general when you see subnanosecond result in benchmarks. You can use a ref-deref-trick to avoid that:

``````julia> @btime f(\$(Ref(p))[])
1.965 ns (0 allocations: 0 bytes)
3.5

julia> @btime f(\$(Ref(x))[], \$(Ref(y))[])
1.965 ns (0 allocations: 0 bytes)
3.5
``````
2 Likes

(removed double post due to network issues)

Yup, good advice. However, the fact that even the constant-folding works through the `Params` struct kind of emphasizes the point I was trying to make

1 Like

Iāve managed to reduce the total computing time of the model from around 28 seconds to 6.4 seconds by using the Optim package, instead of my user-made Golden Search algorithm.

1 Like

This shouldnt be costly if the parameter type is stable? The other stuff will certainly be slower,

Always best to use other peopleās code. Also, splines are a common place where things can slowdown. For example, being able to reuse basic matrices and their factorizations, etc

I made sure all the parameters had concrete types, and checked every function using @code_warntype. Its possible that I wrongly inferred that using the parameters structure might be decreasing performance. I basically tried two versions of the code, where in one I just passed the parameters directly to the function and timed both. It could be the case that I made some other change which decreased performance.

Maybe I could also reduce the precision of the Integers that I am using. They are all Int64 by default, and I donāt need large integers for these operations. That would reduce memory allocation and could potentially increase speed right?

Changing integer precision shouldnāt normally be necessary, and is quite a marginal optimization (on cpus at least), unless you have identified arrays of ints as a major memory issue. Maybe I missed it, but did you actually profile your code to find out which parts are slow?

1 Like

I didnāt post the profiling results, but the biggest drags on performance are the optimization part and the splines (as pointed out by a previous poster). But its good to know that changing precision will only change performance marginally.

I revisited this issue of whether Params has a performance cost. Iāve followed what you did in your exercise, but closer to what is actually done in my code:

``````using Parameters, BenchmarkTools

@with_kw struct Params2
w::Float64 = .956
end

pp = Params2()

function wage1(pp::Params2)
pp.w*2.0
end

@with_kw struct Prices
# Prices
q_b::Float64                = 0.97         # price of bonds
w::Float64                  = 0.956        # wage rate
Ļ::Float64                  = 0.1          # rent
p_h::Float64                = 0.5          # house price
q::Float64                  = 1.0          # Price of mortgage
Ī¹::Float64                  = 0.01         # Intermediation wedge
r_b::Float64                = 1.0/q_b-1.0  # risk-free interest rate
r_m::Float64                = r_b*(1.0+Ī¹)  # common real interest rate
end

pr = Prices()

function wage2(pr::Prices)
pr.w*2.0
end

function wage3(w::Real)
w*2.0
end

@btime wage1(pp)       # small parameter structure
@btime wage2(pr)        # large parameter structure
@btime wage3(.956)    # no parameter structure

4.499 ns (0 allocations: 0 bytes)
6.899 ns (1 allocation: 16 bytes)
0.001 ns (0 allocations: 0 bytes)
``````

So the last case is where I pass w to the function directly. It seems to be the case that performance costs increase when the parameter structure is larger. What do you think?