Function control flow

Hello guys,

I am a very recent Julia programmer coming from MATLAB in search of performance. Right now, I am building an economics model which involves solving different problems for different agents in different circumstances. For that reason, and in order to contain the size of the code and improve readability, my strategy is to use a single function which takes the characteristics of the agent in question (his income, savings, etc.), sets up his problem conditional on those characteristics, and solves it. In particular, I built a function which calculates consumption and is used in all problems:

function budget(p,b::Float64,bp::Float64,ϵ::Float64,ht::Float64,h::Float64,hp::Float64,
                retired::Bool)
    """
    p  : parameter structure (from Parameters.jl)
    b  : starting quantity of bonds
    bp : chosen quantity of bonds for next period
    ϵ  : labor productivity
    ht : size of house rented
    h  : size of house at start of period
    hp : size of house at end of period

    retired : agent is retired (== true)
    """
    if retired && (h != hp || h == 0.0)
        # retired renter. chooses to keep renting
        # retired renter. chooses to own a house
        # retired homeowner. chooses to sell house and rent
        # retired homeowner. chooses to sell house and buy another
        @unpack w,q_b,p_h,δ_h,τ_h,κ_h,ρ = p
        return ϵ*w + b + p_h*h*(1-δ_h-τ_h-κ_h) - q_b*bp - p_h*hp - ρ*ht

    elseif !retired && (h != hp || h == 0.0)
        # active renter. chooses to keep renting
        # active renter. chooses to own a house
        # active homeowner. chooses to sell house and rent
        # active homeowner. chooses to sell house and buy another
        @unpack w,q_b,p_h,δ_h,τ_h,κ_h,ρ = p
        return ϵ*w + b + p_h*h*(1-δ_h-τ_h-κ_h) - q_b*bp - p_h*hp - ρ*ht

    elseif retired && h == hp && h > 0.0
        # retired homeowner. chooses to keep his house
        @unpack w,q_b,p_h,δ_h,τ_h = p
        return ϵ*w + b - p_h*h*(δ_h+τ_h) - q_b*bp

    else
        # active homeowner. chooses to keep his house
        @unpack w,q_b,p_h,δ_h,τ_h = p
        return ϵ*w + b - p_h*h*(δ_h+τ_h) - q_b*bp
    end
end

The logic of the code is very simple: if the agent has characteristic X => consumption for agent type X. Now, compared to the case where I simply make a different function for every type of agent (a sketch of that alternative is below), this is twice as slow. Also, notice I am passing the parameters through the object p (from Parameters.jl), which is also very costly in terms of performance. I do this so I don't have to set 24 different parameters inside each function every time I want to call it, thus avoiding bugs and miscalculations (which is also the reason I prefer to use the same budget function for every problem). So my question is: is what I am doing best practice in terms of performance? Is there a way to increase it?
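
For reference, the separate-functions alternative looks roughly like this (a sketch; names are illustrative, and it assumes the same p and @unpack from Parameters.jl as above). Note that only two distinct formulas remain once the retired/active duplication is collapsed:

function budget_owner_keep(p,b::Float64,bp::Float64,ϵ::Float64,h::Float64)
    # homeowner (retired or active) who keeps his house
    @unpack w,q_b,p_h,δ_h,τ_h = p
    return ϵ*w + b - p_h*h*(δ_h+τ_h) - q_b*bp
end

function budget_mover(p,b::Float64,bp::Float64,ϵ::Float64,ht::Float64,h::Float64,hp::Float64)
    # agent (retired or active) who rents, buys, or sells this period
    @unpack w,q_b,p_h,δ_h,τ_h,κ_h,ρ = p
    return ϵ*w + b + p_h*h*(1-δ_h-τ_h-κ_h) - q_b*bp - p_h*hp - ρ*ht
end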
Thanks

When you say "twice as slow", are we talking nanoseconds? Milliseconds?

If it's nanoseconds, then that's just the cost of doing conditionals. And removing the conditionals here would probably just move their cost up into the parent function.

Something you could try, though I really have no clue whether it will help or hurt, is to do all the calculations and then apply the conditions afterwards to decide which one to return. Since none of the calculations appears to be in danger of dividing by zero, the CPU pipeline might be able to give a boost... or it might not.
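
Concretely, the body of budget could become something like this (untested sketch; since the retired and active branches share the same formulas, there are only two distinct calculations):

@unpack w,q_b,p_h,δ_h,τ_h,κ_h,ρ = p
c_move = ϵ*w + b + p_h*h*(1-δ_h-τ_h-κ_h) - q_b*bp - p_h*hp - ρ*ht  # change housing status
c_keep = ϵ*w + b - p_h*h*(δ_h+τ_h) - q_b*bp                        # keep the house
return ifelse(h != hp || h == 0.0, c_move, c_keep)  # ifelse evaluates both arms, no branch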

Can you show how you measured this? There should be essentially zero cost for using a parameter struct as long as you avoid fields with abstract types.

For example:

julia> using Parameters

julia> @with_kw struct Params
         x::Int = 1
         y::Float64 = 2.5
       end
Params

julia> function f(p::Params)
         return p.x + p.y
       end
f (generic function with 1 method)

julia> using BenchmarkTools

julia> p = Params()
Params
  x: Int64 1
  y: Float64 2.5


julia> @btime f($p)
  0.016 ns (0 allocations: 0 bytes)
3.5

Compare with an equivalent function f which takes x and y without the Params struct:

julia> function f(x, y)
         return x + y
       end
f (generic function with 2 methods)

julia> x = 1; y = 2.5
2.5

julia> @btime f($x, $y)
  0.017 ns (0 allocations: 0 bytes)
3.5

There is no performance cost for the Params struct at all. In fact, in both cases the compiler is able to figure out that the answer is 3.5 without actually doing any work at run-time (that's why the time reported is less than a single clock cycle).

1 Like

Thanks for your comment.
So here we are talking about nanoseconds, but since this function is called millions of times (multiple times for every agent type, for each of which I have to solve a dynamic programming problem), it pushes the total running time of the code from about 20 seconds to a few minutes. This may seem like a small issue, but it will get worse as I add complexity to the problems I'm solving.

I will try your suggestion of calculating everything and then apply the conditions.

1 Like

I declared all the parameter types in the parameter structure (they are either Float64 or Integers). I did not declare the function as f(p::Params) like you did, but using @code_warntype I saw that this created no type instability. I'll declare it anyway, just to make sure. The way I measured it was by running the whole code for the two different ways of inputting parameters. I'll measure it like you did, to see what I get. Thanks for the comment.

Integer is an abstract type in Julia (representing any integer type). Int is a concrete type.
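
For example:

julia> isabstracttype(Integer), isconcretetype(Int)
(true, true)

A field declared n::Integer can hold any integer type, so the compiler must box it; a field declared n::Int has a fixed layout and is fast.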

Note that, while you do need to be sure that fields of data structures are concrete, you don't need to declare b::Float64,bp::Float64 etcetera for function arguments. See e.g. this discussion. One of the strengths of Julia is that you can write type-generic code that is still fast, e.g. your function should be able to work with either single or double precision depending on what argument types were passed. (Data structures can be made type-generic without sacrificing performance by using parameterized types.)
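
A minimal sketch of what that looks like (MyParams is a made-up name):

struct MyParams{T<:Real}
    w::T
    q_b::T
end

p64 = MyParams(0.956, 0.97)              # T = Float64, inferred from the arguments
p32 = MyParams{Float32}(0.956f0, 0.97f0) # single precision, same code

wage(p) = p.w*2  # concrete and fast for either element type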

1 Like

In general, you will get much better help when looking for performance improvements if you can post a complete working example. It doesn't have to be your actual data; just the bare minimum to run your function. If we can run the code ourselves, then it's much easier for us to help improve it.

3 Likes

Nanoseconds multiplied by "millions" should give milliseconds, not minutes (e.g. 10 ns × one million calls = 10 ms), so there's definitely something wrong.

Julia code can most of the time run as fast as optimized C code, so if you can find some way of sharing a runnable example, I'm sure you could get help speeding it up dramatically.

1 Like

Thanks for the advice. I'll try to do that.

Here is the function which sets up the problem for a particular agent:

using Dierckx

function ret_rent_own(p,b::Float64,bp::Float64,iϵ::Int64,h::Float64,hp::Float64,vf::Array{Float64})
    @unpack infm, supm, ϕ, γ, ϑ, β, ω, ϵ_chain, b_grid = p
    ϵ_grid = ϵ_chain.state_values
    c = budget(p,b,bp,ϵ_grid[iϵ],0.0,h,hp,true)

    if c < infm
        v = -supm
    else
        vitp = Spline1D(b_grid,vf,k=3,bc="extrapolate")
        v = c + ω*hp + β*vitp(bp)
    end
    return v
end

Then here is a simplified version of the function that solves the agent's problem:

function solve_ret_last_rent(p,ib::Int64,iϵ::Int64)
    """
    Function to calculate the asset choice for individuals who begin the last period
    of their lives as renters
    """
    @unpack ϵ_chain, b_grid, b_min, q_b, H_num, H_grid, Ht_num, Ht_grid, tol, itermax = p
    ϵ_grid = ϵ_chain.state_values

    # storage for renting and homeownership
    v00 = zeros(H_num); b00 = similar(v00)

    #---- Buy house
    for i in eachindex(H_grid) # loop over house choices
        b1=b_min; b2=max(b_min+0.01,budget(p,b_grid[ib],0.0,ϵ_grid[iϵ],0.0,0.0,H_grid[i],true)/q_b)
        a,b = gssearch(bp -> ret_last_rent_own(p,b_grid[ib],bp,iϵ,0.0,H_grid[i]),b1,b2,tol=tol,maxit=itermax)
        b00[i]=a; v00[i]=b
    end
    vo,ho = findmax(v00) # choose from house size options
    bo = b00[ho]         # save bond choice
    co = budget(p,b_grid[ib],bo,ϵ_grid[iϵ],0.0,0.0,H_grid[ho],true)

    return bo, vo, co, ho
end

And here is the function that, for a given set of parameters, solves the agent's problem for all points in the state space (so, for any income and wealth combination):

function solv_ret(p)
    # p is the parameter structure
    @unpack q_b, w, Jret, J, β, ϑ, H_num, nϵ, nb, b_grid, ϵ_chain, b_min, infm = p
    ϵ_grid = ϵ_chain.state_values   # idiosyncratic prod states
    Π = ϵ_chain.p                   # stochastic matrix
    nr = J - Jret + 1               # number of periods in retirement

    # Renters - array pre-allocation
    vn_ret   = zeros(nϵ,nb,nr)           # value function
    cnf_ret  = similar(vn_ret)           # consumption function
    bnf_ret  = similar(vn_ret)           # asset function
    hnf_ret  = similar(vn_ret)           # housing decision

    # last year of life
    Threads.@threads for ib = 1:nb
        for iϵ = 1:nϵ
            # renter
            bn,vn,cn,hn = solve_ret_last_rent(p,ib,iϵ)
            vn_ret[iϵ,ib,end]   = vn    # value function
            cnf_ret[iϵ,ib,end]  = cn    # consumption function
            bnf_ret[iϵ,ib,end]  = bn    # bond function
            hnf_ret[iϵ,ib,end]  = hn    # housing decision
        end
    end
end

The rest of the model simply iterates on this procedure until an equilibrium is found, but this type of function is where the bulk of time is spent.

The parameter structure includes the default parameter values:

using QuantEcon, Parameters

@with_kw struct HH
    # Prices
    q_b::Float64                = 0.97         # price of bonds
    w::Float64                  = 0.956        # wage rate
    ρ::Float64                  = 0.1          # rent
    p_h::Float64                = 0.5          # house price

    # Households
    Jret::Int64                 = 23             # retirement age
    J::Int64                    = 30             # last period of life
    β::Float64                  = 0.95           # discount factor
    ϕ::Float64                  = 0.12           # relative taste for housing
    γ::Float64                  = 0.8            # ES between consumption and housing
    ϑ::Float64                  = 2.0            # risk aversion coefficient
    ω::Float64                  = 1.0            # utility premium from owning a house
    ρ_ϵ::Float64                = 0.3            # idiosyncratic productivity persistence
    μ::Float64                  = 1-ρ_ϵ          # idiosyncratic productivity constant
    σ_ϵ::Float64                = 0.1            # standard deviation of random component
    nϵ::Int64                   = 3              # number of productivity states
    n_std::Int64                = 2              # max number of standard deviations
    ϵ_chain::MarkovChain{Float64,Array{Float64,2},StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}} = tauchen(nϵ,ρ_ϵ,σ_ϵ,μ,n_std)              # transition matrix and z values
    b_min::Float64              = 0.0001         # minimum of the asset grid (borrowing constraint)
    b_max::Float64              = 8.0            # maximum of the asset grid
    nb::Int64                   = 24             # number of points in the asset grid
    b_grid::Array{Float64,1}    = make_nlgrid(nb,b_min,b_max,4.0) # asset grid. Convexity is the last element

    # Housing market
    H_min::Float64              = 0.5           # minimum house size
    H_num::Int64                = 2             # number of house sizes
    H_gap::Float64              = 1.0           # gap between house sizes
    H_max::Float64              = 3.15          # maximum house size
    H_grid::Array{Float64,1}    = collect(range(H_min,stop=H_max,length=H_num))
    δ_h::Float64                = 0.015         # house depreciation
    κ_h::Float64                = 0.07          # transaction cost

    # Rental market
    Ht_min::Float64             = 0.5           # minimum rental size
    Ht_num::Int64               = 2             # number of rental sizes
    Ht_gap::Float64             = 1.0           # gap between rental sizes
    Ht_max::Float64             = 1.92          # maximum rental size
    Ht_grid::Array{Float64,1}   = make_nlgrid(Ht_num,Ht_min,Ht_min+Ht_num*Ht_gap,1.0)
    ψ::Float64                  = 0.008         # operating costs

    # Government
    τ_h::Float64                = 0.01          # property tax

    # Simulation parameters
    supm::Float64               = 1.0e+10    # model infinity
    infm::Float64               = 1.0e-10    # model zero
    tol_eq::Float64             = 1.0e-3        # tolerance level for equilibrium computation
    tol::Float64                = tol_eq*0.01   # tolerance level for the golden search (must be lower than the tolerance for the equilibrium)
    itermax::Int64              = 1000          # max number of iterations allowed
end
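
Since the structure is built with @with_kw, I construct it by keyword and can override any default while keeping the rest, e.g.:

p = HH()              # all defaults
p_alt = HH(ρ = 0.05)  # same parameters, lower rent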

And the algorithm I'm using to do the maximization (golden-section search):

function gssearch(f::Function, P1::Float64, P4::Float64; tol::Float64=10*eps(), maxit::Int64=1000)
    iter = 1
    d = 1.0

    V2 = 1.0; P2 = 1.0; # initialize variables to be returned

    while d > tol
        P2 = P1 + ((3.0-sqrt(5.0))/2.0)*(P4-P1)
        P3 = P1 + ((sqrt(5.0)-1.0)/2.0)*(P4-P1)

        V2 = f(P2)::Float64
        V3 = f(P3)::Float64

        if V2 < V3
            P1 = P2 # update lower bound
        else
            P4 = P3 # update upper bound
        end
        d = P4 - P1
        iter += 1
        iter < maxit || error("golden search evals exceeded")
    end

    return P2, V2
end
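
As a quick sanity check, gssearch maximizes f on [P1, P4]:

P, V = gssearch(x -> -(x - 1.0)^2, 0.0, 2.0)  # P ≈ 1.0, V ≈ 0.0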

I was imprecise in my statement: I meant upwards of hundreds of millions of calls. Given that a maximization procedure is involved, it is uncertain how many times the function is actually called, as that depends on the number of iterations until convergence.

Be careful with constant-folding optimisations, or in general whenever you see sub-nanosecond results in benchmarks. You can use a Ref-deref trick to avoid that:

julia> @btime f($(Ref(p))[])
  1.965 ns (0 allocations: 0 bytes)
3.5

julia> @btime f($(Ref(x))[], $(Ref(y))[])
  1.965 ns (0 allocations: 0 bytes)
3.5
2 Likes


Yup, good advice. However, the fact that even the constant-folding works through the Params struct kind of emphasizes the point I was trying to make :wink:

1 Like

I've managed to reduce the total computing time of the model from around 28 seconds to 6.4 seconds by using the Optim package instead of my hand-made golden-section search algorithm.
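
Inside the loop of solve_ret_last_rent the change is essentially this (a sketch; Optim's univariate optimize minimizes, so I negate the objective to maximize):

using Optim

res = optimize(bp -> -ret_last_rent_own(p,b_grid[ib],bp,iϵ,0.0,H_grid[i]),
               b1, b2, GoldenSection())
bp_star = Optim.minimizer(res)  # optimal bond choice
v_star  = -Optim.minimum(res)   # maximized value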

1 Like

This shouldn't be costly if the parameter type is stable. The other stuff will certainly be slower.

Always best to use other people's code. Also, splines are a common place where things can slow down; for example, being able to reuse basis matrices and their factorizations can make a big difference.
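
For example, in your ret_rent_own the Spline1D is rebuilt on every single evaluation inside the golden search. A sketch of the alternative (ret_rent_own2 is a made-up name): build the interpolant once per value-function slice and pass it in:

vitp = Spline1D(b_grid, vf, k=3, bc="extrapolate")  # construct once per slice

function ret_rent_own2(p,b::Float64,bp::Float64,iϵ::Int64,h::Float64,hp::Float64,vitp::Spline1D)
    @unpack infm, supm, β, ω, ϵ_chain = p
    c = budget(p,b,bp,ϵ_chain.state_values[iϵ],0.0,h,hp,true)
    c < infm && return -supm
    return c + ω*hp + β*vitp(bp)  # reuse the prebuilt spline
end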

I made sure all the parameters had concrete types, and checked every function using @code_warntype. It's possible that I wrongly inferred that using the parameter structure might be decreasing performance. I basically tried two versions of the code, where in one I just passed the parameters directly to the function, and timed both. It could be the case that I made some other change which decreased performance.

Maybe I could also reduce the precision of the integers that I am using. They are all Int64 by default, and I don't need large integers for these operations. That would reduce memory allocation and could potentially increase speed, right?

Changing integer precision shouldn't normally be necessary, and is quite a marginal optimization (on CPUs at least), unless you have identified arrays of ints as a major memory issue. Maybe I missed it, but did you actually profile your code to find out which parts are slow?
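
If not, the built-in profiler is a good starting point:

using Profile

Profile.clear()
@profile solv_ret(p)         # run the hot code under the sampling profiler
Profile.print(format=:flat)  # flat listing of where the time goes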

1 Like

I didn't post the profiling results, but the biggest drags on performance are the optimization part and the splines (as pointed out by a previous poster). But it's good to know that changing precision will only change performance marginally.

I revisited this issue of whether Params has a performance cost. I've followed what you did in your exercise, but closer to what is actually done in my code:

using Parameters, BenchmarkTools

@with_kw struct Params2
    w::Float64 = 0.956
end

pp = Params2()

function wage1(pp::Params2)
    pp.w*2.0
end

@with_kw struct Prices
    # Prices
    q_b::Float64                = 0.97         # price of bonds
    w::Float64                  = 0.956        # wage rate
    ρ::Float64                  = 0.1          # rent
    p_h::Float64                = 0.5          # house price
    q::Float64                  = 1.0          # price of mortgage
    ι::Float64                  = 0.01         # intermediation wedge
    r_b::Float64                = 1.0/q_b-1.0  # risk-free interest rate
    r_m::Float64                = r_b*(1.0+ι)  # common real interest rate
end

pr = Prices()

function wage2(pr::Prices)
    pr.w*2.0
end

function wage3(w::Real)
    w*2.0
end

@btime wage1(pp)      # small parameter structure
@btime wage2(pr)      # large parameter structure
@btime wage3(0.956)   # no parameter structure

4.499 ns (0 allocations: 0 bytes)
6.899 ns (1 allocation: 16 bytes)
0.001 ns (0 allocations: 0 bytes)

So the last case is where I pass w to the function directly. The performance cost seems to increase with the size of the parameter structure. What do you think?