Optimize benchmark-code

Hey all, I found this benchmark comparing different languages in economics and I wonder whether the underlying Julia code can be optimized further (full repository is available here):

## Basic RBC model with full depreciation
#
# Jesus Fernandez-Villaverde
# Haverford, July 29, 2013

function main()

    ##  1. Calibration

    α = 1/3     # Elasticity of output w.r.t. capital
    β = 0.95    # Discount factor

    # Productivity values
    vProductivity = [0.9792 0.9896 1.0000 1.0106 1.0212]

    # Transition matrix
    mTransition   = [0.9727 0.0273 0.0000 0.0000 0.0000;
                     0.0041 0.9806 0.0153 0.0000 0.0000;
                     0.0000 0.0082 0.9837 0.0082 0.0000;
                     0.0000 0.0000 0.0153 0.9806 0.0041;
                     0.0000 0.0000 0.0000 0.0273 0.9727]

    # 2. Steady State

    capitalSteadyState = (α*β)^(1/(1-α))
    outputSteadyState = capitalSteadyState^α
    consumptionSteadyState = outputSteadyState-capitalSteadyState

    println("Output = ",outputSteadyState," Capital = ",capitalSteadyState," Consumption = ",consumptionSteadyState)

    # We generate the grid of capital
    vGridCapital = collect(0.5*capitalSteadyState:0.00001:1.5*capitalSteadyState)

    nGridCapital = length(vGridCapital)
    nGridProductivity = length(vProductivity)

    # 3. Required matrices and vectors

    mOutput           = zeros(nGridCapital,nGridProductivity)
    mValueFunction    = zeros(nGridCapital,nGridProductivity)
    mValueFunctionNew = zeros(nGridCapital,nGridProductivity)
    mPolicyFunction   = zeros(nGridCapital,nGridProductivity)
    expectedValueFunction = zeros(nGridCapital,nGridProductivity)

    # 4. We pre-build output for each point in the grid

    mOutput = (vGridCapital.^α)*vProductivity;

    # 5. Main iteration

    maxDifference = 10.0
    tolerance = 0.0000001
    iteration = 0

    while(maxDifference > tolerance)
        expectedValueFunction = mValueFunction*mTransition';

        for nProductivity in 1:nGridProductivity

            # We start from previous choice (monotonicity of policy function)
            gridCapitalNextPeriod = 1

            for nCapital in 1:nGridCapital

                valueHighSoFar = -1000.0
                capitalChoice  = vGridCapital[1]

                for nCapitalNextPeriod in gridCapitalNextPeriod:nGridCapital

                    consumption = mOutput[nCapital,nProductivity]-vGridCapital[nCapitalNextPeriod]
                    valueProvisional = (1-β)*log(consumption)+β*expectedValueFunction[nCapitalNextPeriod,nProductivity]

                    if (valueProvisional>valueHighSoFar)
                	       valueHighSoFar = valueProvisional
                	       capitalChoice = vGridCapital[nCapitalNextPeriod]
                	       gridCapitalNextPeriod = nCapitalNextPeriod
                    else
                	       break # We break when we have achieved the max
                    end

                end

                mValueFunctionNew[nCapital,nProductivity] = valueHighSoFar
                mPolicyFunction[nCapital,nProductivity] = capitalChoice

            end

        end
        
        maxDifference     = maximum(abs.(mValueFunctionNew-mValueFunction))
        mValueFunction, mValueFunctionNew = mValueFunctionNew, mValueFunction
        
        iteration = iteration+1
        if mod(iteration,10)==0 || iteration == 1
            println(" Iteration = ", iteration, " Sup Diff = ", maxDifference)
        end

    end

    println(" Iteration = ", iteration, " Sup Diff = ", maxDifference)
    println(" ")
    println(" My check = ", mPolicyFunction[1000,3])
    println(" My check = ", mValueFunction[1000,3])

end

Right now, Julia 1.0 is around 40%-50% slower than both Fortran and C++ (in the paper, as well as on my machine).

As far as I can see, the code is written properly in column-major-order and I found no obvious type-instabilities. I also tried Traceur and @code_warntype, but that didn’t get me more insights.

I have the highest optimization level enabled and removed some printing, so that Juno’s slow printing doesn’t mess up the results. I know that one could play around with some libraries for the matrix multiplication, but that will affect the results only marginally.

I would appreciate any tips!

Seems that this is mostly bottlenecked on the log call so there doesn’t seem to be anything super-simple that can be done to significantly speed things up (without being smart with how often log gets called).

2 Likes

But is the log actually slower than the log in C++ and Fortran? It get’s called around 60 Mio. times, so this could be indeed quite significant (the runtime itself is around 1.3s on my machine).

You could remove the log call and re-time julia and the rest.

This was posted here quite a while back. I think a few people already implemented some optimizations on the code, that’s why you don’t see any low-hanging ones left.

1 Like

Changing

to

maxDifference = maximum(abs.(mValueFunctionNew .- mValueFunction))

prevents one extra allocation in the outer loop and seems to help a tiny bit in my laptop.

1 Like

It looks like you could just create a log consumption array outside the loops entirely. That should help.

1 Like

This greatly reduce the number of allocations. In my laptop it is 18% faster than the original code.

using LinearAlgebra
function newmain()

    ##  1. Calibration

    α = 1/3     # Elasticity of output w.r.t. capital
    β = 0.95    # Discount factor

    # Productivity values
    vProductivity = [0.9792 0.9896 1.0000 1.0106 1.0212]

    # Transition matrix
    mTransition   = [0.9727 0.0273 0.0000 0.0000 0.0000;
                     0.0041 0.9806 0.0153 0.0000 0.0000;
                     0.0000 0.0082 0.9837 0.0082 0.0000;
                     0.0000 0.0000 0.0153 0.9806 0.0041;
                     0.0000 0.0000 0.0000 0.0273 0.9727]

    # 2. Steady State

    capitalSteadyState = (α*β)^(1/(1-α))
    outputSteadyState = capitalSteadyState^α
    consumptionSteadyState = outputSteadyState-capitalSteadyState

    println("Output = ",outputSteadyState," Capital = ",capitalSteadyState," Consumption = ",consumptionSteadyState)

    # We generate the grid of capital
    vGridCapital = collect(0.5*capitalSteadyState:0.00001:1.5*capitalSteadyState)

    nGridCapital = length(vGridCapital)
    nGridProductivity = length(vProductivity)

    # 3. Required matrices and vectors

    mOutput           = zeros(nGridCapital,nGridProductivity)
    mValueFunction    = zeros(nGridCapital,nGridProductivity)
    mValueFunctionNew = zeros(nGridCapital,nGridProductivity)
    mPolicyFunction   = zeros(nGridCapital,nGridProductivity)
    expectedValueFunction = zeros(nGridCapital,nGridProductivity)

    # 4. We pre-build output for each point in the grid

    mOutput = (vGridCapital.^α)*vProductivity;

    # 5. Main iteration

    maxDifference = 10.0
    tolerance = 0.0000001
    iteration = 0

    while(maxDifference > tolerance)
        mul!(expectedValueFunction, mValueFunction, mTransition');

        for nProductivity in 1:nGridProductivity

            # We start from previous choice (monotonicity of policy function)
            gridCapitalNextPeriod = 1

            for nCapital in 1:nGridCapital

                valueHighSoFar = -1000.0
                capitalChoice  = vGridCapital[1]

                for nCapitalNextPeriod in gridCapitalNextPeriod:nGridCapital

                    consumption = mOutput[nCapital,nProductivity]-vGridCapital[nCapitalNextPeriod]
                    valueProvisional = (1-β)*log(consumption)+β*expectedValueFunction[nCapitalNextPeriod,nProductivity]

                    if (valueProvisional>valueHighSoFar)
                	       valueHighSoFar = valueProvisional
                	       capitalChoice = vGridCapital[nCapitalNextPeriod]
                	       gridCapitalNextPeriod = nCapitalNextPeriod
                    else
                	       break # We break when we have achieved the max
                    end

                end

                mValueFunctionNew[nCapital,nProductivity] = valueHighSoFar
                mPolicyFunction[nCapital,nProductivity] = capitalChoice

            end

        end
        
        maxDifference     = maximum(x->(abs(x[1]-x[2])), zip(mValueFunctionNew, mValueFunction))
        mValueFunction, mValueFunctionNew = mValueFunctionNew, mValueFunction
        
        iteration = iteration+1
        if mod(iteration,10)==0 || iteration == 1
            println(" Iteration = ", iteration, " Sup Diff = ", maxDifference)
        end

    end

    println(" Iteration = ", iteration, " Sup Diff = ", maxDifference)
    println(" ")
    println(" My check = ", mPolicyFunction[1000,3])
    println(" My check = ", mValueFunction[1000,3])

end
2 Likes

That makes sense, I didn’t know that. But it nevertheless useful to repost, since this was pre-1.0…

maybe threading the code can help a little?

Yeah, but the point is the relative speed to C++ or Fortran (which are BTW still quite a bit faster): You could thread them as well.

By patching around julia’s log being slow, and only calculating it when it might be needed, I’ve gotten a 30% speedup. Here’s the inner loop. (I use log(2)*Base.Math.exponent(consumption) as a cheap overestimate for log(x) (floating point tricks). Here’s the new inner loop.

for nCapitalNextPeriod in gridCapitalNextPeriod:nGridCapital
    consumption = mOutput[nCapital,nProductivity]-vGridCapital[nCapitalNextPeriod]
    cheap = β*expectedValueFunction[nCapitalNextPeriod,nProductivity]

     if (1-β)*log(2)*Base.Math.exponent(consumption) +cheap >valueHighSoFar
         valueProvisional = (1-β)*log(consumption)+cheap
         if (valueProvisional>valueHighSoFar)
              valueHighSoFar = valueProvisional
              capitalChoice = vGridCapital[nCapitalNextPeriod]
              gridCapitalNextPeriod = nCapitalNextPeriod
          else
              break # We break when we have achieved the max
         end
     else
         break
     end
 end
3 Likes