A Comparison of Programming Languages in Economics


#1

I saw this paper recently:

The codes are here:

I run the Fortran code and Julia code on my machine, my results are consistent with the findings of this paper. However, I am not sure why the Python (Numba and Cython) codes are also faster than Julia code.


#2

Page not found for the pdf. Here’s some version of the paper: https://www.nber.org/papers/w20263.pdf, but it’s using Julia 0.2.1.


#3

Thanks. Link updated.


#4

Did you use the -O3 and --check-bounds=no flags, as well as rebuild the system image for your architecture?

Actually, almost half the time is being spent in log, which should get a lot better soon by switching from the openlibm implementation to a pure Julia implementation: https://github.com/JuliaLang/julia/pull/24750. You can actually already use the new algorithm in 0.6.2 by replacing log with Base.Math.JuliaLibm.log.


#5

Pardon my ignorance, but will that mean log (and a variety of other functions) will now readily be inlined -> less function call overhead, and more simd vectorization?

I find it cool that Julia gets faster when people get around to rewriting C/C++/Fortran code in Julia.
The exact opposite of the situation in R/Python/Matlab & co.


#6

I’d read this years ago (although it was already rather out of date with regards to Julia when I read it 3 years ago).
It would be very interesting to see the results with Julia v0.6.2 and master, as well as with the latest Python, Matlab, etc.


#7

If anyone can take a look at optimizing the code a bit for v0.7, I bet he would love to have the code to update the results.


#8

Sure. I am on it.

First impression:
It is allocating needlessly. A lot.

EDIT:
Currently rebuilding Julia-dev…
Then I’ll verify both work, and compare new and old versions.


#9

I don’t think it’s about inlining. The time is about the same if you define

@noinline noinlinelog(x) = Base.Math.JuliaLibm.log(x)

and use that in the algorithm.

In terms of vectorization, you could possibly do better by using Yeppp.jl, but the if statement in the inner loop makes that nontrivial.

As for making the code faster: the question as always becomes whether the benchmark is about comparing the most optimized versions of an algorithm, or about comparing ‘normal’ usage of the languages (for some definition of normal). That said, e.g. expectedValueFunction is preallocated in the C++ version, so that’s not really fair.


#10

Sounds great. Take your time, and when you think you have solid performance (still using idiomatic Julia) I can make an introduction to Jesus


#11

My performance improvement through simply cutting down on obvious allocations was not great:

julia> @benchmark mainOld(false)
@benchmark mainNew(false)
BenchmarkTools.Trial: 
  memory estimate:  528.54 MiB
  allocs estimate:  1568
  --------------
  minimum time:     1.252 s (0.90% GC)
  median time:      1.270 s (0.89% GC)
  mean time:        1.281 s (1.63% GC)
  maximum time:     1.331 s (3.74% GC)
  --------------
  samples:          4
  evals/sample:     1

julia> @benchmark mainNew(false)
BenchmarkTools.Trial: 
  memory estimate:  3.54 MiB
  allocs estimate:  23
  --------------
  minimum time:     1.173 s (0.00% GC)
  median time:      1.175 s (0.00% GC)
  mean time:        1.176 s (0.00% GC)
  maximum time:     1.185 s (0.00% GC)
  --------------
  samples:          5
  evals/sample:     1

That is, only about 0.1 seconds or 8% improvement.
For reference:


using LinearAlgebra

function mainNew(verbose = true)

    ##  1. Calibration

    α = 1/3     # Elasticity of output w.r.t. capital
    β = 0.95    # Discount factor

    # Productivity values
    vProductivity = [0.9792 0.9896 1.0000 1.0106 1.0212]

    # Transition matrix
    mTransition   = [0.9727 0.0273 0.0000 0.0000 0.0000;
                     0.0041 0.9806 0.0153 0.0000 0.0000;
                     0.0000 0.0082 0.9837 0.0082 0.0000;
                     0.0000 0.0000 0.0153 0.9806 0.0041;
                     0.0000 0.0000 0.0000 0.0273 0.9727]

    # 2. Steady State

    capitalSteadyState = (α*β)^(1/(1-α))
    outputSteadyState = capitalSteadyState^α
    consumptionSteadyState = outputSteadyState-capitalSteadyState

    verbose && println("Output = ",outputSteadyState," Capital = ",capitalSteadyState," Consumption = ",consumptionSteadyState)

    # We generate the grid of capital
    vGridCapital = collect(0.5*capitalSteadyState:0.00001:1.5*capitalSteadyState)
    #vGridCapital = 0.5*capitalSteadyState:0.00001:1.5*capitalSteadyState

    nGridCapital = length(vGridCapital)
    nGridProductivity = length(vProductivity)

    # 3. Required matrices and vectors

    mOutput           = Matrix{Float64}(uninitialized, nGridCapital, nGridProductivity)
    mValueFunction    = fill(0.0, nGridCapital, nGridProductivity)
    mValueFunctionNew = Matrix{Float64}(uninitialized, nGridCapital,nGridProductivity)
    mPolicyFunction   = Matrix{Float64}(uninitialized, nGridCapital,nGridProductivity)
    expectedValueFunction = Matrix{Float64}(uninitialized, nGridCapital, nGridProductivity)

    # 4. We pre-build output for each point in the grid

    @. mOutput = vGridCapital^ α * vProductivity;

    # 5. Main iteration

    maxDifference = 10.0
    tolerance = 0.0000001
    iteration = 0

    while(maxDifference > tolerance)
        mul!(expectedValueFunction, mValueFunction, mTransition')

        @inbounds for nProductivity in 1:nGridProductivity

            # We start from previous choice (monotonicity of policy function)
            gridCapitalNextPeriod = 1

            for nCapital in 1:nGridCapital

                valueHighSoFar = -1000.0
                capitalChoice  = vGridCapital[1]

                for nCapitalNextPeriod in gridCapitalNextPeriod:nGridCapital

                    consumption = mOutput[nCapital,nProductivity]-vGridCapital[nCapitalNextPeriod]
                    valueProvisional = (1-β)*log(consumption)+β*expectedValueFunction[nCapitalNextPeriod,nProductivity]

                    if (valueProvisional>valueHighSoFar)
                	       valueHighSoFar = valueProvisional
                	       capitalChoice = vGridCapital[nCapitalNextPeriod]
                	       gridCapitalNextPeriod = nCapitalNextPeriod
                    else
                	       break # We break when we have achieved the max
                    end

                end

                mValueFunctionNew[nCapital,nProductivity] = valueHighSoFar
                mPolicyFunction[nCapital,nProductivity] = capitalChoice

            end

        end
        
        # I can write into expectedValueFunction, because we are about to
        # overwrite it anyway on the start of the next iteration.
        @. expectedValueFunction = abs(mValueFunctionNew-mValueFunction)
        maxDifference     = maximum(expectedValueFunction)
        mValueFunction, mValueFunctionNew = mValueFunctionNew, mValueFunction
        
        iteration = iteration+1
        if verbose && (mod(iteration,10)==0 || iteration == 1)
            println(" Iteration = ", iteration, " Sup Diff = ", maxDifference)
        end

    end

    if verbose
        println(" Iteration = ", iteration, " Sup Diff = ", maxDifference)
        println(" ")
        println(" My check = ", mPolicyFunction[1000,3])
        println(" My check = ", mValueFunction[1000,3])
    end
    return iteration, maxDifference, mPolicyFunction[1000,3], mValueFunction[1000,3]

end


function mainOld(verbose = true)

    ##  1. Calibration

    α = 1/3     # Elasticity of output w.r.t. capital
    β = 0.95    # Discount factor

    # Productivity values
    vProductivity = [0.9792 0.9896 1.0000 1.0106 1.0212]

    # Transition matrix
    mTransition   = [0.9727 0.0273 0.0000 0.0000 0.0000;
                     0.0041 0.9806 0.0153 0.0000 0.0000;
                     0.0000 0.0082 0.9837 0.0082 0.0000;
                     0.0000 0.0000 0.0153 0.9806 0.0041;
                     0.0000 0.0000 0.0000 0.0273 0.9727]

    # 2. Steady State

    capitalSteadyState = (α*β)^(1/(1-α))
    outputSteadyState = capitalSteadyState^α
    consumptionSteadyState = outputSteadyState-capitalSteadyState

    verbose && println("Output = ",outputSteadyState," Capital = ",capitalSteadyState," Consumption = ",consumptionSteadyState)

    # We generate the grid of capital
    vGridCapital = collect(0.5*capitalSteadyState:0.00001:1.5*capitalSteadyState)

    nGridCapital = length(vGridCapital)
    nGridProductivity = length(vProductivity)

    # 3. Required matrices and vectors

    mOutput           = zeros(nGridCapital,nGridProductivity)
    mValueFunction    = zeros(nGridCapital,nGridProductivity)
    mValueFunctionNew = zeros(nGridCapital,nGridProductivity)
    mPolicyFunction   = zeros(nGridCapital,nGridProductivity)
    expectedValueFunction = zeros(nGridCapital,nGridProductivity)

    # 4. We pre-build output for each point in the grid

    mOutput = (vGridCapital.^α)*vProductivity;

    # 5. Main iteration

    maxDifference = 10.0
    tolerance = 0.0000001
    iteration = 0

    while(maxDifference > tolerance)
        expectedValueFunction = mValueFunction*mTransition';

        for nProductivity in 1:nGridProductivity

            # We start from previous choice (monotonicity of policy function)
            gridCapitalNextPeriod = 1

            for nCapital in 1:nGridCapital

                valueHighSoFar = -1000.0
                capitalChoice  = vGridCapital[1]

                for nCapitalNextPeriod in gridCapitalNextPeriod:nGridCapital

                    consumption = mOutput[nCapital,nProductivity]-vGridCapital[nCapitalNextPeriod]
                    valueProvisional = (1-β)*log(consumption)+β*expectedValueFunction[nCapitalNextPeriod,nProductivity]

                    if (valueProvisional>valueHighSoFar)
                	       valueHighSoFar = valueProvisional
                	       capitalChoice = vGridCapital[nCapitalNextPeriod]
                	       gridCapitalNextPeriod = nCapitalNextPeriod
                    else
                	       break # We break when we have achieved the max
                    end

                end

                mValueFunctionNew[nCapital,nProductivity] = valueHighSoFar
                mPolicyFunction[nCapital,nProductivity] = capitalChoice

            end

        end
        
        maxDifference     = maximum(abs.(mValueFunctionNew-mValueFunction))
        mValueFunction, mValueFunctionNew = mValueFunctionNew, mValueFunction
        
        iteration = iteration+1
        if verbose && (mod(iteration,10)==0 || iteration == 1)
            println(" Iteration = ", iteration, " Sup Diff = ", maxDifference)
        end

    end


    if verbose
        println(" Iteration = ", iteration, " Sup Diff = ", maxDifference)
        println(" ")
        println(" My check = ", mPolicyFunction[1000,3])
        println(" My check = ", mValueFunction[1000,3])
    end
    return iteration, maxDifference, mPolicyFunction[1000,3], mValueFunction[1000,3]


end

I’ll take a look at the C++ and Fortran versions.
The Fortran version seems to be allocating.
I’ll replace matmul with a BLAS call to at least make some effort toward fairness.
Same with C++. Probably doesn’t make up too much a % of total runtime anyway.

Using Yeppp may be going a little past idiomatic Julia?


#12

As I noted in my last post, the original Julia code (on the latest Julia master) ran in 1.27 s. Cutting down on allocations resulted in 1.17 s.

Best times out of 5 runs:

  1. 1.173 Julia – reduced allocation
  2. 1.252 Julia – alloc-heavy
  3. 1.284 g++7.2 Ofast native
  4. 1.285 g++11-7.2 Ofast native
  5. 1.296 g++8.0 Ofast native
  6. 1.319 g++11-8.0 Ofast native
  7. 1.367 gfotran - 8.0 Ofast native
  8. 1.381 g++7.2
  9. 1.384 gfortran-7.2 Ofast native
  10. 1.396 clang++ native
  11. 1.398 gfortran - 8.0
  12. 1.414 gfortran - 7.2
  13. 1.402 g++ 8.0
  14. 1.430 g++11- 8.0
  15. 1.431 clang++11 native

I decided to compare gcc version 7.2 (although 7.3 is the latest release) and the unreleased trunk (8.0).
clang++ is 5.0.1.

If anyone wants to improve any of the code, I can add that to the comparison.
-march=native is an obvious flag to add, that in a few tests seemed to take off a couple tenths of a second. Not nearly enough to even catch up with the original Julia code. Let alone the version with reduced allocations.

Adding blas calls did not really help. Adding lines like this:

call dgemm('N', 'T', nGridCapital, nGridProductivity, nGridProductivity, alpha, &
        mValueFunction, nGridCapital, mTransition, nGridProductivity, beta, expectedValueFunction, nGridCapital)

this worsened external timing with time, and screwed up Fortran’s cpu time function.
I wont bother changing this in C++.

$ ./testc
Output = 0.562731, Capital = 0.178198, Consumption = 0.384533
 Iteration = 1, Sup Diff = 0.0527416
Iteration = 10, Sup Diff = 0.0313469
Iteration = 20, Sup Diff = 0.0187035
Iteration = 30, Sup Diff = 0.0111655
Iteration = 40, Sup Diff = 0.00666854
Iteration = 50, Sup Diff = 0.00398429
Iteration = 60, Sup Diff = 0.00238131
Iteration = 70, Sup Diff = 0.00142366
Iteration = 80, Sup Diff = 0.00085134
Iteration = 90, Sup Diff = 0.000509205
Iteration = 100, Sup Diff = 0.000304623
Iteration = 110, Sup Diff = 0.000182265
Iteration = 120, Sup Diff = 0.00010907
Iteration = 130, Sup Diff = 6.52764e-05
Iteration = 140, Sup Diff = 3.90711e-05
Iteration = 150, Sup Diff = 2.33881e-05
Iteration = 160, Sup Diff = 1.40086e-05
Iteration = 170, Sup Diff = 8.39132e-06
Iteration = 180, Sup Diff = 5.02647e-06
Iteration = 190, Sup Diff = 3.0109e-06
Iteration = 200, Sup Diff = 1.80355e-06
Iteration = 210, Sup Diff = 1.08034e-06
Iteration = 220, Sup Diff = 6.47132e-07
Iteration = 230, Sup Diff = 3.87636e-07
Iteration = 240, Sup Diff = 2.32197e-07
Iteration = 250, Sup Diff = 1.39087e-07
Iteration = 257, Sup Diff = 9.71604e-08
 
My check = 0.146549
 
Elapsed time is   = 1.41837

$ ./testc
Output = 0.562731, Capital = 0.178198, Consumption = 0.384533
 Iteration = 1, Sup Diff = 0.0527416
Iteration = 10, Sup Diff = 0.0313469
Iteration = 20, Sup Diff = 0.0187035
Iteration = 30, Sup Diff = 0.0111655
Iteration = 40, Sup Diff = 0.00666854
Iteration = 50, Sup Diff = 0.00398429
Iteration = 60, Sup Diff = 0.00238131
Iteration = 70, Sup Diff = 0.00142366
Iteration = 80, Sup Diff = 0.00085134
Iteration = 90, Sup Diff = 0.000509205
Iteration = 100, Sup Diff = 0.000304623
Iteration = 110, Sup Diff = 0.000182265
Iteration = 120, Sup Diff = 0.00010907
Iteration = 130, Sup Diff = 6.52764e-05
Iteration = 140, Sup Diff = 3.90711e-05
Iteration = 150, Sup Diff = 2.33881e-05
Iteration = 160, Sup Diff = 1.40086e-05
Iteration = 170, Sup Diff = 8.39132e-06
Iteration = 180, Sup Diff = 5.02647e-06
Iteration = 190, Sup Diff = 3.0109e-06
Iteration = 200, Sup Diff = 1.80355e-06
Iteration = 210, Sup Diff = 1.08034e-06
Iteration = 220, Sup Diff = 6.47132e-07
Iteration = 230, Sup Diff = 3.87636e-07
Iteration = 240, Sup Diff = 2.32197e-07
Iteration = 250, Sup Diff = 1.39087e-07
Iteration = 257, Sup Diff = 9.71604e-08
 
My check = 0.146549
 
Elapsed time is   = 1.40445

$ ./testc
Output = 0.562731, Capital = 0.178198, Consumption = 0.384533
 Iteration = 1, Sup Diff = 0.0527416
Iteration = 10, Sup Diff = 0.0313469
Iteration = 20, Sup Diff = 0.0187035
Iteration = 30, Sup Diff = 0.0111655
Iteration = 40, Sup Diff = 0.00666854
Iteration = 50, Sup Diff = 0.00398429
Iteration = 60, Sup Diff = 0.00238131
Iteration = 70, Sup Diff = 0.00142366
Iteration = 80, Sup Diff = 0.00085134
Iteration = 90, Sup Diff = 0.000509205
Iteration = 100, Sup Diff = 0.000304623
Iteration = 110, Sup Diff = 0.000182265
Iteration = 120, Sup Diff = 0.00010907
Iteration = 130, Sup Diff = 6.52764e-05
Iteration = 140, Sup Diff = 3.90711e-05
Iteration = 150, Sup Diff = 2.33881e-05
Iteration = 160, Sup Diff = 1.40086e-05
Iteration = 170, Sup Diff = 8.39132e-06
Iteration = 180, Sup Diff = 5.02647e-06
Iteration = 190, Sup Diff = 3.0109e-06
Iteration = 200, Sup Diff = 1.80355e-06
Iteration = 210, Sup Diff = 1.08034e-06
Iteration = 220, Sup Diff = 6.47132e-07
Iteration = 230, Sup Diff = 3.87636e-07
Iteration = 240, Sup Diff = 2.32197e-07
Iteration = 250, Sup Diff = 1.39087e-07
Iteration = 257, Sup Diff = 9.71604e-08
 
My check = 0.146549
 
Elapsed time is   = 1.41243

$ ./testc
Output = 0.562731, Capital = 0.178198, Consumption = 0.384533
 Iteration = 1, Sup Diff = 0.0527416
Iteration = 10, Sup Diff = 0.0313469
Iteration = 20, Sup Diff = 0.0187035
Iteration = 30, Sup Diff = 0.0111655
Iteration = 40, Sup Diff = 0.00666854
Iteration = 50, Sup Diff = 0.00398429
Iteration = 60, Sup Diff = 0.00238131
Iteration = 70, Sup Diff = 0.00142366
Iteration = 80, Sup Diff = 0.00085134
Iteration = 90, Sup Diff = 0.000509205
Iteration = 100, Sup Diff = 0.000304623
Iteration = 110, Sup Diff = 0.000182265
Iteration = 120, Sup Diff = 0.00010907
Iteration = 130, Sup Diff = 6.52764e-05
Iteration = 140, Sup Diff = 3.90711e-05
Iteration = 150, Sup Diff = 2.33881e-05
Iteration = 160, Sup Diff = 1.40086e-05
Iteration = 170, Sup Diff = 8.39132e-06
Iteration = 180, Sup Diff = 5.02647e-06
Iteration = 190, Sup Diff = 3.0109e-06
Iteration = 200, Sup Diff = 1.80355e-06
Iteration = 210, Sup Diff = 1.08034e-06
Iteration = 220, Sup Diff = 6.47132e-07
Iteration = 230, Sup Diff = 3.87636e-07
Iteration = 240, Sup Diff = 2.32197e-07
Iteration = 250, Sup Diff = 1.39087e-07
Iteration = 257, Sup Diff = 9.71604e-08
 
My check = 0.146549
 
Elapsed time is   = 1.40178

$ ./testc
Output = 0.562731, Capital = 0.178198, Consumption = 0.384533
 Iteration = 1, Sup Diff = 0.0527416
Iteration = 10, Sup Diff = 0.0313469
Iteration = 20, Sup Diff = 0.0187035
Iteration = 30, Sup Diff = 0.0111655
Iteration = 40, Sup Diff = 0.00666854
Iteration = 50, Sup Diff = 0.00398429
Iteration = 60, Sup Diff = 0.00238131
Iteration = 70, Sup Diff = 0.00142366
Iteration = 80, Sup Diff = 0.00085134
Iteration = 90, Sup Diff = 0.000509205
Iteration = 100, Sup Diff = 0.000304623
Iteration = 110, Sup Diff = 0.000182265
Iteration = 120, Sup Diff = 0.00010907
Iteration = 130, Sup Diff = 6.52764e-05
Iteration = 140, Sup Diff = 3.90711e-05
Iteration = 150, Sup Diff = 2.33881e-05
Iteration = 160, Sup Diff = 1.40086e-05
Iteration = 170, Sup Diff = 8.39132e-06
Iteration = 180, Sup Diff = 5.02647e-06
Iteration = 190, Sup Diff = 3.0109e-06
Iteration = 200, Sup Diff = 1.80355e-06
Iteration = 210, Sup Diff = 1.08034e-06
Iteration = 220, Sup Diff = 6.47132e-07
Iteration = 230, Sup Diff = 3.87636e-07
Iteration = 240, Sup Diff = 2.32197e-07
Iteration = 250, Sup Diff = 1.39087e-07
Iteration = 257, Sup Diff = 9.71604e-08
 
My check = 0.146549
 
Elapsed time is   = 1.4145

Although, gcc-7.2 appears to have done better.

$ ./testc-72
Output = 0.562731, Capital = 0.178198, Consumption = 0.384533
 Iteration = 1, Sup Diff = 0.0527416
Iteration = 10, Sup Diff = 0.0313469
Iteration = 20, Sup Diff = 0.0187035
Iteration = 30, Sup Diff = 0.0111655
Iteration = 40, Sup Diff = 0.00666854
Iteration = 50, Sup Diff = 0.00398429
Iteration = 60, Sup Diff = 0.00238131
Iteration = 70, Sup Diff = 0.00142366
Iteration = 80, Sup Diff = 0.00085134
Iteration = 90, Sup Diff = 0.000509205
Iteration = 100, Sup Diff = 0.000304623
Iteration = 110, Sup Diff = 0.000182265
Iteration = 120, Sup Diff = 0.00010907
Iteration = 130, Sup Diff = 6.52764e-05
Iteration = 140, Sup Diff = 3.90711e-05
Iteration = 150, Sup Diff = 2.33881e-05
Iteration = 160, Sup Diff = 1.40086e-05
Iteration = 170, Sup Diff = 8.39132e-06
Iteration = 180, Sup Diff = 5.02647e-06
Iteration = 190, Sup Diff = 3.0109e-06
Iteration = 200, Sup Diff = 1.80355e-06
Iteration = 210, Sup Diff = 1.08034e-06
Iteration = 220, Sup Diff = 6.47132e-07
Iteration = 230, Sup Diff = 3.87636e-07
Iteration = 240, Sup Diff = 2.32197e-07
Iteration = 250, Sup Diff = 1.39087e-07
Iteration = 257, Sup Diff = 9.71604e-08
 
My check = 0.146549
 
Elapsed time is   = 1.38093
 
$ ./testc-72
Output = 0.562731, Capital = 0.178198, Consumption = 0.384533
 Iteration = 1, Sup Diff = 0.0527416
Iteration = 10, Sup Diff = 0.0313469
Iteration = 20, Sup Diff = 0.0187035
Iteration = 30, Sup Diff = 0.0111655
Iteration = 40, Sup Diff = 0.00666854
Iteration = 50, Sup Diff = 0.00398429
Iteration = 60, Sup Diff = 0.00238131
Iteration = 70, Sup Diff = 0.00142366
Iteration = 80, Sup Diff = 0.00085134
Iteration = 90, Sup Diff = 0.000509205
Iteration = 100, Sup Diff = 0.000304623
Iteration = 110, Sup Diff = 0.000182265
Iteration = 120, Sup Diff = 0.00010907
Iteration = 130, Sup Diff = 6.52764e-05
Iteration = 140, Sup Diff = 3.90711e-05
Iteration = 150, Sup Diff = 2.33881e-05
Iteration = 160, Sup Diff = 1.40086e-05
Iteration = 170, Sup Diff = 8.39132e-06
Iteration = 180, Sup Diff = 5.02647e-06
Iteration = 190, Sup Diff = 3.0109e-06
Iteration = 200, Sup Diff = 1.80355e-06
Iteration = 210, Sup Diff = 1.08034e-06
Iteration = 220, Sup Diff = 6.47132e-07
Iteration = 230, Sup Diff = 3.87636e-07
Iteration = 240, Sup Diff = 2.32197e-07
Iteration = 250, Sup Diff = 1.39087e-07
Iteration = 257, Sup Diff = 9.71604e-08
 
My check = 0.146549
 
Elapsed time is   = 1.40683
 
$ ./testc-72
Output = 0.562731, Capital = 0.178198, Consumption = 0.384533
 Iteration = 1, Sup Diff = 0.0527416
Iteration = 10, Sup Diff = 0.0313469
Iteration = 20, Sup Diff = 0.0187035
Iteration = 30, Sup Diff = 0.0111655
Iteration = 40, Sup Diff = 0.00666854
Iteration = 50, Sup Diff = 0.00398429
Iteration = 60, Sup Diff = 0.00238131
Iteration = 70, Sup Diff = 0.00142366
Iteration = 80, Sup Diff = 0.00085134
Iteration = 90, Sup Diff = 0.000509205
Iteration = 100, Sup Diff = 0.000304623
Iteration = 110, Sup Diff = 0.000182265
Iteration = 120, Sup Diff = 0.00010907
Iteration = 130, Sup Diff = 6.52764e-05
Iteration = 140, Sup Diff = 3.90711e-05
Iteration = 150, Sup Diff = 2.33881e-05
Iteration = 160, Sup Diff = 1.40086e-05
Iteration = 170, Sup Diff = 8.39132e-06
Iteration = 180, Sup Diff = 5.02647e-06
Iteration = 190, Sup Diff = 3.0109e-06
Iteration = 200, Sup Diff = 1.80355e-06
Iteration = 210, Sup Diff = 1.08034e-06
Iteration = 220, Sup Diff = 6.47132e-07
Iteration = 230, Sup Diff = 3.87636e-07
Iteration = 240, Sup Diff = 2.32197e-07
Iteration = 250, Sup Diff = 1.39087e-07
Iteration = 257, Sup Diff = 9.71604e-08
 
My check = 0.146549
 
Elapsed time is   = 1.41896
 
$ ./testc-72
Output = 0.562731, Capital = 0.178198, Consumption = 0.384533
 Iteration = 1, Sup Diff = 0.0527416
Iteration = 10, Sup Diff = 0.0313469
Iteration = 20, Sup Diff = 0.0187035
Iteration = 30, Sup Diff = 0.0111655
Iteration = 40, Sup Diff = 0.00666854
Iteration = 50, Sup Diff = 0.00398429
Iteration = 60, Sup Diff = 0.00238131
Iteration = 70, Sup Diff = 0.00142366
Iteration = 80, Sup Diff = 0.00085134
Iteration = 90, Sup Diff = 0.000509205
Iteration = 100, Sup Diff = 0.000304623
Iteration = 110, Sup Diff = 0.000182265
Iteration = 120, Sup Diff = 0.00010907
Iteration = 130, Sup Diff = 6.52764e-05
Iteration = 140, Sup Diff = 3.90711e-05
Iteration = 150, Sup Diff = 2.33881e-05
Iteration = 160, Sup Diff = 1.40086e-05
Iteration = 170, Sup Diff = 8.39132e-06
Iteration = 180, Sup Diff = 5.02647e-06
Iteration = 190, Sup Diff = 3.0109e-06
Iteration = 200, Sup Diff = 1.80355e-06
Iteration = 210, Sup Diff = 1.08034e-06
Iteration = 220, Sup Diff = 6.47132e-07
Iteration = 230, Sup Diff = 3.87636e-07
Iteration = 240, Sup Diff = 2.32197e-07
Iteration = 250, Sup Diff = 1.39087e-07
Iteration = 257, Sup Diff = 9.71604e-08
 
My check = 0.146549
 
Elapsed time is   = 1.41433

$ ./testc-72
Output = 0.562731, Capital = 0.178198, Consumption = 0.384533
 Iteration = 1, Sup Diff = 0.0527416
Iteration = 10, Sup Diff = 0.0313469
Iteration = 20, Sup Diff = 0.0187035
Iteration = 30, Sup Diff = 0.0111655
Iteration = 40, Sup Diff = 0.00666854
Iteration = 50, Sup Diff = 0.00398429
Iteration = 60, Sup Diff = 0.00238131
Iteration = 70, Sup Diff = 0.00142366
Iteration = 80, Sup Diff = 0.00085134
Iteration = 90, Sup Diff = 0.000509205
Iteration = 100, Sup Diff = 0.000304623
Iteration = 110, Sup Diff = 0.000182265
Iteration = 120, Sup Diff = 0.00010907
Iteration = 130, Sup Diff = 6.52764e-05
Iteration = 140, Sup Diff = 3.90711e-05
Iteration = 150, Sup Diff = 2.33881e-05
Iteration = 160, Sup Diff = 1.40086e-05
Iteration = 170, Sup Diff = 8.39132e-06
Iteration = 180, Sup Diff = 5.02647e-06
Iteration = 190, Sup Diff = 3.0109e-06
Iteration = 200, Sup Diff = 1.80355e-06
Iteration = 210, Sup Diff = 1.08034e-06
Iteration = 220, Sup Diff = 6.47132e-07
Iteration = 230, Sup Diff = 3.87636e-07
Iteration = 240, Sup Diff = 2.32197e-07
Iteration = 250, Sup Diff = 1.39087e-07
Iteration = 257, Sup Diff = 9.71604e-08
 
My check = 0.146549
 
Elapsed time is   = 1.4305

C+±11-8.0:

$ ./testc2 
Output = 0.562731, Capital = 0.178198, Consumption = 0.384533
Iteration = 1, Sup Diff = 0.0527416
Iteration = 10, Sup Diff = 0.0313469
Iteration = 20, Sup Diff = 0.0187035
Iteration = 30, Sup Diff = 0.0111655
Iteration = 40, Sup Diff = 0.00666854
Iteration = 50, Sup Diff = 0.00398429
Iteration = 60, Sup Diff = 0.00238131
Iteration = 70, Sup Diff = 0.00142366
Iteration = 80, Sup Diff = 0.00085134
Iteration = 90, Sup Diff = 0.000509205
Iteration = 100, Sup Diff = 0.000304623
Iteration = 110, Sup Diff = 0.000182265
Iteration = 120, Sup Diff = 0.00010907
Iteration = 130, Sup Diff = 6.52764e-05
Iteration = 140, Sup Diff = 3.90711e-05
Iteration = 150, Sup Diff = 2.33881e-05
Iteration = 160, Sup Diff = 1.40086e-05
Iteration = 170, Sup Diff = 8.39132e-06
Iteration = 180, Sup Diff = 5.02647e-06
Iteration = 190, Sup Diff = 3.0109e-06
Iteration = 200, Sup Diff = 1.80355e-06
Iteration = 210, Sup Diff = 1.08034e-06
Iteration = 220, Sup Diff = 6.47132e-07
Iteration = 230, Sup Diff = 3.87636e-07
Iteration = 240, Sup Diff = 2.32197e-07
Iteration = 250, Sup Diff = 1.39087e-07
Iteration = 257, Sup Diff = 9.71604e-08

My check = 0.146549

Elapsed time is   = 1.43721 seconds.

$ ./testc2
Output = 0.562731, Capital = 0.178198, Consumption = 0.384533
Iteration = 1, Sup Diff = 0.0527416
Iteration = 10, Sup Diff = 0.0313469
Iteration = 20, Sup Diff = 0.0187035...
Iteration = 250, Sup Diff = 1.39087e-07
Iteration = 257, Sup Diff = 9.71604e-08

My check = 0.146549

Elapsed time is   = 1.42954 seconds.

$ ./testc2
Output = 0.562731, Capital = 0.178198, Consumption = 0.384533
Iteration = 1, Sup Diff = 0.0527416
Iteration = 10, Sup Diff = 0.0313469
Iteration = 20, Sup Diff = 0.0187035...
Iteration = 250, Sup Diff = 1.39087e-07
Iteration = 257, Sup Diff = 9.71604e-08

My check = 0.146549

Elapsed time is   = 1.45253 seconds.

$ ./testc2
Output = 0.562731, Capital = 0.178198, Consumption = 0.384533
Iteration = 1, Sup Diff = 0.0527416
Iteration = 10, Sup Diff = 0.0313469
Iteration = 20, Sup Diff = 0.0187035...
Iteration = 250, Sup Diff = 1.39087e-07
Iteration = 257, Sup Diff = 9.71604e-08

My check = 0.146549

Elapsed time is   = 1.45079 seconds.

$ ./testc2
Output = 0.562731, Capital = 0.178198, Consumption = 0.384533
Iteration = 1, Sup Diff = 0.0527416
Iteration = 10, Sup Diff = 0.0313469...
Iteration = 250, Sup Diff = 1.39087e-07
Iteration = 257, Sup Diff = 9.71604e-08

My check = 0.146549

Elapsed time is   = 1.46574 seconds.

C++11-7.2

$ ./testc2-72
Output = 0.562731, Capital = 0.178198, Consumption = 0.384533
Iteration = 1, Sup Diff = 0.0527416
Iteration = 10, Sup Diff = 0.0313469
Iteration = 20, Sup Diff = 0.0187035...
Iteration = 250, Sup Diff = 1.39087e-07
Iteration = 257, Sup Diff = 9.71604e-08

My check = 0.146549

Elapsed time is   = 1.37332 seconds.

$ ./testc2-72
Output = 0.562731, Capital = 0.178198, Consumption = 0.384533
Iteration = 1, Sup Diff = 0.0527416
Iteration = 10, Sup Diff = 0.0313469...
Iteration = 250, Sup Diff = 1.39087e-07
Iteration = 257, Sup Diff = 9.71604e-08

My check = 0.146549

Elapsed time is   = 1.40465 seconds.

$ ./testc2-72
Output = 0.562731, Capital = 0.178198, Consumption = 0.384533
Iteration = 1, Sup Diff = 0.0527416
Iteration = 10, Sup Diff = 0.0313469
Iteration = 20, Sup Diff = 0.0187035...
Iteration = 250, Sup Diff = 1.39087e-07
Iteration = 257, Sup Diff = 9.71604e-08

My check = 0.146549

Elapsed time is   = 1.3946 seconds.

$ ./testc2-72
Output = 0.562731, Capital = 0.178198, Consumption = 0.384533
Iteration = 1, Sup Diff = 0.0527416
Iteration = 10, Sup Diff = 0.0313469
Iteration = 20, Sup Diff = 0.0187035...
Iteration = 240, Sup Diff = 2.32197e-07
Iteration = 250, Sup Diff = 1.39087e-07
Iteration = 257, Sup Diff = 9.71604e-08

My check = 0.146549

Elapsed time is   = 1.38952 seconds.

$ ./testc2-72
Output = 0.562731, Capital = 0.178198, Consumption = 0.384533
Iteration = 1, Sup Diff = 0.0527416
Iteration = 10, Sup Diff = 0.0313469
Iteration = 20, Sup Diff = 0.0187035
Iteration = 30, Sup Diff = 0.0111655...
Iteration = 240, Sup Diff = 2.32197e-07
Iteration = 250, Sup Diff = 1.39087e-07
Iteration = 257, Sup Diff = 9.71604e-08

My check = 0.146549

Elapsed time is   = 1.386 seconds.

Same story for the Fortran code; 8.0 below:

$ ./testf1 
 Steady State values
 Output:   0.56273142426227074      Capital:   0.17819828742434793      Consumption:   0.38453313683792278     
 Iteration:           1 Sup Diff:   5.2741607134075871E-002
 Iteration:          10 Sup Diff:   3.1346953831200064E-002
 Iteration:          20 Sup Diff:   1.8703460152962759E-002
 Iteration:          30 Sup Diff:   1.1165510606509832E-002
 Iteration:          40 Sup Diff:   6.6685398355890158E-003
 Iteration:          50 Sup Diff:   3.9842909760740008E-003
 Iteration:          60 Sup Diff:   2.3813103290404314E-003
 Iteration:          70 Sup Diff:   1.4236575018528042E-003
 Iteration:          80 Sup Diff:   8.5133892789679422E-004
 Iteration:          90 Sup Diff:   5.0920456767089561E-004
 Iteration:         100 Sup Diff:   3.0462281856558082E-004
 Iteration:         110 Sup Diff:   1.8226456357595122E-004
 Iteration:         120 Sup Diff:   1.0906931033871636E-004
 Iteration:         130 Sup Diff:   6.5276304536787677E-005
 Iteration:         140 Sup Diff:   3.9070994080292465E-005
 Iteration:         150 Sup Diff:   2.3388019260051074E-005
 Iteration:         160 Sup Diff:   1.4008591582403973E-005
 Iteration:         170 Sup Diff:   8.3912834882848841E-006
 Iteration:         180 Sup Diff:   5.0264531857857619E-006
 Iteration:         190 Sup Diff:   3.0108863812161601E-006
 Iteration:         200 Sup Diff:   1.8035437577834657E-006
 Iteration:         210 Sup Diff:   1.0803355822153193E-006
 Iteration:         220 Sup Diff:   6.4712835090574572E-007
 Iteration:         230 Sup Diff:   3.8763410237230289E-007
 Iteration:         240 Sup Diff:   2.3219527311990618E-007
 Iteration:         250 Sup Diff:   1.3908639506787779E-007
 Iteration:         257 Sup Diff:   9.7159772005639411E-008
  
 My check:  0.14654914390886931     
  
 Elapsed time is    1.40512800

$ ./testf1 
 Steady State values
 Output:   0.56273142426227074      Capital:   0.17819828742434793      Consumption:   0.38453313683792278     
 Iteration:           1 Sup Diff:   5.2741607134075871E-002
 Iteration:          10 Sup Diff:   3.1346953831200064E-002
 Iteration:          20 Sup Diff:   1.8703460152962759E-002
 Iteration:          30 Sup Diff:   1.1165510606509832E-002
 Iteration:          40 Sup Diff:   6.6685398355890158E-003
 Iteration:          50 Sup Diff:   3.9842909760740008E-003
 Iteration:          60 Sup Diff:   2.3813103290404314E-003
 Iteration:          70 Sup Diff:   1.4236575018528042E-003
 Iteration:          80 Sup Diff:   8.5133892789679422E-004
 Iteration:          90 Sup Diff:   5.0920456767089561E-004
 Iteration:         100 Sup Diff:   3.0462281856558082E-004
 Iteration:         110 Sup Diff:   1.8226456357595122E-004
 Iteration:         120 Sup Diff:   1.0906931033871636E-004
 Iteration:         130 Sup Diff:   6.5276304536787677E-005
 Iteration:         140 Sup Diff:   3.9070994080292465E-005
 Iteration:         150 Sup Diff:   2.3388019260051074E-005
 Iteration:         160 Sup Diff:   1.4008591582403973E-005
 Iteration:         170 Sup Diff:   8.3912834882848841E-006
 Iteration:         180 Sup Diff:   5.0264531857857619E-006
 Iteration:         190 Sup Diff:   3.0108863812161601E-006
 Iteration:         200 Sup Diff:   1.8035437577834657E-006
 Iteration:         210 Sup Diff:   1.0803355822153193E-006
 Iteration:         220 Sup Diff:   6.4712835090574572E-007
 Iteration:         230 Sup Diff:   3.8763410237230289E-007
 Iteration:         240 Sup Diff:   2.3219527311990618E-007
 Iteration:         250 Sup Diff:   1.3908639506787779E-007
 Iteration:         257 Sup Diff:   9.7159772005639411E-008
  
 My check:  0.14654914390886931     
  
 Elapsed time is    1.39837098

$ ./testf1 
 Steady State values
 Output:   0.56273142426227074      Capital:   0.17819828742434793      Consumption:   0.38453313683792278     
 Iteration:           1 Sup Diff:   5.2741607134075871E-002
 Iteration:          10 Sup Diff:   3.1346953831200064E-002
 Iteration:          20 Sup Diff:   1.8703460152962759E-002
 Iteration:          30 Sup Diff:   1.1165510606509832E-002
 Iteration:          40 Sup Diff:   6.6685398355890158E-003
 Iteration:          50 Sup Diff:   3.9842909760740008E-003
 Iteration:          60 Sup Diff:   2.3813103290404314E-003
 Iteration:          70 Sup Diff:   1.4236575018528042E-003
 Iteration:          80 Sup Diff:   8.5133892789679422E-004
 Iteration:          90 Sup Diff:   5.0920456767089561E-004
 Iteration:         100 Sup Diff:   3.0462281856558082E-004
 Iteration:         110 Sup Diff:   1.8226456357595122E-004
 Iteration:         120 Sup Diff:   1.0906931033871636E-004
 Iteration:         130 Sup Diff:   6.5276304536787677E-005
 Iteration:         140 Sup Diff:   3.9070994080292465E-005
 Iteration:         150 Sup Diff:   2.3388019260051074E-005
 Iteration:         160 Sup Diff:   1.4008591582403973E-005
 Iteration:         170 Sup Diff:   8.3912834882848841E-006
 Iteration:         180 Sup Diff:   5.0264531857857619E-006
 Iteration:         190 Sup Diff:   3.0108863812161601E-006
 Iteration:         200 Sup Diff:   1.8035437577834657E-006
 Iteration:         210 Sup Diff:   1.0803355822153193E-006
 Iteration:         220 Sup Diff:   6.4712835090574572E-007
 Iteration:         230 Sup Diff:   3.8763410237230289E-007
 Iteration:         240 Sup Diff:   2.3219527311990618E-007
 Iteration:         250 Sup Diff:   1.3908639506787779E-007
 Iteration:         257 Sup Diff:   9.7159772005639411E-008
  
 My check:  0.14654914390886931     
  
 Elapsed time is    1.41704905

$ ./testf1 
 Steady State values
 Output:   0.56273142426227074      Capital:   0.17819828742434793      Consumption:   0.38453313683792278     
 Iteration:           1 Sup Diff:   5.2741607134075871E-002
 Iteration:          10 Sup Diff:   3.1346953831200064E-002
 Iteration:          20 Sup Diff:   1.8703460152962759E-002
 Iteration:          30 Sup Diff:   1.1165510606509832E-002...
 Iteration:         240 Sup Diff:   2.3219527311990618E-007
 Iteration:         250 Sup Diff:   1.3908639506787779E-007
 Iteration:         257 Sup Diff:   9.7159772005639411E-008
  
 My check:  0.14654914390886931     
  
 Elapsed time is    1.42640603

$ ./testf1 
 Steady State values
 Output:   0.56273142426227074      Capital:   0.17819828742434793      Consumption:   0.38453313683792278     
 Iteration:           1 Sup Diff:   5.2741607134075871E-002
 Iteration:          10 Sup Diff:   3.1346953831200064E-002
 Iteration:          20 Sup Diff:   1.8703460152962759E-002...
 Iteration:         250 Sup Diff:   1.3908639506787779E-007
 Iteration:         257 Sup Diff:   9.7159772005639411E-008
  
 My check:  0.14654914390886931     
  
 Elapsed time is    1.39755297

$ ./testf1 
 Steady State values
 Output:   0.56273142426227074      Capital:   0.17819828742434793      Consumption:   0.38453313683792278     
 Iteration:           1 Sup Diff:   5.2741607134075871E-002
 Iteration:          10 Sup Diff:   3.1346953831200064E-002
 Iteration:          20 Sup Diff:   1.8703460152962759E-002...
 Iteration:         250 Sup Diff:   1.3908639506787779E-007
 Iteration:         257 Sup Diff:   9.7159772005639411E-008
  
 My check:  0.14654914390886931     
  
 Elapsed time is    1.40822804    

gfortran-7.2

$ ./testf1-72 
 Steady State values
 Output:   0.56273142426227074      Capital:   0.17819828742434793      Consumption:   0.38453313683792278     
 Iteration:           1 Sup Diff:   5.2741607134075871E-002
 Iteration:          10 Sup Diff:   3.1346953831200064E-002
 Iteration:          20 Sup Diff:   1.8703460152962759E-002
 Iteration:          30 Sup Diff:   1.1165510606509832E-002
 Iteration:          40 Sup Diff:   6.6685398355890158E-003...
 Iteration:         230 Sup Diff:   3.8763410237230289E-007
 Iteration:         240 Sup Diff:   2.3219527311990618E-007
 Iteration:         250 Sup Diff:   1.3908639506787779E-007
 Iteration:         257 Sup Diff:   9.7159772005639411E-008
  
 My check:  0.14654914390886931     
  
 Elapsed time is    1.41379201    

$ ./testf1-72 
 Steady State values
 Output:   0.56273142426227074      Capital:   0.17819828742434793      Consumption:   0.38453313683792278     
 Iteration:           1 Sup Diff:   5.2741607134075871E-002
 Iteration:          10 Sup Diff:   3.1346953831200064E-002
 Iteration:          20 Sup Diff:   1.8703460152962759E-002
 Iteration:          30 Sup Diff:   1.1165510606509832E-002...
 Iteration:         240 Sup Diff:   2.3219527311990618E-007
 Iteration:         250 Sup Diff:   1.3908639506787779E-007
 Iteration:         257 Sup Diff:   9.7159772005639411E-008
  
 My check:  0.14654914390886931     
  
 Elapsed time is    1.46986699    

$ ./testf1-72 
 Steady State values
 Output:   0.56273142426227074      Capital:   0.17819828742434793      Consumption:   0.38453313683792278     
 Iteration:           1 Sup Diff:   5.2741607134075871E-002
 Iteration:          10 Sup Diff:   3.1346953831200064E-002
 Iteration:          20 Sup Diff:   1.8703460152962759E-002...
 Iteration:         250 Sup Diff:   1.3908639506787779E-007
 Iteration:         257 Sup Diff:   9.7159772005639411E-008
  
 My check:  0.14654914390886931     
  
 Elapsed time is    1.44693005    
$ ./testf1-72 
 Steady State values
 Output:   0.56273142426227074      Capital:   0.17819828742434793      Consumption:   0.38453313683792278     
 Iteration:           1 Sup Diff:   5.2741607134075871E-002
 Iteration:          10 Sup Diff:   3.1346953831200064E-002
 Iteration:          20 Sup Diff:   1.8703460152962759E-002
 Iteration:          30 Sup Diff:   1.1165510606509832E-002
 Iteration:          40 Sup Diff:   6.6685398355890158E-003...
 Iteration:         240 Sup Diff:   2.3219527311990618E-007
 Iteration:         250 Sup Diff:   1.3908639506787779E-007
 Iteration:         257 Sup Diff:   9.7159772005639411E-008
  
 My check:  0.14654914390886931     
  
 Elapsed time is    1.42388999    
$ ./testf1-72 
 Steady State values
 Output:   0.56273142426227074      Capital:   0.17819828742434793      Consumption:   0.38453313683792278     
 Iteration:           1 Sup Diff:   5.2741607134075871E-002
 Iteration:          10 Sup Diff:   3.1346953831200064E-002
 Iteration:          20 Sup Diff:   1.8703460152962759E-002
 Iteration:          30 Sup Diff:   1.1165510606509832E-002
 Iteration:          40 Sup Diff:   6.6685398355890158E-003...
 Iteration:         230 Sup Diff:   3.8763410237230289E-007
 Iteration:         240 Sup Diff:   2.3219527311990618E-007
 Iteration:         250 Sup Diff:   1.3908639506787779E-007
 Iteration:         257 Sup Diff:   9.7159772005639411E-008
  
 My check:  0.14654914390886931     
  
 Elapsed time is    1.41948009    

#13

Agree completely. Would be nice to have an additional version which sees how fast an elegant looking version of the code, where it looks like what only Julia can deliver efficiently… Otherwise it runs the risk of looking like “fortan in any language”. This is an important point that was hard to get across in the original paper.


#14

Using even just the original code, Julia was faster than C++ and Fortran.
But I think with the advent of “more dots”, idiomatic code does look more like my version.


#15

I know. How fast would the code be if it was “as pretty as possible”. If it is still relatively fast, then it is a major point to be made about sweet spot of elegance with good enough speed.


#16

Okay, now that you say “pretty” I do think my Matrix{Float64}(uninitialized, is on the ugly side.
If you want to pretty things up a bit, I’ll benchmark them on my computer and add them above.

Or of course, anyone’s free to rerun all the benchmarks.
Odds are that we’ll see all sorts of differences from computer to computer.
But I’m guessing results wont differ too much.

In my (limited) testing, Julia pretty consistently edges out C++ and Fortran.
While being a lot prettier.
And having a REPL to make life easy.

It is impressive how far Julia seems to have come from v0.2!


#17

In the Julia code, you can get about 10% improvement by transposing mTransition outside the loop (I’m using current Julia 0.6.x). Otherwise, the original code looks like what an quantitative economist would write (I am one). For comparisons that would convince economists, too much fancy optimization would miss the point of what they want out of the language.

EDIT: scratch that on the 10%, it seems to be hardly anything, now that I run it again. However, putting @inbounds in the appropriate places does have a good effect.


#18

On my computer, both the original Julia code and your improved version are faster than the C and Fortran code compiled with GCC.

However, a version I previously compiled with the Intel compiler is considerably faster than both (0.9s Intel vs 1.6s Julia). I no longer have the Intel compiler installed, so I can’t figure out why this is. It only seems to be using one processor according to my system monitor, so it’s not automatic multithreading.


#19

Once again I’m focusing on Python Numba vs Julia :slight_smile: I’m surprised by the results, however…

  • Python Numba 1.3s
  • Julia 1.94s
  • Julia Libm 1.53s
  • Julia Libm+Inbound 1.55s (no noticeable difference)

Details are here:


#20

There is this line

maxDifference     = maximum(abs.(mValueFunctionNew-mValueFunction))

This allocates memory just to find the maximum. An alternative would be

reduce((x_max, x) -> max(x_max, abs(x[1] - x[2])), 0.0, zip(mValueFunctionNew, mValueFunction))

and here

valueProvisional = (1-β)*Base.Math.JuliaLibm.log(consumption)+β*expectedValueFunction[nCapitalNextPeriod,nProductivity]

you could also add an inbounds and use muladd(a, b, c*d) instead of a*b+c*d to get an fma instruction.