A Comparison of Programming Languages in Economics

We can optimize it and make it crazily fast but the point is that a general economist would likely write the code just like the original version and would not know how (or don’t want to spend time on) optimizing it.

In this case, however, I wonder how often economists/quants do maximum absolute difference of two arrays. Perhaps this kind of things could be collected into a package?

A further 3x performance is seen with customized looping function.

julia> v = rand(1000000); w = rand(1000000);

julia> foo(v, w) = reduce((x_max, x) -> max(x_max, abs(x[1] - x[2])), 0.0, zip(v, w))
foo (generic function with 1 method)

julia> bar(v, w) = maximum(abs.(v - w))
bar (generic function with 1 method)

julia> function maxabsdiff(v, w) 
         m = maximum(abs(v[1]- w[1]))
         for i in 2:length(v)
           z = abs(v[i] - w[i])
           z > m && (m = z)
         end
         m
       end
maxabsdiff (generic function with 1 method)

julia> @benchmark foo($v, $w)
BenchmarkTools.Trial: 
  memory estimate:  32 bytes
  allocs estimate:  1
  --------------
  minimum time:     3.485 ms (0.00% GC)
  median time:      3.624 ms (0.00% GC)
  mean time:        3.748 ms (0.00% GC)
  maximum time:     5.778 ms (0.00% GC)
  --------------
  samples:          1330
  evals/sample:     1

julia> @benchmark bar($v, $w)
BenchmarkTools.Trial: 
  memory estimate:  15.26 MiB
  allocs estimate:  4
  --------------
  minimum time:     7.541 ms (0.00% GC)
  median time:      9.982 ms (19.24% GC)
  mean time:        9.942 ms (18.26% GC)
  maximum time:     14.870 ms (36.27% GC)
  --------------
  samples:          503
  evals/sample:     1

julia> @benchmark maxabsdiff($v, $w)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.187 ms (0.00% GC)
  median time:      1.297 ms (0.00% GC)
  mean time:        1.355 ms (0.00% GC)
  maximum time:     3.735 ms (0.00% GC)
  --------------
  samples:          3657
  evals/sample:     1

maximum accepts a function as first argument for these cases. For example, to compute the maximum absolute value of a vector v without extra allocations one can do:

maximum(abs, v)

Or with generators:

maximum(abs(x) for x in v)

To get the absolute value of the difference between two arrays we need a slightly more complicated anonymous function:

maximum(x -> abs(x[1]-x[2]), zip(v1, v2))

and maybe the generator syntax is the most elegant here:

maximum(abs(a-b) for (a, b) in zip(v1, v2))

Even for a general user, I think it’s important to get familiar with generators and anonymous functions as they can be very handy in Julia.

EDIT: benchmarking quickly it’d seem that the generator method is slower than the anonymous function method which in turn is about 3x slower than the explicit for loop, really not sure why that would be the case.

3 Likes

These really should be as fast as the loop version so there’s some missed optimization opportunity here. Might be worth opening an issue about it.

9 Likes

Interesting. I may try the next time I’m on an Intel computer. (Although icc/ifort is still pretty fast on AMD.)
Mostly though, I’d want Julia to beat Numba.

It surprises me how much faster Julia is than code compiled with Clang. It’s also much faster than code compiled with Flang, but that’s a much smaller project so I figure it’s probably missing out on a lot of optimizations.
When it comes to “which LLVM front end produces the fastest idiomatic numerical code: Julia, C++, or Fortran?”, I think the answer is Julia.

As a note on Intel’s compilers, some (like Stefan) criticize them heavily for using -ffast-math and breaking explicit parenthesis by default so that things like Kahan summation do not work.
Additionally, CmdStan:

By default, Intel’s compiler trades off accuracy for speed. This, unfortunately, isn’t ideal behavior for the inference algorithms and may cause divergent transitions. Add these flags to make/local to force more accurate floating point computations:

  • CXXFLAGS += -fp-model precise -fp-model source

I remember trying a few benchmarks with these flags on, and it did slow things down.

Of course, the point is often just to get things done quickly and move on. Like @mcreel said, the original Julia version probably looks like what most quantitative economists would write. So I do think there’s something to be said for a compiler capable of manipulating the expressions to optimize performance without sacrificing accuracy (those same people are more likely to get caught off guard and with debugging if their results become wrong).

@mcreel I only got an 8% improvement from all my changes. Perhaps some of them slowed it down despite reducing allocations? I didn’t benchmark them individually.

1 Like

I am getting 1.7 seconds with your code, making it much slower than the rest.

which code are you referring to?

Numba. I didn’t run it IPython, but from the interpreter:

start = time.time()
main_func2()
end = time.time()
print(end - start)
#1.7

This was the same computer as my other benchmarks, where the original Julia ran in 1.25s.
This is from a fresh install of Anaconda for Python 3.6.

Can you try the scripts? My results are still consistent.

$ julia RBC.jl
2018-02-11T15:27:45.939
2018-02-11T15:28:02.55
1.6610999999999998
$ python RBC.py
2018-02-11 15:28:12.076968
2018-02-11 15:28:26.260950
1.4183233837014995
$ python -V
Python 3.6.3 :: Anaconda, Inc.

Okay. I ran into computer trouble yesterday morning, wiped everything, and reinstalled everything.
And…Julia runs a lot slower now. Hmm.
I am not sure why that is.
I let Julia build OpenBLAS with the version I tested with, and also compiled Julia using gcc-trunk. LLVM was dynamically linked.
This time I couldn’t get Julia to build OpenBLAS, so I did it myself and linked it. I compiled Julia with the latest release gcc, and linked LLVM statically. I also just rebuilt Julia and linked dynamically: no difference.
Hmm.

Notebooks:
https://gist.github.com/chriselrod/f30fdfadccd8413000f0703a7cc6f8a5

Anyway, running your scripts:

$ ../../Anaconda/cond/bin/python RBC.py
2018-02-11 18:11:39.366524
2018-02-11 18:11:56.001698
1.6634931039996446
$ julia RBC.jl
2018-02-11T18:13:05.904
2018-02-11T18:13:19.125
1.3220999999999998
$ julia -O3 RBC.jl
2018-02-11T18:13:44.321
2018-02-11T18:13:57.633
1.3312

If Julia slowed down, curious about C++ and Fortran:

$ g++-7 -march=native -O3 RBC.cpp -o RBC_cpp && ./RBC_cpp
Output = 0.562731, Capital = 0.178198, Consumption = 0.384533
 Iteration = 1, Sup Diff = 0.0527416
Iteration = 10, Sup Diff = 0.0313469
Iteration = 20, Sup Diff = 0.0187035
Iteration = 30, Sup Diff = 0.0111655
Iteration = 40, Sup Diff = 0.00666854
Iteration = 50, Sup Diff = 0.00398429
Iteration = 60, Sup Diff = 0.00238131
Iteration = 70, Sup Diff = 0.00142366
Iteration = 80, Sup Diff = 0.00085134
Iteration = 90, Sup Diff = 0.000509205
Iteration = 100, Sup Diff = 0.000304623
Iteration = 110, Sup Diff = 0.000182265
Iteration = 120, Sup Diff = 0.00010907
Iteration = 130, Sup Diff = 6.52764e-05
Iteration = 140, Sup Diff = 3.90711e-05
Iteration = 150, Sup Diff = 2.33881e-05
Iteration = 160, Sup Diff = 1.40086e-05
Iteration = 170, Sup Diff = 8.39132e-06
Iteration = 180, Sup Diff = 5.02647e-06
Iteration = 190, Sup Diff = 3.0109e-06
Iteration = 200, Sup Diff = 1.80355e-06
Iteration = 210, Sup Diff = 1.08034e-06
Iteration = 220, Sup Diff = 6.47132e-07
Iteration = 230, Sup Diff = 3.87636e-07
Iteration = 240, Sup Diff = 2.32197e-07
Iteration = 250, Sup Diff = 1.39087e-07
Iteration = 257, Sup Diff = 9.71604e-08
 
My check = 0.146549
 
Elapsed time is   = 1.37482


$ gfortran-7 -march=native -O3 RBC.f90 -o RBC_f && ./RBC_f
 Steady State values
 Output:   0.56273142426227074      Capital:   0.17819828742434793      Consumption:   0.38453313683792278     
 Iteration:           1 Sup Diff:   5.2741607134075871E-002
 Iteration:          10 Sup Diff:   3.1346953831200119E-002
 Iteration:          20 Sup Diff:   1.8703460152962870E-002
 Iteration:          30 Sup Diff:   1.1165510606509832E-002
 Iteration:          40 Sup Diff:   6.6685398355890158E-003
 Iteration:          50 Sup Diff:   3.9842909760740008E-003
 Iteration:          60 Sup Diff:   2.3813103290404314E-003
 Iteration:          70 Sup Diff:   1.4236575018528042E-003
 Iteration:          80 Sup Diff:   8.5133892789668320E-004
 Iteration:          90 Sup Diff:   5.0920456767089561E-004
 Iteration:         100 Sup Diff:   3.0462281856569184E-004
 Iteration:         110 Sup Diff:   1.8226456357595122E-004
 Iteration:         120 Sup Diff:   1.0906931033882739E-004
 Iteration:         130 Sup Diff:   6.5276304536676655E-005
 Iteration:         140 Sup Diff:   3.9070994080181443E-005
 Iteration:         150 Sup Diff:   2.3388019260162096E-005
 Iteration:         160 Sup Diff:   1.4008591582403973E-005
 Iteration:         170 Sup Diff:   8.3912834882848841E-006
 Iteration:         180 Sup Diff:   5.0264531855637173E-006
 Iteration:         190 Sup Diff:   3.0108863811051378E-006
 Iteration:         200 Sup Diff:   1.8035437576724433E-006
 Iteration:         210 Sup Diff:   1.0803355822153193E-006
 Iteration:         220 Sup Diff:   6.4712835068370111E-007
 Iteration:         230 Sup Diff:   3.8763410215025829E-007
 Iteration:         240 Sup Diff:   2.3219527300888387E-007
 Iteration:         250 Sup Diff:   1.3908639495685549E-007
 Iteration:         257 Sup Diff:   9.7159772005639411E-008
  
 My check:  0.14654914390886931     
  
 Elapsed time is    1.38328505

Ofast:

$ g++-7 -march=native -Ofast RBC.cpp -o RBC_ofast_cpp && ./RBC_ofast_cpp
Output = 0.562731, Capital = 0.178198, Consumption = 0.384533
 Iteration = 1, Sup Diff = 0.0527416
Iteration = 10, Sup Diff = 0.0313469
Iteration = 20, Sup Diff = 0.0187035
Iteration = 30, Sup Diff = 0.0111655
Iteration = 40, Sup Diff = 0.00666854
Iteration = 50, Sup Diff = 0.00398429
Iteration = 60, Sup Diff = 0.00238131
Iteration = 70, Sup Diff = 0.00142366
Iteration = 80, Sup Diff = 0.00085134
Iteration = 90, Sup Diff = 0.000509205
Iteration = 100, Sup Diff = 0.000304623
Iteration = 110, Sup Diff = 0.000182265
Iteration = 120, Sup Diff = 0.00010907
Iteration = 130, Sup Diff = 6.52764e-05
Iteration = 140, Sup Diff = 3.90711e-05
Iteration = 150, Sup Diff = 2.33881e-05
Iteration = 160, Sup Diff = 1.40086e-05
Iteration = 170, Sup Diff = 8.39132e-06
Iteration = 180, Sup Diff = 5.02647e-06
Iteration = 190, Sup Diff = 3.0109e-06
Iteration = 200, Sup Diff = 1.80355e-06
Iteration = 210, Sup Diff = 1.08034e-06
Iteration = 220, Sup Diff = 6.47132e-07
Iteration = 230, Sup Diff = 3.87636e-07
Iteration = 240, Sup Diff = 2.32197e-07
Iteration = 250, Sup Diff = 1.39087e-07
Iteration = 257, Sup Diff = 9.71604e-08
 
My check = 0.146549
 
Elapsed time is   = 1.30403


$ gfortran-7 -march=native -Ofast RBC.f90 -o RBC_ofast_f && ./RBC_ofast_f
 Steady State values
 Output:   0.56273142426227074      Capital:   0.17819828742434793      Consumption:   0.38453313683792278     
 Iteration:           1 Sup Diff:   5.2741607134075871E-002
 Iteration:          10 Sup Diff:   3.1346953831200119E-002
 Iteration:          20 Sup Diff:   1.8703460152962870E-002
 Iteration:          30 Sup Diff:   1.1165510606509832E-002
 Iteration:          40 Sup Diff:   6.6685398355890158E-003
 Iteration:          50 Sup Diff:   3.9842909760740008E-003
 Iteration:          60 Sup Diff:   2.3813103290404314E-003
 Iteration:          70 Sup Diff:   1.4236575018528042E-003
 Iteration:          80 Sup Diff:   8.5133892789668320E-004
 Iteration:          90 Sup Diff:   5.0920456767089561E-004
 Iteration:         100 Sup Diff:   3.0462281856569184E-004
 Iteration:         110 Sup Diff:   1.8226456357595122E-004
 Iteration:         120 Sup Diff:   1.0906931033882739E-004
 Iteration:         130 Sup Diff:   6.5276304536676655E-005
 Iteration:         140 Sup Diff:   3.9070994080181443E-005
 Iteration:         150 Sup Diff:   2.3388019260162096E-005
 Iteration:         160 Sup Diff:   1.4008591582403973E-005
 Iteration:         170 Sup Diff:   8.3912834882848841E-006
 Iteration:         180 Sup Diff:   5.0264531855637173E-006
 Iteration:         190 Sup Diff:   3.0108863811051378E-006
 Iteration:         200 Sup Diff:   1.8035437576724433E-006
 Iteration:         210 Sup Diff:   1.0803355822153193E-006
 Iteration:         220 Sup Diff:   6.4712835068370111E-007
 Iteration:         230 Sup Diff:   3.8763410215025829E-007
 Iteration:         240 Sup Diff:   2.3219527300888387E-007
 Iteration:         250 Sup Diff:   1.3908639495685549E-007
 Iteration:         257 Sup Diff:   9.7159772005639411E-008
  
 My check:  0.14654914390886931     
  
 Elapsed time is    1.39106107

Ofast, funroll loops:

$ g++-7 -march=native -Ofast -funroll-loops RBC.cpp -o RBC_ofastfunroll_cpp && ./RBC_ofastfunroll_cpp
Output = 0.562731, Capital = 0.178198, Consumption = 0.384533
 Iteration = 1, Sup Diff = 0.0527416
Iteration = 10, Sup Diff = 0.0313469
Iteration = 20, Sup Diff = 0.0187035
Iteration = 30, Sup Diff = 0.0111655
Iteration = 40, Sup Diff = 0.00666854
Iteration = 50, Sup Diff = 0.00398429
Iteration = 60, Sup Diff = 0.00238131
Iteration = 70, Sup Diff = 0.00142366
Iteration = 80, Sup Diff = 0.00085134
Iteration = 90, Sup Diff = 0.000509205
Iteration = 100, Sup Diff = 0.000304623
Iteration = 110, Sup Diff = 0.000182265
Iteration = 120, Sup Diff = 0.00010907
Iteration = 130, Sup Diff = 6.52764e-05
Iteration = 140, Sup Diff = 3.90711e-05
Iteration = 150, Sup Diff = 2.33881e-05
Iteration = 160, Sup Diff = 1.40086e-05
Iteration = 170, Sup Diff = 8.39132e-06
Iteration = 180, Sup Diff = 5.02647e-06
Iteration = 190, Sup Diff = 3.0109e-06
Iteration = 200, Sup Diff = 1.80355e-06
Iteration = 210, Sup Diff = 1.08034e-06
Iteration = 220, Sup Diff = 6.47132e-07
Iteration = 230, Sup Diff = 3.87636e-07
Iteration = 240, Sup Diff = 2.32197e-07
Iteration = 250, Sup Diff = 1.39087e-07
Iteration = 257, Sup Diff = 9.71604e-08
 
My check = 0.146549
 
Elapsed time is   = 1.27788



$ gfortran-7 -march=native -Ofast -funroll-loops RBC.f90 -o RBC_ofastfunroll_f && ./RBC_ofastfunroll_f
 Steady State values
 Output:   0.56273142426227074      Capital:   0.17819828742434793      Consumption:   0.38453313683792278     
 Iteration:           1 Sup Diff:   5.2741607134075871E-002
 Iteration:          10 Sup Diff:   3.1346953831200119E-002
 Iteration:          20 Sup Diff:   1.8703460152962870E-002
 Iteration:          30 Sup Diff:   1.1165510606509832E-002
 Iteration:          40 Sup Diff:   6.6685398355890158E-003
 Iteration:          50 Sup Diff:   3.9842909760740008E-003
 Iteration:          60 Sup Diff:   2.3813103290404314E-003
 Iteration:          70 Sup Diff:   1.4236575018528042E-003
 Iteration:          80 Sup Diff:   8.5133892789668320E-004
 Iteration:          90 Sup Diff:   5.0920456767089561E-004
 Iteration:         100 Sup Diff:   3.0462281856569184E-004
 Iteration:         110 Sup Diff:   1.8226456357595122E-004
 Iteration:         120 Sup Diff:   1.0906931033882739E-004
 Iteration:         130 Sup Diff:   6.5276304536676655E-005
 Iteration:         140 Sup Diff:   3.9070994080181443E-005
 Iteration:         150 Sup Diff:   2.3388019260162096E-005
 Iteration:         160 Sup Diff:   1.4008591582403973E-005
 Iteration:         170 Sup Diff:   8.3912834882848841E-006
 Iteration:         180 Sup Diff:   5.0264531855637173E-006
 Iteration:         190 Sup Diff:   3.0108863811051378E-006
 Iteration:         200 Sup Diff:   1.8035437576724433E-006
 Iteration:         210 Sup Diff:   1.0803355822153193E-006
 Iteration:         220 Sup Diff:   6.4712835068370111E-007
 Iteration:         230 Sup Diff:   3.8763410215025829E-007
 Iteration:         240 Sup Diff:   2.3219527300888387E-007
 Iteration:         250 Sup Diff:   1.3908639495685549E-007
 Iteration:         257 Sup Diff:   9.7159772005639411E-008
  
 My check:  0.14654914390886931     
  
 Elapsed time is    1.34112906 

So, gcc seems as fast as before, but my Julia slowed down by a tenth of a second. =/
Still, well ahead of my Numba.

EDIT:
-fprofile-generate, running the code, then recompiling with -fprofile-use → C++ < 1.2 seconds on my computer. Almost as fast as that old Julia time.

Are you using a rebuilt system image? Are you two on the same version of Julia?

Pretty sure no one would write that one though because it has a clear temporary and is why maximum allows a function as the first argument, but this is what I see in RBC.jl.

Dear all

I am Jesus Fernandez-Villaverde, one of the authors of the paper discussed here. The published paper is here:

https://www.sciencedirect.com/science/article/pii/S0165188915000883

It has some changes with respect to the working paper that some of you are discussing. Nothing of importance though.

Let me give you some context on our exercise (some of which has already been pointed out by other people). Our goal was to make a “fair” comparison among programming languages where “fair” needs to be understood as what your “average economist” would code. I happily admit that this criterium allows for a degree of subjectivity, but between my coauthor and me we have been in the community of research economists for nearly 40 years, so our assessment has a reasonable basis.

The idea was not, then, to compare the “best Julia” code against the “best C++” code, but to compare a “reasonably well-written Julia” code that is close to the math of the problem (coding time is usually more binding than running time in macroeconomics, and being close to the math formulation of the problem saves plenty of time) to a “reasonably well-written C++” code (or the other languages).

I would say that Julia came really well out of the comparison. It run nearly as fast as C++ and much faster than Matlab (the most common language in economics). Making the Julia code faster with aggressive optimization would have been, paradoxically, worse for Julia. The average economist would have seen a code that looked strange to her and, between 1.6 seconds and 1.1 seconds, she would not have care much. The main message of the paper was:

“Look: you can code something that looks nearly Matlab (so your transition costs are going to be low) and yet is way faster than Matlab, R, and Python and it does not force you to do all the dances of Numba or Cython.”

We believe that, in that dimension, the paper was really useful and made much to promote Julia among economists.

In my own course on High-Performance Computing for Economists, I show a RBC_fast.jl code where I squeeze all optimizations (including changing the log function) and I get the speed of C++. But the students are often quite puzzled by that code and prefer the original one: they can follow and understand it.

I just re-run a couple of weeks ago the whole exercise. Julia 0.6.2 still does very well. Matlab has improved a bit (they have work hard on speed over the last few releases). R has improved a lot.

Thanks again for all your comments

26 Likes

I think that would be how a regular person writes code and the code looks perfectly logical… The mental model is to take all the differences and then find the max out of the results. We can say that this mental model isn’t efficient but that’s how one generally thinks about solving the problem (not necessarily making it fast). It would be really really cool if compiler technology can do that kind of optimization automatically. (I’m greedy :wink: )

I use the standard distribution for MacOS. Just tried -O3 but I don’t see an noticeable difference.

$ julia RBC.jl
2018-02-11T18:24:16.799
2018-02-11T18:24:32.894
1.6095

$ julia -O3 RBC.jl
2018-02-11T18:24:42.516
2018-02-11T18:24:59.266
1.675

$ echo "versioninfo()" | julia
Julia Version 0.6.2
Commit d386e40c17 (2017-12-13 18:08 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin14.5.0)
  CPU: Intel(R) Core(TM) i5-4258U CPU @ 2.40GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, haswell)
2 Likes

No, not buying that because you didn’t even choose the simplest or more concise code. The simplest code is “take the maximum of the absolute values” which is just maximum(abs,x). Instead you wrote the code “create a new array which is the absolute values and then take the maximum of those”, which obviously is less efficient and more computational steps, but also isn’t any simpler to write than just calling maximum because you’re adding unnecessary broadcasts. I’m pretty sure most people would just write the simpler code and be happy it’s faster in this case.

(The actual optimization try is using @fastmath abs or Base.FastMath.abs_fast which I’ve found can be faster in a lot of cases, but that’s the optimization)

Then that’s probably the issue. The generic binaries aren’t compiled to be architecture-specific. You should rebuild the system image for your hardware:

(Gotcha 7).

That’s a general statement for a single array. In this case, it needs to take the maximum of the absolute value of the difference between two arrays. Take a look at @piever’s answer above. The code isn’t just as intuitive as the slow version (relative speaking depends on audience.)

I’m sure an experienced Julia programmer can do what you have in mind but that’s the not the class of people we’re referring to here. @jesusfv explained it very well above. Honestly, I have 30 years of programming experience and because I’m new to Julia I would have written the same slow-version code on my first attempt… and that’s not going to change until I put my hands on optimizing the code.

I understand that building my own image would speed it up but just to be fair I am using a standard Anaconda distribution as well. I was thinking about this problem for a while. Perhaps the differences are:

  • Julia has access to Yeppp but that’s not the default (same for libm for that matter). Both exp & log functions are so commonly used and they really need to be the optimal version by default.

  • Anaconda ships with Intel MKL library. I suppose MKL performs better on Intel (which is what I have in my Mac.)

2 Likes

You’re really not giving people credit to take the 5 minutes to learn what a generator is. I’ll just accept that you put the bar of “economist” lower than “first year undergrad’s homework assignment” and worry about that on my own time…

Anyways.

That has nothing to do with rebuilding the system image. Just rebuild the system image for the Anaconda installed version. It’s like one line: just copy/paste that code.

Julia is re-writing Libm into pure-Julia so it can be used by default. It’s a longer process but is better in the long run and already has tangible results. I agree that for now it’s less than desirable though.

While I respect the effort that people have put into benchmarking and micro-optimizing this code, I don’t think that this is the right way to show off the comparative advantages of Julia.

Please don’t take this the wrong way, but I think that there is an certain obsession about 2-3x speedups and micro-optimizations in the community. I am convinced that investing the same amount of time into programming a better algorithm or method provides orders of magnitude higher speedups. Focusing on optimizations first that make the code look more complicated and harder to refactor will actually make these more difficult, and thus are counterproductive.

Benchmark games are very tempting because they are quantifiable and concrete. But at the same time they are misleading.

5 Likes

That’s normally the case - but I think that for these economics benchmarks, the idea was to use the same algorithms for all languages, just like the Julia micro benchmarks, right?

1 Like

Here’s a version of the Julia code which:

  • adds @inbounds statements
  • puts the transpose outside the loop
  • uses maximum(abs,()) instead of maximum(abs.())

All of these changes are things that I think a quantitative economist could reasonably be expected to take into account.

Using BenchmarkTools, the original code at https://github.com/jesusfv/Comparison-Programming-Languages-Economics/blob/master/RBC_Julia.jl
gives a median time of 1.70s, while the modified code gives a median time of 1.46s, which is about 15% improvement. This is on julia 0.6.x, compiled from source, on Debian.

So, there is a little scope for improvement in performance, using simple optimizations. Many economists can program in C or Fortran - the appeal of Julia is getting similar (within 2-3X, for me) performance, with faster programming time and with most energy dedicated to thinking about the problem, rather than the programming. According to the paper, and in my own experience, Julia is excellent for achieving these goals. I also expect that even a “lazy” Julia programmer will see performance improvements in the future, from the language itself getting better at doing the more arcane optimizations.

## Basic RBC model with full depreciation
#
# Jesus Fernandez-Villaverde
# Haverford, July 29, 2013
#
# M. Creel minor tweaks, Feb. 2018

function main()

    ##  1. Calibration

    α = 1.0/3.0 # Elasticity of output w.r.t. capital
    β = 0.95    # Discount factor

    # Productivity values
    vProductivity = [0.9792 0.9896 1.0000 1.0106 1.0212]

    # Transition matrix
    mTransition   = [0.9727 0.0273 0.0000 0.0000 0.0000;
                     0.0041 0.9806 0.0153 0.0000 0.0000;
                     0.0000 0.0082 0.9837 0.0082 0.0000;
                     0.0000 0.0000 0.0153 0.9806 0.0041;
                     0.0000 0.0000 0.0000 0.0273 0.9727]
    mTransition = mTransition' # CHANGE: do this here, not inside the loop
    # 2. Steady State

    capitalSteadyState = (α*β)^(1.0/(1.0-α))
    outputSteadyState = capitalSteadyState^α
    consumptionSteadyState = outputSteadyState-capitalSteadyState

    println("Output = ",outputSteadyState," Capital = ",capitalSteadyState," Consumption = ",consumptionSteadyState)

    # We generate the grid of capital
    vGridCapital = collect(0.5*capitalSteadyState:0.00001:1.5*capitalSteadyState)
    nGridCapital = length(vGridCapital)
    nGridProductivity = length(vProductivity)

    # 3. Required matrices and vectors

    mOutput           = zeros(nGridCapital,nGridProductivity)
    mValueFunction    = zeros(nGridCapital,nGridProductivity)
    mValueFunctionNew = zeros(nGridCapital,nGridProductivity)
    mPolicyFunction   = zeros(nGridCapital,nGridProductivity)
    expectedValueFunction = zeros(nGridCapital,nGridProductivity)

    # 4. We pre-build output for each point in the grid

    mOutput = (vGridCapital.^α)*vProductivity

    # 5. Main iteration

    maxDifference = 10.0
    tolerance = 0.0000001
    iteration = 0

    @inbounds while(maxDifference > tolerance)
        expectedValueFunction = mValueFunction*mTransition

        @inbounds for nProductivity in 1:nGridProductivity

            # We start from previous choice (monotonicity of policy function)
            gridCapitalNextPeriod = 1
            @inbounds for nCapital in 1:nGridCapital

                valueHighSoFar = -1000.0
                capitalChoice  = vGridCapital[1]

                @inbounds for nCapitalNextPeriod in gridCapitalNextPeriod:nGridCapital

                    consumption = mOutput[nCapital,nProductivity]-vGridCapital[nCapitalNextPeriod]
                    valueProvisional = (1-β)*log(consumption)+β*expectedValueFunction[nCapitalNextPeriod,nProductivity]

                    if (valueProvisional>valueHighSoFar)
                	       valueHighSoFar = valueProvisional
                	       capitalChoice = vGridCapital[nCapitalNextPeriod]
                	       gridCapitalNextPeriod = nCapitalNextPeriod
                    else
                	       break # We break when we have achieved the max
                    end

                end

                mValueFunctionNew[nCapital,nProductivity] = valueHighSoFar
                mPolicyFunction[nCapital,nProductivity] = capitalChoice

            end

        end
        
        maxDifference     = maximum(abs,(mValueFunctionNew-mValueFunction))
        mValueFunction, mValueFunctionNew = mValueFunctionNew, mValueFunction
        
        iteration += 1
        if mod(iteration,10)==0 || iteration == 1
            println(" Iteration = ", iteration, " Sup Diff = ", maxDifference)
        end

    end

    println(" Iteration = ", iteration, " Sup Diff = ", maxDifference)
    println(" ")
    println(" My check = ", mPolicyFunction[1000,3])
    println(" My check = ", mValueFunction[1000,3])

end


I was not talking about the paper (about which you are of course right), but the discussion here.

Imagine the impression this discussion gives to someone who is interested in Julia, but is not a programmer per se. They might conclude that to make your code to run fast, you have to know about rebuilding the system image for your hardware, esoteric-looking constructs, preallocation, @inbounds, replace the built-in matrix multiplication with a BLAS call, etc. After going through this, benchmarking, and experimenting with various combinations, you get a few times 10% increases.

All of these tricks are useful when they are really needed. But the point of Julia is that you can get pretty good performance and keep your code elegant with very little effort.

5 Likes

I agree, I’ve opened an issue here.

2 Likes