As I noted in my last post, the original Julia code (on the latest Julia master) ran in 1.27 s. Cutting down on allocations resulted in 1.17 s.

Best times out of 5 runs:

- 1.173 Julia - reduced allocation
- 1.252 Julia - alloc-heavy
- 1.284 g++7.2 Ofast native
- 1.285 g++11-7.2 Ofast native
- 1.296 g++8.0 Ofast native
- 1.319 g++11-8.0 Ofast native
- 1.367 gfortran-8.0 Ofast native
- 1.381 g++7.2
- 1.384 gfortran-7.2 Ofast native
- 1.396 clang++ native
- 1.398 gfortran-8.0
- 1.402 g++8.0
- 1.414 gfortran-7.2
- 1.430 g++11-8.0
- 1.431 clang++11 native
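To make "best out of 5" concrete, here is a minimal sketch of that measurement, assuming wall-clock timing with `std::chrono`. `best_time_seconds` is a hypothetical helper for illustration; it is not part of the benchmarked code, and the numbers above were not necessarily produced this way.

```cpp
#include <algorithm>
#include <chrono>
#include <limits>

// Hypothetical helper (not from the benchmarked code): call `work` `runs`
// times and return the fastest wall-clock time in seconds.
template <typename F>
double best_time_seconds(F&& work, int runs = 5) {
    double best = std::numeric_limits<double>::infinity();
    for (int i = 0; i < runs; ++i) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        // duration<double> converts the tick count to seconds.
        const double elapsed = std::chrono::duration<double>(stop - start).count();
        best = std::min(best, elapsed);
    }
    return best;
}
```

Taking the minimum rather than the mean is a common choice for this kind of comparison, since noise from the rest of the system can only make a run slower, not faster.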

I decided to compare gcc version 7.2 (although 7.3 is the latest release) and the unreleased trunk (8.0).

clang++ is 5.0.1.

If anyone wants to improve any of the code, I can add that to the comparison.

`-march=native` is an obvious flag to add; in a few tests it seemed to take off a couple tenths of a second. That is not nearly enough to catch up with even the original Julia code, let alone the version with reduced allocations.

Adding BLAS calls did not really help. Adding lines like this:

```
call dgemm('N', 'T', nGridCapital, nGridProductivity, nGridProductivity, alpha, &
mValueFunction, nGridCapital, mTransition, nGridProductivity, beta, expectedValueFunction, nGridCapital)
```

this worsened external timing with `time`, and screwed up Fortran's CPU time function.

I won't bother changing this in C++.
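For anyone unfamiliar with the call: `dgemm` with `'N', 'T'` computes `C := alpha*A*B^T + beta*C` on column-major arrays. A minimal sketch of the same operation as plain C++ loops follows. `gemm_nt` and its argument names are hypothetical, the matrices are assumed to be stored contiguously with no padding, and this is a reference for what the call does, not a tuned replacement for BLAS or code from the benchmark.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reference loops for C := alpha * A * B^T + beta * C.
// Column-major storage, as in Fortran: element (i, j) of an m-row matrix
// lives at index i + j * m. A is m x k, B is n x k, C is m x n.
void gemm_nt(std::size_t m, std::size_t n, std::size_t k,
             double alpha, const std::vector<double>& A,
             const std::vector<double>& B,
             double beta, std::vector<double>& C) {
    for (std::size_t j = 0; j < n; ++j) {
        // Scale the existing column of C; with beta == 0 the old contents
        // are overwritten rather than multiplied.
        for (std::size_t i = 0; i < m; ++i) {
            C[i + j * m] = (beta == 0.0) ? 0.0 : beta * C[i + j * m];
        }
        for (std::size_t p = 0; p < k; ++p) {
            const double b = alpha * B[j + p * n];  // B(j, p) is B^T(p, j)
            // Innermost loop walks down a column, so memory access is contiguous.
            for (std::size_t i = 0; i < m; ++i) {
                C[i + j * m] += A[i + p * m] * b;
            }
        }
    }
}
```

In terms of the Fortran call above, `m` would correspond to `nGridCapital` and `n` and `k` to `nGridProductivity`.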

```
$ ./testc
Output = 0.562731, Capital = 0.178198, Consumption = 0.384533
Iteration = 1, Sup Diff = 0.0527416
Iteration = 10, Sup Diff = 0.0313469
Iteration = 20, Sup Diff = 0.0187035
Iteration = 30, Sup Diff = 0.0111655
Iteration = 40, Sup Diff = 0.00666854
Iteration = 50, Sup Diff = 0.00398429
Iteration = 60, Sup Diff = 0.00238131
Iteration = 70, Sup Diff = 0.00142366
Iteration = 80, Sup Diff = 0.00085134
Iteration = 90, Sup Diff = 0.000509205
Iteration = 100, Sup Diff = 0.000304623
Iteration = 110, Sup Diff = 0.000182265
Iteration = 120, Sup Diff = 0.00010907
Iteration = 130, Sup Diff = 6.52764e-05
Iteration = 140, Sup Diff = 3.90711e-05
Iteration = 150, Sup Diff = 2.33881e-05
Iteration = 160, Sup Diff = 1.40086e-05
Iteration = 170, Sup Diff = 8.39132e-06
Iteration = 180, Sup Diff = 5.02647e-06
Iteration = 190, Sup Diff = 3.0109e-06
Iteration = 200, Sup Diff = 1.80355e-06
Iteration = 210, Sup Diff = 1.08034e-06
Iteration = 220, Sup Diff = 6.47132e-07
Iteration = 230, Sup Diff = 3.87636e-07
Iteration = 240, Sup Diff = 2.32197e-07
Iteration = 250, Sup Diff = 1.39087e-07
Iteration = 257, Sup Diff = 9.71604e-08
My check = 0.146549
Elapsed time is = 1.41837
```

gcc-7.2 appears to have done slightly better, though:

```
$ ./testc-72
Output = 0.562731, Capital = 0.178198, Consumption = 0.384533
Iteration = 1, Sup Diff = 0.0527416
Iteration = 10, Sup Diff = 0.0313469
Iteration = 20, Sup Diff = 0.0187035
Iteration = 30, Sup Diff = 0.0111655
Iteration = 40, Sup Diff = 0.00666854
Iteration = 50, Sup Diff = 0.00398429
Iteration = 60, Sup Diff = 0.00238131
Iteration = 70, Sup Diff = 0.00142366
Iteration = 80, Sup Diff = 0.00085134
Iteration = 90, Sup Diff = 0.000509205
Iteration = 100, Sup Diff = 0.000304623
Iteration = 110, Sup Diff = 0.000182265
Iteration = 120, Sup Diff = 0.00010907
Iteration = 130, Sup Diff = 6.52764e-05
Iteration = 140, Sup Diff = 3.90711e-05
Iteration = 150, Sup Diff = 2.33881e-05
Iteration = 160, Sup Diff = 1.40086e-05
Iteration = 170, Sup Diff = 8.39132e-06
Iteration = 180, Sup Diff = 5.02647e-06
Iteration = 190, Sup Diff = 3.0109e-06
Iteration = 200, Sup Diff = 1.80355e-06
Iteration = 210, Sup Diff = 1.08034e-06
Iteration = 220, Sup Diff = 6.47132e-07
Iteration = 230, Sup Diff = 3.87636e-07
Iteration = 240, Sup Diff = 2.32197e-07
Iteration = 250, Sup Diff = 1.39087e-07
Iteration = 257, Sup Diff = 9.71604e-08
My check = 0.146549
Elapsed time is = 1.38093
```

C++11-8.0:

```
$ ./testc2
Output = 0.562731, Capital = 0.178198, Consumption = 0.384533
Iteration = 1, Sup Diff = 0.0527416
Iteration = 10, Sup Diff = 0.0313469
Iteration = 20, Sup Diff = 0.0187035
Iteration = 30, Sup Diff = 0.0111655
Iteration = 40, Sup Diff = 0.00666854
Iteration = 50, Sup Diff = 0.00398429
Iteration = 60, Sup Diff = 0.00238131
Iteration = 70, Sup Diff = 0.00142366
Iteration = 80, Sup Diff = 0.00085134
Iteration = 90, Sup Diff = 0.000509205
Iteration = 100, Sup Diff = 0.000304623
Iteration = 110, Sup Diff = 0.000182265
Iteration = 120, Sup Diff = 0.00010907
Iteration = 130, Sup Diff = 6.52764e-05
Iteration = 140, Sup Diff = 3.90711e-05
Iteration = 150, Sup Diff = 2.33881e-05
Iteration = 160, Sup Diff = 1.40086e-05
Iteration = 170, Sup Diff = 8.39132e-06
Iteration = 180, Sup Diff = 5.02647e-06
Iteration = 190, Sup Diff = 3.0109e-06
Iteration = 200, Sup Diff = 1.80355e-06
Iteration = 210, Sup Diff = 1.08034e-06
Iteration = 220, Sup Diff = 6.47132e-07
Iteration = 230, Sup Diff = 3.87636e-07
Iteration = 240, Sup Diff = 2.32197e-07
Iteration = 250, Sup Diff = 1.39087e-07
Iteration = 257, Sup Diff = 9.71604e-08
My check = 0.146549
Elapsed time is = 1.43721 seconds.
```

C++11-7.2

```
$ ./testc2-72
Output = 0.562731, Capital = 0.178198, Consumption = 0.384533
Iteration = 1, Sup Diff = 0.0527416
Iteration = 10, Sup Diff = 0.0313469
Iteration = 20, Sup Diff = 0.0187035...
Iteration = 250, Sup Diff = 1.39087e-07
Iteration = 257, Sup Diff = 9.71604e-08
My check = 0.146549
Elapsed time is = 1.37332 seconds.
```

Same story for the Fortran code; 8.0 below:

```
$ ./testf1
Steady State values
Output: 0.56273142426227074 Capital: 0.17819828742434793 Consumption: 0.38453313683792278
Iteration: 1 Sup Diff: 5.2741607134075871E-002
Iteration: 10 Sup Diff: 3.1346953831200064E-002
Iteration: 20 Sup Diff: 1.8703460152962759E-002
Iteration: 30 Sup Diff: 1.1165510606509832E-002
Iteration: 40 Sup Diff: 6.6685398355890158E-003
Iteration: 50 Sup Diff: 3.9842909760740008E-003
Iteration: 60 Sup Diff: 2.3813103290404314E-003
Iteration: 70 Sup Diff: 1.4236575018528042E-003
Iteration: 80 Sup Diff: 8.5133892789679422E-004
Iteration: 90 Sup Diff: 5.0920456767089561E-004
Iteration: 100 Sup Diff: 3.0462281856558082E-004
Iteration: 110 Sup Diff: 1.8226456357595122E-004
Iteration: 120 Sup Diff: 1.0906931033871636E-004
Iteration: 130 Sup Diff: 6.5276304536787677E-005
Iteration: 140 Sup Diff: 3.9070994080292465E-005
Iteration: 150 Sup Diff: 2.3388019260051074E-005
Iteration: 160 Sup Diff: 1.4008591582403973E-005
Iteration: 170 Sup Diff: 8.3912834882848841E-006
Iteration: 180 Sup Diff: 5.0264531857857619E-006
Iteration: 190 Sup Diff: 3.0108863812161601E-006
Iteration: 200 Sup Diff: 1.8035437577834657E-006
Iteration: 210 Sup Diff: 1.0803355822153193E-006
Iteration: 220 Sup Diff: 6.4712835090574572E-007
Iteration: 230 Sup Diff: 3.8763410237230289E-007
Iteration: 240 Sup Diff: 2.3219527311990618E-007
Iteration: 250 Sup Diff: 1.3908639506787779E-007
Iteration: 257 Sup Diff: 9.7159772005639411E-008
My check: 0.14654914390886931
Elapsed time is 1.40512800
```

gfortran-7.2

```
$ ./testf1-72
Steady State values
Output: 0.56273142426227074 Capital: 0.17819828742434793 Consumption: 0.38453313683792278
Iteration: 1 Sup Diff: 5.2741607134075871E-002
Iteration: 10 Sup Diff: 3.1346953831200064E-002
Iteration: 20 Sup Diff: 1.8703460152962759E-002
Iteration: 30 Sup Diff: 1.1165510606509832E-002
Iteration: 40 Sup Diff: 6.6685398355890158E-003...
Iteration: 230 Sup Diff: 3.8763410237230289E-007
Iteration: 240 Sup Diff: 2.3219527311990618E-007
Iteration: 250 Sup Diff: 1.3908639506787779E-007
Iteration: 257 Sup Diff: 9.7159772005639411E-008
My check: 0.14654914390886931
Elapsed time is 1.41379201
```