Specifying ode solver options to speed up compute time

Hello All,

This is a follow up to my question posted here.

I’ve installed Julia in windows and I am running the following code (posted in the above-mentioned link) via the Julia plugin instaled in Pycharm.

using DifferentialEquations, BenchmarkTools

mat1=[
       1    -2     1     0     0     0     0     0     0     0;
       0     1    -2     1     0     0     0     0     0     0;
       0     0     1    -2     1     0     0     0     0     0;
       0     0     0     1    -2     1     0     0     0     0;
       0     0     0     0     1    -2     1     0     0     0;
       0     0     0     0     0     1    -2     1     0     0;
       0     0     0     0     0     0     1    -2     1     0;
       0     0     0     0     0     0     0     1    -2     1;
       ];

mat2 = [
        1    -1     0     0     0     0     0     0     0     0;
        0     1    -1     0     0     0     0     0     0     0;
        0     0     1    -1     0     0     0     0     0     0;
        0     0     0     1    -1     0     0     0     0     0;
        0     0     0     0     1    -1     0     0     0     0;
        0     0     0     0     0     1    -1     0     0     0;
        0     0     0     0     0     0     1    -1     0     0;
        0     0     0     0     0     0     0     1    -1     0;
        ];



x0 = [1.0,0,0,0,0,0,0,0,0,0]
saveat = 0:0.01:5

function fun(dx,x,p,t)
    dx[1,:] .= 0
    dx[2:9,:] .= mat1*x + mat2*x
    dx[10,:] .= 2*(x[end-1] - x[end])
end

prob = ODEProblem(fun,x0,(0.0,5.0))
sys = modelingtoolkitize(prob)
fastprob = ODEProblem(sys,x0,(0.0,5.0),jac=true)

# Explicit RK Methods
@btime sol = solve(fastprob,Tsit5()) # 16.700 ΞΌs (245 allocations: 40.28 KiB)
@btime sol = solve(fastprob,BS3()) # 19.800 ΞΌs (231 allocations: 33.70 KiB)
@btime sol = solve(fastprob,Vern7()) # 18.400 ΞΌs (266 allocations: 49.62 KiB)

# Stabilized-Explicit RK Methods
@btime sol = solve(fastprob,ROCK2()) # 173.300 ΞΌs (831 allocations: 159.59 KiB)
@btime sol = solve(fastprob,ROCK4()) # 237.100 ΞΌs (1958 allocations: 191.64 KiB)

# Implicit and Semi-Implicit Methods
@btime sol = solve(fastprob,Rosenbrock23()) # 83.200 ΞΌs (541 allocations: 53.50 KiB)
@btime sol = solve(fastprob,TRBDF2()) # 72.400 ΞΌs (297 allocations: 31.72 KiB)
@btime sol = solve(fastprob,KenCarp47()) # 110.500 ΞΌs (444 allocations: 33.02 KiB)

sparseprob = ODEProblem(sys,x0,(0.0,5.0),jac=true,sparse=true)
@btime sol = solve(sparseprob,Rosenbrock23()) # 670.000 ΞΌs (3505 allocations: 1.22 MiB)
@btime sol = solve(sparseprob,TRBDF2()) # 254.000 ΞΌs (1332 allocations: 414.91 KiB)
@btime sol = solve(sparseprob,KenCarp47()) # 346.400 ΞΌs (1757 allocations: 525.05 KiB)

using Setfield, LinearAlgebra
f = fastprob.f
newf = @set f.jac_prototype = Tridiagonal(sparseprob.f.jac_prototype)
newf = @set newf.sparsity = Tridiagonal(sparseprob.f.sparsity)
tridiagprob = ODEProblem(newf,x0,(0.0,5.0))
@btime sol = solve(tridiagprob,Rosenbrock23()) # 188.000 ΞΌs (556 allocations: 66.19 KiB)
@btime sol = solve(tridiagprob,TRBDF2()) # 87.800 ΞΌs (338 allocations: 40.31 KiB)
@btime sol = solve(tridiagprob,KenCarp47()) # 133.400 ΞΌs (482 allocations: 42.16 KiB)

I’m not sure why but I observe time in ms and not in microseconds as posted by @ChrisRackauckas on SE.

  2.847 ms (34957 allocations: 2.63 MiB)
  2.985 ms (31647 allocations: 1.73 MiB)
  3.721 ms (46812 allocations: 5.47 MiB)
  22.349 ms (268252 allocations: 10.67 MiB)
  5.789 ms (80043 allocations: 3.01 MiB)
  78.085 ms (605234 allocations: 20.99 MiB)
  25.611 ms (217583 allocations: 7.47 MiB)
  33.834 ms (288531 allocations: 10.52 MiB)
  74.903 ms (597849 allocations: 20.82 MiB)
  25.356 ms (210161 allocations: 7.29 MiB)
  34.504 ms (281243 allocations: 10.33 MiB)
  11.809 ms (156756 allocations: 4.92 MiB)
  7.047 ms (72883 allocations: 2.52 MiB)
  9.795 ms (103225 allocations: 4.15 MiB)

Could someone help me in understanding why it takes ms and not microseconds when I run the code ?

Thanks so much

The big thing that will slow this down is that you aren’t taking advantage of the structure of your matrices (and they are non-const globals). If you use BandedMatrices for mat1 and mat2, and make them const it should be a massive improvement.

Julia v1.6.1? What happens if you run it in the REPL?

Hi @Oscar_Smith , Thanks for the suggestion. Since the compute time reported in
in microseconds ( the link shared above) was without the use of BandedMatrices , I wasn’t sure why I am not able to observe the same compute time reported by @ChrisRackauckas.

That’s not needed because it’s already further optimizing beyond that using the modelingtoolkitize to fully scalarize it.

Yes , I’m using Julia 1.6.1. I’m sorry, I’m really new to Julia and I couldn’t figure out what REPL means. If you don’t mind, could you please direct me to a tutorial on how to run via REPL?

I think you meant the terminal window of the julia executable. I’ll give it a try. Thank you

in REPL, should I do julia file.jl?

I’m not doing anything special. I’m just doing it the standard way in Juno.

running

1 Like

Can you share versioninfo() @Deepa ?

@ChrisRackauckas Please find the version info below

Julia Version 1.6.1
Commit 6aaedecc44 (2021-04-23 05:59 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake)

@Elrod are you able to reproduce this one? I wonder if there’s some weird thing FastBroadcast.jl does to some chips.

No.

julia> using DifferentialEquations, BenchmarkHistograms

julia> mat1=[
              1    -2     1     0     0     0     0     0     0     0;
              0     1    -2     1     0     0     0     0     0     0;
              0     0     1    -2     1     0     0     0     0     0;
              0     0     0     1    -2     1     0     0     0     0;
              0     0     0     0     1    -2     1     0     0     0;
              0     0     0     0     0     1    -2     1     0     0;
              0     0     0     0     0     0     1    -2     1     0;
              0     0     0     0     0     0     0     1    -2     1;
              ];

julia> mat2 = [
               1    -1     0     0     0     0     0     0     0     0;
               0     1    -1     0     0     0     0     0     0     0;
               0     0     1    -1     0     0     0     0     0     0;
               0     0     0     1    -1     0     0     0     0     0;
               0     0     0     0     1    -1     0     0     0     0;
               0     0     0     0     0     1    -1     0     0     0;
               0     0     0     0     0     0     1    -1     0     0;
               0     0     0     0     0     0     0     1    -1     0;
               ];

julia> x0 = [1.0,0,0,0,0,0,0,0,0,0];

julia> saveat = 0:0.01:5;

julia> function fun(dx,x,p,t)
           dx[1,:] .= 0
           dx[2:9,:] .= mat1*x + mat2*x
           dx[10,:] .= 2*(x[end-1] - x[end])
       end
fun (generic function with 1 method)

julia> prob = ODEProblem(fun,x0,(0.0,5.0));

julia> sys = modelingtoolkitize(prob);

julia> fastprob = ODEProblem(sys,x0,(0.0,5.0),jac=true);

julia> # Explicit RK Methods
       @benchmark sol = solve($fastprob,Tsit5()) # 16.700 ΞΌs (245 allocations: 40.28 KiB)
samples: 10000; evals/sample: 1; memory estimate: 39.20 KiB; allocs estimate: 244
ns

 (14200.0 - 15900.0 ]  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ 5365
 (15900.0 - 17600.0 ]  β–ˆβ–245
 (17600.0 - 19400.0 ]  β–Œ68
 (19400.0 - 21100.0 ]  β–ˆβ–ˆ344
 (21100.0 - 22800.0 ]  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹1700
 (22800.0 - 24500.0 ]  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ1518
 (24500.0 - 26200.0 ]  β–ˆβ–ˆβ–‰513
 (26200.0 - 27900.0 ]  β–ˆβ–181
 (27900.0 - 29600.0 ]  ▍48
 (29600.0 - 31300.0 ]  ▏2
 (31300.0 - 33000.0 ]  ▏2
 (33000.0 - 34700.0 ]  ▏2
 (34700.0 - 36400.0 ]  ▏1
 (36400.0 - 38100.0 ]  ▏1
 (38100.0 - 4.4385e6]  ▏10

                  Counts

min: 14.226 ΞΌs (0.00% GC); mean: 20.654 ΞΌs (10.28% GC); median: 15.399 ΞΌs (0.00% GC); max: 4.438 ms (99.04% GC).

julia> @benchmark sol = solve($fastprob,BS3()) # 19.800 ΞΌs (231 allocations: 33.70 KiB)
samples: 10000; evals/sample: 1; memory estimate: 33.03 KiB; allocs estimate: 230
ns

 (17900.0 - 19600.0 ]  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ 5474
 (19600.0 - 21200.0 ]  β–ˆβ–Š305
 (21200.0 - 22800.0 ]  β–Œ86
 (22800.0 - 24400.0 ]  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–1661
 (24400.0 - 26100.0 ]  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž1680
 (26100.0 - 27700.0 ]  β–ˆβ–ˆβ–‰518
 (27700.0 - 29300.0 ]  β–‰144
 (29300.0 - 31000.0 ]  β–Œ88
 (31000.0 - 32600.0 ]  β–Ž30
 (32600.0 - 34200.0 ]  ▏1
 (34200.0 - 35900.0 ]   0
 (35900.0 - 37500.0 ]   0
 (37500.0 - 39100.0 ]   0
 (39100.0 - 40700.0 ]  ▏3
 (40700.0 - 4.8166e6]  ▏10

                  Counts

min: 17.927 ΞΌs (0.00% GC); mean: 23.190 ΞΌs (7.90% GC); median: 18.893 ΞΌs (0.00% GC); max: 4.817 ms (99.08% GC).

julia> @benchmark sol = solve($fastprob,Vern7()) # 18.400 ΞΌs (266 allocations: 49.62 KiB)
samples: 10000; evals/sample: 1; memory estimate: 47.36 KiB; allocs estimate: 265
ns

 (15800.0 - 18200.0 ]  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ6001
 (18200.0 - 20700.0 ]  β–ˆβ–Š341
 (20700.0 - 23100.0 ]  ▍63
 (23100.0 - 25600.0 ]  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ1280
 (25600.0 - 28000.0 ]  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–1824
 (28000.0 - 30500.0 ]  β–ˆβ–‰360
 (30500.0 - 32900.0 ]  β–‹102
 (32900.0 - 35400.0 ]  ▏11
 (35400.0 - 37800.0 ]   0
 (37800.0 - 40300.0 ]   0
 (40300.0 - 42700.0 ]  ▏4
 (42700.0 - 45200.0 ]  ▏3
 (45200.0 - 47600.0 ]   0
 (47600.0 - 50100.0 ]  ▏1
 (50100.0 - 3.9094e6]  ▏10

                  Counts

min: 15.794 ΞΌs (0.00% GC); mean: 22.292 ΞΌs (8.35% GC); median: 17.203 ΞΌs (0.00% GC); max: 3.909 ms (98.93% GC).

julia> # Stabilized-Explicit RK Methods
       @benchmark sol = solve($fastprob,ROCK2()) # 173.300 ΞΌs (831 allocations: 159.59 KiB)
samples: 10000; evals/sample: 1; memory estimate: 158.97 KiB; allocs estimate: 830
ns

 (160000.0 - 390000.0]  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ 9980
 (390000.0 - 620000.0]  ▏1
 (620000.0 - 840000.0]   0
 (840000.0 - 1.07e6  ]   0
 (1.07e6   - 1.3e6   ]   0
 (1.3e6    - 1.52e6  ]   0
 (1.52e6   - 1.75e6  ]   0
 (1.75e6   - 1.98e6  ]   0
 (1.98e6   - 2.2e6   ]   0
 (2.2e6    - 2.43e6  ]   0
 (2.43e6   - 2.66e6  ]   0
 (2.66e6   - 2.88e6  ]   0
 (2.88e6   - 3.11e6  ]   0
 (3.11e6   - 3.34e6  ]  ▏9
 (3.34e6   - 3.85e6  ]  ▏10

                  Counts

min: 161.643 ΞΌs (0.00% GC); mean: 175.670 ΞΌs (3.43% GC); median: 166.189 ΞΌs (0.00% GC); max: 3.846 ms (94.05% GC).

julia> @benchmark sol = solve($fastprob,ROCK4()) # 237.100 ΞΌs (1958 allocations: 191.64 KiB)
samples: 10000; evals/sample: 1; memory estimate: 191.62 KiB; allocs estimate: 1958
ns

 (200000.0 - 690000.0]  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ 9977
 (690000.0 - 1.17e6  ]   0
 (1.17e6   - 1.66e6  ]   0
 (1.66e6   - 2.14e6  ]   0
 (2.14e6   - 2.63e6  ]   0
 (2.63e6   - 3.11e6  ]   0
 (3.11e6   - 3.6e6   ]   0
 (3.6e6    - 4.08e6  ]   0
 (4.08e6   - 4.57e6  ]   0
 (4.57e6   - 5.06e6  ]   0
 (5.06e6   - 5.54e6  ]   0
 (5.54e6   - 6.03e6  ]   0
 (6.03e6   - 6.51e6  ]   0
 (6.51e6   - 7.0e6   ]   0
 (7.0e6    - 7.48e6  ]  ▏23

                  Counts

min: 199.522 ΞΌs (0.00% GC); mean: 220.741 ΞΌs (7.10% GC); median: 203.291 ΞΌs (0.00% GC); max: 7.485 ms (96.29% GC).

julia> # Implicit and Semi-Implicit Methods
       @benchmark sol = solve($fastprob,Rosenbrock23()) # 83.200 ΞΌs (541 allocations: 53.50 KiB)
samples: 10000; evals/sample: 1; memory estimate: 52.97 KiB; allocs estimate: 541
ns

 (79100.0  - 82100.0 ]  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ 5525
 (82100.0  - 85000.0 ]  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž1138
 (85000.0  - 87900.0 ]  β–ˆβ–ˆβ–ˆβ–555
 (87900.0  - 90800.0 ]  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–1715
 (90800.0  - 93800.0 ]  β–ˆβ–ˆβ–ˆβ–‰696
 (93800.0  - 96700.0 ]  β–ˆβ–236
 (96700.0  - 99600.0 ]  ▍49
 (99600.0  - 102500.0]  β–Ž25
 (102500.0 - 105500.0]  ▏17
 (105500.0 - 108400.0]  ▏8
 (108400.0 - 111300.0]  ▏5
 (111300.0 - 114200.0]  ▏7
 (114200.0 - 117200.0]  ▏4
 (117200.0 - 120100.0]  ▏10
 (120100.0 - 6.349e6 ]  ▏10

                  Counts

min: 79.133 ΞΌs (0.00% GC); mean: 87.742 ΞΌs (4.07% GC); median: 81.390 ΞΌs (0.00% GC); max: 6.349 ms (97.74% GC).

julia> @benchmark sol = solve($fastprob,TRBDF2()) # 72.400 ΞΌs (297 allocations: 31.72 KiB)
samples: 10000; evals/sample: 1; memory estimate: 30.98 KiB; allocs estimate: 295
ns

 (66400.0 - 68100.0 ]  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ 4508
 (68100.0 - 69700.0 ]  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š1301
 (69700.0 - 71300.0 ]  β–ˆβ–ˆβ–ˆβ–‰570
 (71300.0 - 72900.0 ]  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š1312
 (72900.0 - 74600.0 ]  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ1349
 (74600.0 - 76200.0 ]  β–ˆβ–ˆβ–ˆβ–ˆ586
 (76200.0 - 77800.0 ]  β–ˆβ–Ž185
 (77800.0 - 79400.0 ]  β–Š96
 (79400.0 - 81100.0 ]  β–Œ58
 (81100.0 - 82700.0 ]  ▏17
 (82700.0 - 84300.0 ]  ▏4
 (84300.0 - 85900.0 ]  ▏1
 (85900.0 - 87600.0 ]   0
 (87600.0 - 89200.0 ]  ▏3
 (89200.0 - 6.4517e6]  ▏10

                  Counts

min: 66.446 ΞΌs (0.00% GC); mean: 72.016 ΞΌs (2.59% GC); median: 68.454 ΞΌs (0.00% GC); max: 6.452 ms (98.48% GC).

julia> @benchmark sol = solve($fastprob,KenCarp47()) # 110.500 ΞΌs (444 allocations: 33.02 KiB)
samples: 10000; evals/sample: 1; memory estimate: 32.28 KiB; allocs estimate: 442
ns

 (93800.0  - 95400.0 ]  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ 3879
 (95400.0  - 97100.0 ]  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ2383
 (97100.0  - 98700.0 ]  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–786
 (98700.0  - 100300.0]  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ829
 (100300.0 - 101900.0]  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–1069
 (101900.0 - 103500.0]  β–ˆβ–ˆβ–ˆβ–ˆβ–Š614
 (103500.0 - 105100.0]  β–ˆβ–Š220
 (105100.0 - 106700.0]  β–Š94
 (106700.0 - 108300.0]  β–Œ63
 (108300.0 - 109900.0]  β–Ž32
 (109900.0 - 111500.0]  ▏14
 (111500.0 - 113100.0]  ▏4
 (113100.0 - 114800.0]  ▏1
 (114800.0 - 116400.0]  ▏2
 (116400.0 - 6.5823e6]  ▏10

                  Counts

min: 93.838 ΞΌs (0.00% GC); mean: 98.714 ΞΌs (1.29% GC); median: 95.878 ΞΌs (0.00% GC); max: 6.582 ms (97.31% GC).

julia> sparseprob = ODEProblem(sys,x0,(0.0,5.0),jac=true,sparse=true);

julia> @benchmark sol = solve($sparseprob,Rosenbrock23()) # 670.000 ΞΌs (3505 allocations: 1.22 MiB)
samples: 10000; evals/sample: 1; memory estimate: 60.00 KiB; allocs estimate: 576
ns

 (82100.0  - 85000.0 ]  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ5107
 (85000.0  - 87900.0 ]  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š1318
 (87900.0  - 90900.0 ]  β–ˆβ–ˆβ–ˆβ–512
 (90900.0  - 93800.0 ]  β–ˆβ–ˆβ–ˆβ–ˆβ–‹779
 (93800.0  - 96700.0 ]  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ1429
 (96700.0  - 99700.0 ]  β–ˆβ–ˆβ–ˆβ–560
 (99700.0  - 102600.0]  β–ˆβ–177
 (102600.0 - 105500.0]  ▍48
 (105500.0 - 108500.0]  ▏16
 (108500.0 - 111400.0]  ▏13
 (111400.0 - 114300.0]  ▏13
 (114300.0 - 117300.0]  ▏7
 (117300.0 - 120200.0]  ▏5
 (120200.0 - 123100.0]  ▏6
 (123100.0 - 6.7211e6]  ▏10

                  Counts

min: 82.072 ΞΌs (0.00% GC); mean: 91.952 ΞΌs (4.12% GC); median: 84.918 ΞΌs (0.00% GC); max: 6.721 ms (97.90% GC).

julia> @benchmark sol = solve($sparseprob,TRBDF2()) # 254.000 ΞΌs (1332 allocations: 414.91 KiB)
samples: 10000; evals/sample: 1; memory estimate: 38.38 KiB; allocs estimate: 332
ns

 (69200.0 - 71000.0 ]  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ 3557
 (71000.0 - 72900.0 ]  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š2215
 (72900.0 - 74700.0 ]  β–ˆβ–ˆβ–ˆβ–ˆβ–Œ525
 (74700.0 - 76500.0 ]  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–743
 (76500.0 - 78400.0 ]  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š1264
 (78400.0 - 80200.0 ]  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š1027
 (80200.0 - 82000.0 ]  β–ˆβ–ˆβ–ˆβ–Ž381
 (82000.0 - 83900.0 ]  β–ˆβ–Œ165
 (83900.0 - 85700.0 ]  β–‹74
 (85700.0 - 87500.0 ]  β–Ž20
 (87500.0 - 89400.0 ]  ▏9
 (89400.0 - 91200.0 ]  ▏1
 (91200.0 - 93000.0 ]  ▏3
 (93000.0 - 94900.0 ]  ▏6
 (94900.0 - 6.7441e6]  ▏10

                  Counts

min: 69.216 ΞΌs (0.00% GC); mean: 76.484 ΞΌs (3.37% GC); median: 71.754 ΞΌs (0.00% GC); max: 6.744 ms (98.18% GC).

julia> @benchmark sol = solve($sparseprob,KenCarp47()) # 346.400 ΞΌs (1757 allocations: 525.05 KiB)
samples: 10000; evals/sample: 1; memory estimate: 39.67 KiB; allocs estimate: 479
ns

 (97300.0  - 99100.0 ]  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š2522
 (99100.0  - 100900.0]  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ 3480
 (100900.0 - 102700.0]  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–932
 (102700.0 - 104600.0]  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰673
 (104600.0 - 106400.0]  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰903
 (106400.0 - 108200.0]  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–826
 (108200.0 - 110000.0]  β–ˆβ–ˆβ–ˆβ–391
 (110000.0 - 111900.0]  β–ˆβ–148
 (111900.0 - 113700.0]  β–‹64
 (113700.0 - 115500.0]  β–Ž28
 (115500.0 - 117300.0]  β–Ž16
 (117300.0 - 119200.0]  ▏4
 (119200.0 - 121000.0]   0
 (121000.0 - 122800.0]  ▏3
 (122800.0 - 6.9148e6]  ▏10

                  Counts

min: 97.279 ΞΌs (0.00% GC); mean: 104.418 ΞΌs (2.50% GC); median: 100.061 ΞΌs (0.00% GC); max: 6.915 ms (97.43% GC).

julia> using Setfield, LinearAlgebra

julia> f = fastprob.f;

julia> newf = @set f.jac_prototype = Tridiagonal(sparseprob.f.jac_prototype);

julia> newf = @set newf.sparsity = Tridiagonal(sparseprob.f.sparsity);

julia> tridiagprob = ODEProblem(newf,x0,(0.0,5.0));

julia> @benchmark sol = solve($tridiagprob,Rosenbrock23()) # 188.000 ΞΌs (556 allocations: 66.19 KiB)
samples: 10000; evals/sample: 1; memory estimate: 60.91 KiB; allocs estimate: 501
ns

 (93000.0  - 95600.0 ]  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ 4781
 (95600.0  - 98100.0 ]  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š1394
 (98100.0  - 100700.0]  β–ˆβ–ˆβ–ˆβ–494
 (100700.0 - 103200.0]  β–ˆ155
 (103200.0 - 105800.0]  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ1019
 (105800.0 - 108300.0]  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹1355
 (108300.0 - 110900.0]  β–ˆβ–ˆβ–ˆβ–Š581
 (110900.0 - 113400.0]  β–ˆβ–167
 (113400.0 - 116000.0]  β–Ž29
 (116000.0 - 118500.0]  ▏6
 (118500.0 - 121100.0]  ▏2
 (121100.0 - 123600.0]  ▏4
 (123600.0 - 126200.0]   0
 (126200.0 - 128700.0]  ▏3
 (128700.0 - 6.8546e6]  ▏10

                  Counts

min: 93.010 ΞΌs (0.00% GC); mean: 103.109 ΞΌs (3.80% GC); median: 95.750 ΞΌs (0.00% GC); max: 6.855 ms (98.14% GC).

julia> @benchmark sol = solve($tridiagprob,TRBDF2()) # 87.800 ΞΌs (338 allocations: 40.31 KiB)
samples: 10000; evals/sample: 1; memory estimate: 37.61 KiB; allocs estimate: 317
ns

 (54300.0 - 55900.0 ]  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ4345
 (55900.0 - 57600.0 ]  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ1434
 (57600.0 - 59300.0 ]  β–ˆβ–ˆβ–Ž315
 (59300.0 - 61000.0 ]  β–ˆβ–ˆβ–ˆβ–‰548
 (61000.0 - 62700.0 ]  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š1411
 (62700.0 - 64400.0 ]  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ1148
 (64400.0 - 66000.0 ]  β–ˆβ–ˆβ–ˆβ–449
 (66000.0 - 67700.0 ]  β–ˆβ–188
 (67700.0 - 69400.0 ]  β–Š96
 (69400.0 - 71100.0 ]  ▍37
 (71100.0 - 72800.0 ]  ▏12
 (72800.0 - 74500.0 ]  ▏3
 (74500.0 - 76100.0 ]   0
 (76100.0 - 77800.0 ]  ▏4
 (77800.0 - 6.9987e6]  ▏10

                  Counts

min: 54.263 ΞΌs (0.00% GC); mean: 61.268 ΞΌs (4.35% GC); median: 56.349 ΞΌs (0.00% GC); max: 6.999 ms (98.13% GC).

julia> @benchmark sol = solve($tridiagprob,KenCarp47()) # 133.400 ΞΌs (482 allocations: 42.16 KiB)
samples: 10000; evals/sample: 1; memory estimate: 39.14 KiB; allocs estimate: 457
ns

 (80500.0  - 82500.0 ]  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ 4414
 (82500.0  - 84500.0 ]  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ1977
 (84500.0  - 86500.0 ]  β–ˆβ–ˆβ–ˆβ–‰558
 (86500.0  - 88500.0 ]  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰856
 (88500.0  - 90500.0 ]  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–1185
 (90500.0  - 92500.0 ]  β–ˆβ–ˆβ–ˆβ–ˆβ–Ž619
 (92500.0  - 94600.0 ]  β–ˆβ–Š246
 (94600.0  - 96600.0 ]  β–‹89
 (96600.0  - 98600.0 ]  β–Ž34
 (98600.0  - 100600.0]  ▏5
 (100600.0 - 102600.0]  ▏2
 (102600.0 - 104600.0]  ▏2
 (104600.0 - 106600.0]   0
 (106600.0 - 108600.0]  ▏3
 (108600.0 - 7.0086e6]  ▏10

                  Counts

min: 80.466 ΞΌs (0.00% GC); mean: 87.458 ΞΌs (3.05% GC); median: 82.759 ΞΌs (0.00% GC); max: 7.009 ms (97.91% GC).

The other issue reporting a regression used a Skylake-X Xeon CPU, which is more or less the same as mine. Two possible causes there:

  1. Their config was dual socket, mine is single socket. Maybe there’s an issue with multi-socket systems.
  2. Starting in Julia 1.6 (with LLVM >=10, 1.6 ships with LLVM 11, 1.5 with LLVM 9), LLVM will use 256-bit vectors on AVX512 systems (note: LoopVectorization will continue to use 512-bit vectors). I override this by default (starting Julia with -C"native,-prefer-256-bit" to turn off the preference for 256-bit). I did try running the benchmarks with the default settings. This caused around a 5% hit to performance, nothing like the one reported in that issue so I didn’t bring it up there as it couldn’t explain the reported <1.6 v 1.6 discrepency.

But neither of these are the case with Deepa’s skylake CPU (single socket, no AVX512).

FastBroadcast.jl just relies on LLVM to optimize loops, just like base broadcasting and the old @.., but should be easier on LLVM.

I think someone experiencing bad performance should profile, looking for anything taking a lot more time in 1.6 than in <1.6, and report with that example.

1 Like

Out of curiosity I ran the code on my 2017 MacBook Pro and found times comparable to Chris’s.

23.593 ΞΌs (267 allocations: 40.45 KiB)
28.850 ΞΌs (278 allocations: 34.07 KiB)
28.965 ΞΌs (284 allocations: 49.77 KiB)

234.572 ΞΌs (1020 allocations: 161.07 KiB)
327.884 ΞΌs (1986 allocations: 191.86 KiB)

123.919 ΞΌs (637 allocations: 54.28 KiB)
108.163 ΞΌs (467 allocations: 33.05 KiB)
147.139 ΞΌs (669 allocations: 34.77 KiB)

130.978 ΞΌs (671 allocations: 60.78 KiB)
117.362 ΞΌs (503 allocations: 39.83 KiB)
155.347 ΞΌs (705 allocations: 41.55 KiB)

164.535 ΞΌs (596 allocations: 61.69 KiB)
102.793 ΞΌs (488 allocations: 39.06 KiB)
145.441 ΞΌs (683 allocations: 41.02 KiB)

Julia Version 1.6.1
Commit 6aaedecc44 (2021-04-23 05:59 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin18.7.0)
CPU: Intel(R) Coreβ„’ i7-4870HQ CPU @ 2.50GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-11.0.1 (ORCJIT, haswell)
Environment:
JULIA_NUM_THREADS = 4

# https://scicomp.stackexchange.com/questions/37440/specifying-ode-solver-options-to-speed-up-compute-time

using DifferentialEquations, BenchmarkTools

mat1=[
       1    -2     1     0     0     0     0     0     0     0;
       0     1    -2     1     0     0     0     0     0     0;
       0     0     1    -2     1     0     0     0     0     0;
       0     0     0     1    -2     1     0     0     0     0;
       0     0     0     0     1    -2     1     0     0     0;
       0     0     0     0     0     1    -2     1     0     0;
       0     0     0     0     0     0     1    -2     1     0;
       0     0     0     0     0     0     0     1    -2     1;
       ];

mat2 = [
        1    -1     0     0     0     0     0     0     0     0;
        0     1    -1     0     0     0     0     0     0     0;
        0     0     1    -1     0     0     0     0     0     0;
        0     0     0     1    -1     0     0     0     0     0;
        0     0     0     0     1    -1     0     0     0     0;
        0     0     0     0     0     1    -1     0     0     0;
        0     0     0     0     0     0     1    -1     0     0;
        0     0     0     0     0     0     0     1    -1     0;
        ];

x0 = [1.0,0,0,0,0,0,0,0,0,0]
saveat = 0:0.01:5

function fun(dx,x,p,t)
    dx[1,:] .= 0
    dx[2:9,:] .= mat1*x + mat2*x
    dx[10,:] .= 2*(x[end-1] - x[end])
end

prob = ODEProblem(fun,x0,(0.0,5.0))
sys = modelingtoolkitize(prob)
fastprob = ODEProblem(sys,x0,(0.0,5.0),jac=true)

# Explicit RK Methods
@btime sol = solve(fastprob,Tsit5()) # 16.700 ΞΌs (245 allocations: 40.28 KiB)
@btime sol = solve(fastprob,BS3()) # 19.800 ΞΌs (231 allocations: 33.70 KiB)
@btime sol = solve(fastprob,Vern7()) # 18.400 ΞΌs (266 allocations: 49.62 KiB)

# Stabilized-Explicit RK Methods
@btime sol = solve(fastprob,ROCK2()) # 173.300 ΞΌs (831 allocations: 159.59 KiB)
@btime sol = solve(fastprob,ROCK4()) # 237.100 ΞΌs (1958 allocations: 191.64 KiB)

# Implicit and Semi-Implicit Methods
@btime sol = solve(fastprob,Rosenbrock23()) # 83.200 ΞΌs (541 allocations: 53.50 KiB)
@btime sol = solve(fastprob,TRBDF2()) # 72.400 ΞΌs (297 allocations: 31.72 KiB)
@btime sol = solve(fastprob,KenCarp47()) # 110.500 ΞΌs (444 allocations: 33.02 KiB)

sparseprob = ODEProblem(sys,x0,(0.0,5.0),jac=true,sparse=true)
@btime sol = solve(sparseprob,Rosenbrock23()) # 670.000 ΞΌs (3505 allocations: 1.22 MiB)
@btime sol = solve(sparseprob,TRBDF2()) # 254.000 ΞΌs (1332 allocations: 414.91 KiB)
@btime sol = solve(sparseprob,KenCarp47()) # 346.400 ΞΌs (1757 allocations: 525.05 KiB)

using Setfield, LinearAlgebra
f = fastprob.f
newf = @set f.jac_prototype = Tridiagonal(sparseprob.f.jac_prototype)
newf = @set newf.sparsity = Tridiagonal(sparseprob.f.sparsity)
tridiagprob = ODEProblem(newf,x0,(0.0,5.0))
@btime sol = solve(tridiagprob,Rosenbrock23()) # 188.000 ΞΌs (556 allocations: 66.19 KiB)
@btime sol = solve(tridiagprob,TRBDF2()) # 87.800 ΞΌs (338 allocations: 40.31 KiB)
@btime sol = solve(tridiagprob,KenCarp47()) # 133.400 ΞΌs (482 allocations: 42.16 KiB)

versioninfo()

It would be good to have a few other people run it too, see if we can find a pattern in the CPUs or something.

2 Likes

And also Base.JLOptions(), to make sure there aren’t any weird settings?

Hi @ChrisRackauckas ,

I tried the same code in Juno and I observe the following

19.200 ΞΌs (245 allocations: 40.27 KiB)
  24.100 ΞΌs (231 allocations: 33.69 KiB)
  22.100 ΞΌs (266 allocations: 49.61 KiB)
  204.200 ΞΌs (831 allocations: 159.58 KiB)
  277.700 ΞΌs (1958 allocations: 191.62 KiB)
  5.259 ms (10836 allocations: 301.75 KiB)
  1.501 ms (3144 allocations: 100.36 KiB)
  1.985 ms (4167 allocations: 122.78 KiB)
  5.275 ms (10870 allocations: 308.25 KiB)
  1.515 ms (3180 allocations: 107.14 KiB)
  1.998 ms (4203 allocations: 129.56 KiB)
  127.900 ΞΌs (502 allocations: 60.94 KiB)
  72.500 ΞΌs (318 allocations: 37.72 KiB)
  107.700 ΞΌs (458 allocations: 39.25 KiB)

I still observe ms for Implicit and Semi-Implicit Methods and while running sparseprob . The other runtimes are more or less close to what was posted by you.

I ran the code on my home computer set-up (16GB RAM):

Julia Version 1.6.0
Commit f9720dc2eb (2021-03-24 12:55 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i9-9900T CPU @ 2.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake)

The results are (REPL):

julia> # Explicit RK Methods
julia> @btime sol = solve(fastprob,Tsit5()); # 16.700 ΞΌs (245 allocations: 40.28 KiB)
  16.100 ΞΌs (245 allocations: 40.28 KiB)
julia> @btime sol = solve(fastprob,BS3()); # 19.800 ΞΌs (231 allocations: 33.70 KiB)
  20.200 ΞΌs (231 allocations: 33.70 KiB)
julia> @btime sol = solve(fastprob,Vern7()); # 18.400 ΞΌs (266 allocations: 49.62 KiB)
  18.100 ΞΌs (266 allocations: 49.62 KiB)

julia> # Stabilized-Explicit RK Methods
julia> @btime sol = solve(fastprob,ROCK2()); # 173.300 ΞΌs (831 allocations: 159.59 KiB)
  170.200 ΞΌs (831 allocations: 159.59 KiB)
julia> @btime sol = solve(fastprob,ROCK4()); # 237.100 ΞΌs (1958 allocations: 191.64 KiB)
  212.100 ΞΌs (1958 allocations: 191.64 KiB)

julia> # Implicit and Semi-Implicit Methods
julia> @btime sol = solve(fastprob,Rosenbrock23()); # 83.200 ΞΌs (541 allocations: 53.50 KiB)
  82.100 ΞΌs (543 allocations: 53.55 KiB)
julia> @btime sol = solve(fastprob,TRBDF2()); # 72.400 ΞΌs (297 allocations: 31.72 KiB)
  69.100 ΞΌs (297 allocations: 31.72 KiB)
julia> @btime sol = solve(fastprob,KenCarp47()); # 110.500 ΞΌs (444 allocations: 33.02 KiB)
  98.600 ΞΌs (444 allocations: 33.02 KiB)

julia> # Sparse problems
julia> @btime sol = solve(sparseprob,Rosenbrock23()); # 670.000 ΞΌs (3505 allocations: 1.22 MiB)
  86.900 ΞΌs (577 allocations: 60.05 KiB)
julia> @btime sol = solve(sparseprob,TRBDF2()); # 254.000 ΞΌs (1332 allocations: 414.91 KiB)
  76.200 ΞΌs (333 allocations: 38.50 KiB)
julia> @btime sol = solve(sparseprob,KenCarp47()); # 346.400 ΞΌs (1757 allocations: 525.05 KiB)
  100.800 ΞΌs (480 allocations: 39.80 KiB)

julia> # Tridiagonal problems 
julia> @btime sol = solve(tridiagprob,Rosenbrock23()); # 188.000 ΞΌs (556 allocations: 66.19 KiB)
  108.900 ΞΌs (502 allocations: 60.95 KiB)
julia> @btime sol = solve(tridiagprob,TRBDF2()); # 87.800 ΞΌs (338 allocations: 40.31 KiB)
  61.900 ΞΌs (318 allocations: 37.73 KiB)
julia> @btime sol = solve(tridiagprob,KenCarp47()); # 133.400 ΞΌs (482 allocations: 42.16
  88.000 ΞΌs (458 allocations: 39.27 KiB)

So the issue is PyCharm? Can you share what Base.JLOptions() gives from PyCharm?

@ChrisRackauckas Sure, I post what Base.JLOptions() gives soon. I’m facing some issue in opening REPL in PyCharm.

Also, in Juno I’m not sure why Implicit and Semi-Implicit Methods take long time for my run.

Are your BLAS threads one or something?