# Specifying ode solver options to speed up compute time

Hello All,

This is a follow-up to my question posted here.

Iβve installed Julia in windows and I am running the following code (posted in the above-mentioned link) via the Julia plugin instaled in Pycharm.

``````julia
using DifferentialEquations, BenchmarkTools

mat1=[
1    -2     1     0     0     0     0     0     0     0;
0     1    -2     1     0     0     0     0     0     0;
0     0     1    -2     1     0     0     0     0     0;
0     0     0     1    -2     1     0     0     0     0;
0     0     0     0     1    -2     1     0     0     0;
0     0     0     0     0     1    -2     1     0     0;
0     0     0     0     0     0     1    -2     1     0;
0     0     0     0     0     0     0     1    -2     1;
];

mat2 = [
1    -1     0     0     0     0     0     0     0     0;
0     1    -1     0     0     0     0     0     0     0;
0     0     1    -1     0     0     0     0     0     0;
0     0     0     1    -1     0     0     0     0     0;
0     0     0     0     1    -1     0     0     0     0;
0     0     0     0     0     1    -1     0     0     0;
0     0     0     0     0     0     1    -1     0     0;
0     0     0     0     0     0     0     1    -1     0;
];

x0 = [1.0,0,0,0,0,0,0,0,0,0]
saveat = 0:0.01:5

function fun(dx,x,p,t)
dx[1,:] .= 0
dx[2:9,:] .= mat1*x + mat2*x
dx[10,:] .= 2*(x[end-1] - x[end])
end

prob = ODEProblem(fun,x0,(0.0,5.0))
sys = modelingtoolkitize(prob)
fastprob = ODEProblem(sys,x0,(0.0,5.0),jac=true)

# Explicit RK Methods
@btime sol = solve(fastprob,Tsit5()) # 16.700 μs (245 allocations: 40.28 KiB)
@btime sol = solve(fastprob,BS3()) # 19.800 μs (231 allocations: 33.70 KiB)
@btime sol = solve(fastprob,Vern7()) # 18.400 μs (266 allocations: 49.62 KiB)

# Stabilized-Explicit RK Methods
@btime sol = solve(fastprob,ROCK2()) # 173.300 μs (831 allocations: 159.59 KiB)
@btime sol = solve(fastprob,ROCK4()) # 237.100 μs (1958 allocations: 191.64 KiB)

# Implicit and Semi-Implicit Methods
@btime sol = solve(fastprob,Rosenbrock23()) # 83.200 μs (541 allocations: 53.50 KiB)
@btime sol = solve(fastprob,TRBDF2()) # 72.400 μs (297 allocations: 31.72 KiB)
@btime sol = solve(fastprob,KenCarp47()) # 110.500 μs (444 allocations: 33.02 KiB)

sparseprob = ODEProblem(sys,x0,(0.0,5.0),jac=true,sparse=true)
@btime sol = solve(sparseprob,Rosenbrock23()) # 670.000 μs (3505 allocations: 1.22 MiB)
@btime sol = solve(sparseprob,TRBDF2()) # 254.000 μs (1332 allocations: 414.91 KiB)
@btime sol = solve(sparseprob,KenCarp47()) # 346.400 μs (1757 allocations: 525.05 KiB)

using Setfield, LinearAlgebra
f = fastprob.f
newf = @set f.jac_prototype = Tridiagonal(sparseprob.f.jac_prototype)
newf = @set newf.sparsity = Tridiagonal(sparseprob.f.sparsity)
tridiagprob = ODEProblem(newf,x0,(0.0,5.0))
@btime sol = solve(tridiagprob,Rosenbrock23()) # 188.000 μs (556 allocations: 66.19 KiB)
@btime sol = solve(tridiagprob,TRBDF2()) # 87.800 μs (338 allocations: 40.31 KiB)
@btime sol = solve(tridiagprob,KenCarp47()) # 133.400 μs (482 allocations: 42.16 KiB)
``````

Iβm not sure why but I observe time in ms and not in microseconds as posted by @ChrisRackauckas on SE.

``````
2.847 ms (34957 allocations: 2.63 MiB)
2.985 ms (31647 allocations: 1.73 MiB)
3.721 ms (46812 allocations: 5.47 MiB)
22.349 ms (268252 allocations: 10.67 MiB)
5.789 ms (80043 allocations: 3.01 MiB)
78.085 ms (605234 allocations: 20.99 MiB)
25.611 ms (217583 allocations: 7.47 MiB)
33.834 ms (288531 allocations: 10.52 MiB)
74.903 ms (597849 allocations: 20.82 MiB)
25.356 ms (210161 allocations: 7.29 MiB)
34.504 ms (281243 allocations: 10.33 MiB)
11.809 ms (156756 allocations: 4.92 MiB)
7.047 ms (72883 allocations: 2.52 MiB)
9.795 ms (103225 allocations: 4.15 MiB)
``````

Could someone help me understand why it takes milliseconds rather than microseconds when I run the code?

Thanks so much

The big thing that will slow this down is that you aren't taking advantage of the structure of your matrices (and they are non-`const` globals). If you use `BandedMatrices` for `mat1` and `mat2`, and make them `const`, it should be a massive improvement.
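A minimal sketch of the "use the structure" idea, with two assumptions flagged up front: it swaps in a hand-written loop instead of `BandedMatrices` (so it needs no packages), and the names `fun_loop!`, `MAT1` and `MAT2` are illustrative, not from the thread. Every row of `mat1 + mat2` is the stencil `[2, -3, 1]` starting at the row's own column, so the two matrix-vector products collapse to three terms per entry and nothing is allocated.

``````julia
# Hypothetical allocation-free version of the question's `fun`.
# Row i of mat1 + mat2 is [2, -3, 1] in columns i, i+1, i+2, so
# dx[2:9] .= mat1*x + mat2*x becomes a three-term loop with no temporary arrays.
function fun_loop!(dx, x, p, t)
    dx[1] = 0
    @inbounds for i in 1:8
        dx[i+1] = 2x[i] - 3x[i+1] + x[i+2]
    end
    dx[10] = 2 * (x[9] - x[10])
    return nothing
end

# Sanity check against the dense formulation, with the two matrices rebuilt as `const`.
const MAT1 = [j == i ? 1 : j == i + 1 ? -2 : j == i + 2 ? 1 : 0 for i in 1:8, j in 1:10]
const MAT2 = [j == i ? 1 : j == i + 1 ? -1 : 0 for i in 1:8, j in 1:10]

x = rand(10)
dx_dense = [0.0; MAT1 * x + MAT2 * x; 2 * (x[9] - x[10])]
dx_loop = zeros(10)
fun_loop!(dx_loop, x, nothing, 0.0)
println(dx_loop ≈ dx_dense)   # true
``````

The loop reads only its arguments, so no non-`const` global is touched inside the right-hand side; the `const` matrices exist only for the comparison.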

Julia v1.6.1? What happens if you run it in the REPL?

Hi @Oscar_Smith, thanks for the suggestion. Since the compute times reported in microseconds (at the link shared above) were obtained without `BandedMatrices`, I wasn't sure why I can't reproduce the compute times reported by @ChrisRackauckas.

Thatβs not needed because itβs already further optimizing beyond that using the `modelingtoolkitize` to fully scalarize it.

Yes , Iβm using Julia 1.6.1. Iβm sorry, Iβm really new to Julia and I couldnβt figure out what REPL means. If you donβt mind, could you please direct me to a tutorial on how to run via REPL?

I think you mean the terminal window of the Julia executable. I'll give it a try. Thank you!

In the REPL, should I run `julia file.jl`?
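One note on the question above: `julia file.jl` is typed at the operating-system shell, not at the `julia>` prompt; inside the REPL the usual way to run a script is `include`. A minimal sketch, where the temporary file and its contents are placeholders standing in for the real benchmark script:

``````julia
# At the julia> prompt, a script is run with include(path).
# (From a system shell, the equivalent is: julia file.jl)
# Placeholder script written to a temp dir so this snippet is self-contained.
script = joinpath(mktempdir(), "file.jl")
write(script, "answer = 6 * 7\n")

include(script)     # evaluates the file's code in the current module (Main at the REPL)
println(answer)     # 42
``````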

Iβm not doing anything special. Iβm just doing it the standard way in Juno.


Can you share `versioninfo()`, @Deepa?

@ChrisRackauckas Please find the version info below

``````
Julia Version 1.6.1
Commit 6aaedecc44 (2021-04-23 05:59 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-11.0.1 (ORCJIT, skylake)
``````

@Elrod are you able to reproduce this one? I wonder if there's some weird thing FastBroadcast.jl does to some chips.

No.

``````julia
julia> using DifferentialEquations, BenchmarkHistograms

julia> mat1=[
1    -2     1     0     0     0     0     0     0     0;
0     1    -2     1     0     0     0     0     0     0;
0     0     1    -2     1     0     0     0     0     0;
0     0     0     1    -2     1     0     0     0     0;
0     0     0     0     1    -2     1     0     0     0;
0     0     0     0     0     1    -2     1     0     0;
0     0     0     0     0     0     1    -2     1     0;
0     0     0     0     0     0     0     1    -2     1;
];

julia> mat2 = [
1    -1     0     0     0     0     0     0     0     0;
0     1    -1     0     0     0     0     0     0     0;
0     0     1    -1     0     0     0     0     0     0;
0     0     0     1    -1     0     0     0     0     0;
0     0     0     0     1    -1     0     0     0     0;
0     0     0     0     0     1    -1     0     0     0;
0     0     0     0     0     0     1    -1     0     0;
0     0     0     0     0     0     0     1    -1     0;
];

julia> x0 = [1.0,0,0,0,0,0,0,0,0,0];

julia> saveat = 0:0.01:5;

julia> function fun(dx,x,p,t)
dx[1,:] .= 0
dx[2:9,:] .= mat1*x + mat2*x
dx[10,:] .= 2*(x[end-1] - x[end])
end
fun (generic function with 1 method)

julia> prob = ODEProblem(fun,x0,(0.0,5.0));

julia> sys = modelingtoolkitize(prob);

julia> fastprob = ODEProblem(sys,x0,(0.0,5.0),jac=true);

julia> # Explicit RK Methods
@benchmark sol = solve($fastprob,Tsit5()) # 16.700 μs (245 allocations: 40.28 KiB)
samples: 10000; evals/sample: 1; memory estimate: 39.20 KiB; allocs estimate: 244
ns

(14200.0 - 15900.0 ]  ██████████████████████████████ 5365
(15900.0 - 17600.0 ]  ██245
(17600.0 - 19400.0 ]  █68
(19400.0 - 21100.0 ]  ██344
(21100.0 - 22800.0 ]  ██████████1700
(22800.0 - 24500.0 ]  █████████1518
(24500.0 - 26200.0 ]  ███513
(26200.0 - 27900.0 ]  ██181
(27900.0 - 29600.0 ]  █48
(29600.0 - 31300.0 ]  █2
(31300.0 - 33000.0 ]  █2
(33000.0 - 34700.0 ]  █2
(34700.0 - 36400.0 ]  █1
(36400.0 - 38100.0 ]  █1
(38100.0 - 4.4385e6]  █10

Counts

min: 14.226 μs (0.00% GC); mean: 20.654 μs (10.28% GC); median: 15.399 μs (0.00% GC); max: 4.438 ms (99.04% GC).

julia> @benchmark sol = solve($fastprob,BS3()) # 19.800 μs (231 allocations: 33.70 KiB)
samples: 10000; evals/sample: 1; memory estimate: 33.03 KiB; allocs estimate: 230
ns

(17900.0 - 19600.0 ]  ██████████████████████████████ 5474
(19600.0 - 21200.0 ]  ██305
(21200.0 - 22800.0 ]  █86
(22800.0 - 24400.0 ]  ██████████1661
(24400.0 - 26100.0 ]  ██████████1680
(26100.0 - 27700.0 ]  ███518
(27700.0 - 29300.0 ]  █144
(29300.0 - 31000.0 ]  █88
(31000.0 - 32600.0 ]  █30
(32600.0 - 34200.0 ]  █1
(34200.0 - 35900.0 ]   0
(35900.0 - 37500.0 ]   0
(37500.0 - 39100.0 ]   0
(39100.0 - 40700.0 ]  █3
(40700.0 - 4.8166e6]  █10

Counts

min: 17.927 μs (0.00% GC); mean: 23.190 μs (7.90% GC); median: 18.893 μs (0.00% GC); max: 4.817 ms (99.08% GC).

julia> @benchmark sol = solve($fastprob,Vern7()) # 18.400 μs (266 allocations: 49.62 KiB)
samples: 10000; evals/sample: 1; memory estimate: 47.36 KiB; allocs estimate: 265
ns

(15800.0 - 18200.0 ]  ██████████████████████████████6001
(18200.0 - 20700.0 ]  ██341
(20700.0 - 23100.0 ]  █63
(23100.0 - 25600.0 ]  ███████1280
(25600.0 - 28000.0 ]  ██████████1824
(28000.0 - 30500.0 ]  ██360
(30500.0 - 32900.0 ]  █102
(32900.0 - 35400.0 ]  █11
(35400.0 - 37800.0 ]   0
(37800.0 - 40300.0 ]   0
(40300.0 - 42700.0 ]  █4
(42700.0 - 45200.0 ]  █3
(45200.0 - 47600.0 ]   0
(47600.0 - 50100.0 ]  █1
(50100.0 - 3.9094e6]  █10

Counts

min: 15.794 μs (0.00% GC); mean: 22.292 μs (8.35% GC); median: 17.203 μs (0.00% GC); max: 3.909 ms (98.93% GC).

julia> # Stabilized-Explicit RK Methods
@benchmark sol = solve($fastprob,ROCK2()) # 173.300 μs (831 allocations: 159.59 KiB)
samples: 10000; evals/sample: 1; memory estimate: 158.97 KiB; allocs estimate: 830
ns

(160000.0 - 390000.0]  ██████████████████████████████ 9980
(390000.0 - 620000.0]  █1
(620000.0 - 840000.0]   0
(840000.0 - 1.07e6  ]   0
(1.07e6   - 1.3e6   ]   0
(1.3e6    - 1.52e6  ]   0
(1.52e6   - 1.75e6  ]   0
(1.75e6   - 1.98e6  ]   0
(1.98e6   - 2.2e6   ]   0
(2.2e6    - 2.43e6  ]   0
(2.43e6   - 2.66e6  ]   0
(2.66e6   - 2.88e6  ]   0
(2.88e6   - 3.11e6  ]   0
(3.11e6   - 3.34e6  ]  █9
(3.34e6   - 3.85e6  ]  █10

Counts

min: 161.643 μs (0.00% GC); mean: 175.670 μs (3.43% GC); median: 166.189 μs (0.00% GC); max: 3.846 ms (94.05% GC).

julia> @benchmark sol = solve($fastprob,ROCK4()) # 237.100 μs (1958 allocations: 191.64 KiB)
samples: 10000; evals/sample: 1; memory estimate: 191.62 KiB; allocs estimate: 1958
ns

(200000.0 - 690000.0]  ██████████████████████████████ 9977
(690000.0 - 1.17e6  ]   0
(1.17e6   - 1.66e6  ]   0
(1.66e6   - 2.14e6  ]   0
(2.14e6   - 2.63e6  ]   0
(2.63e6   - 3.11e6  ]   0
(3.11e6   - 3.6e6   ]   0
(3.6e6    - 4.08e6  ]   0
(4.08e6   - 4.57e6  ]   0
(4.57e6   - 5.06e6  ]   0
(5.06e6   - 5.54e6  ]   0
(5.54e6   - 6.03e6  ]   0
(6.03e6   - 6.51e6  ]   0
(6.51e6   - 7.0e6   ]   0
(7.0e6    - 7.48e6  ]  █23

Counts

min: 199.522 μs (0.00% GC); mean: 220.741 μs (7.10% GC); median: 203.291 μs (0.00% GC); max: 7.485 ms (96.29% GC).

julia> # Implicit and Semi-Implicit Methods
@benchmark sol = solve($fastprob,Rosenbrock23()) # 83.200 μs (541 allocations: 53.50 KiB)
samples: 10000; evals/sample: 1; memory estimate: 52.97 KiB; allocs estimate: 541
ns

(79100.0  - 82100.0 ]  ██████████████████████████████ 5525
(82100.0  - 85000.0 ]  ███████1138
(85000.0  - 87900.0 ]  ████555
(87900.0  - 90800.0 ]  ██████████1715
(90800.0  - 93800.0 ]  ████696
(93800.0  - 96700.0 ]  ██236
(96700.0  - 99600.0 ]  █49
(99600.0  - 102500.0]  █25
(102500.0 - 105500.0]  █17
(105500.0 - 108400.0]  █8
(108400.0 - 111300.0]  █5
(111300.0 - 114200.0]  █7
(114200.0 - 117200.0]  █4
(117200.0 - 120100.0]  █10
(120100.0 - 6.349e6 ]  █10

Counts

min: 79.133 μs (0.00% GC); mean: 87.742 μs (4.07% GC); median: 81.390 μs (0.00% GC); max: 6.349 ms (97.74% GC).

julia> @benchmark sol = solve($fastprob,TRBDF2()) # 72.400 μs (297 allocations: 31.72 KiB)
samples: 10000; evals/sample: 1; memory estimate: 30.98 KiB; allocs estimate: 295
ns

(66400.0 - 68100.0 ]  ██████████████████████████████ 4508
(68100.0 - 69700.0 ]  █████████1301
(69700.0 - 71300.0 ]  ████570
(71300.0 - 72900.0 ]  █████████1312
(72900.0 - 74600.0 ]  █████████1349
(74600.0 - 76200.0 ]  ████586
(76200.0 - 77800.0 ]  ██185
(77800.0 - 79400.0 ]  █96
(79400.0 - 81100.0 ]  █58
(81100.0 - 82700.0 ]  █17
(82700.0 - 84300.0 ]  █4
(84300.0 - 85900.0 ]  █1
(85900.0 - 87600.0 ]   0
(87600.0 - 89200.0 ]  █3
(89200.0 - 6.4517e6]  █10

Counts

min: 66.446 μs (0.00% GC); mean: 72.016 μs (2.59% GC); median: 68.454 μs (0.00% GC); max: 6.452 ms (98.48% GC).

julia> @benchmark sol = solve($fastprob,KenCarp47()) # 110.500 μs (444 allocations: 33.02 KiB)
samples: 10000; evals/sample: 1; memory estimate: 32.28 KiB; allocs estimate: 442
ns

(93800.0  - 95400.0 ]  ██████████████████████████████ 3879
(95400.0  - 97100.0 ]  ███████████████████2383
(97100.0  - 98700.0 ]  ███████786
(98700.0  - 100300.0]  ███████829
(100300.0 - 101900.0]  █████████1069
(101900.0 - 103500.0]  █████614
(103500.0 - 105100.0]  ██220
(105100.0 - 106700.0]  █94
(106700.0 - 108300.0]  █63
(108300.0 - 109900.0]  █32
(109900.0 - 111500.0]  █14
(111500.0 - 113100.0]  █4
(113100.0 - 114800.0]  █1
(114800.0 - 116400.0]  █2
(116400.0 - 6.5823e6]  █10

Counts

min: 93.838 μs (0.00% GC); mean: 98.714 μs (1.29% GC); median: 95.878 μs (0.00% GC); max: 6.582 ms (97.31% GC).

julia> sparseprob = ODEProblem(sys,x0,(0.0,5.0),jac=true,sparse=true);

julia> @benchmark sol = solve($sparseprob,Rosenbrock23()) # 670.000 μs (3505 allocations: 1.22 MiB)
samples: 10000; evals/sample: 1; memory estimate: 60.00 KiB; allocs estimate: 576
ns

(82100.0  - 85000.0 ]  ██████████████████████████████5107
(85000.0  - 87900.0 ]  ████████1318
(87900.0  - 90900.0 ]  ████512
(90900.0  - 93800.0 ]  █████779
(93800.0  - 96700.0 ]  █████████1429
(96700.0  - 99700.0 ]  ████560
(99700.0  - 102600.0]  ██177
(102600.0 - 105500.0]  █48
(105500.0 - 108500.0]  █16
(108500.0 - 111400.0]  █13
(111400.0 - 114300.0]  █13
(114300.0 - 117300.0]  █7
(117300.0 - 120200.0]  █5
(120200.0 - 123100.0]  █6
(123100.0 - 6.7211e6]  █10

Counts

min: 82.072 μs (0.00% GC); mean: 91.952 μs (4.12% GC); median: 84.918 μs (0.00% GC); max: 6.721 ms (97.90% GC).

julia> @benchmark sol = solve($sparseprob,TRBDF2()) # 254.000 μs (1332 allocations: 414.91 KiB)
samples: 10000; evals/sample: 1; memory estimate: 38.38 KiB; allocs estimate: 332
ns

(69200.0 - 71000.0 ]  ██████████████████████████████ 3557
(71000.0 - 72900.0 ]  ███████████████████2215
(72900.0 - 74700.0 ]  █████525
(74700.0 - 76500.0 ]  ███████743
(76500.0 - 78400.0 ]  ███████████1264
(78400.0 - 80200.0 ]  █████████1027
(80200.0 - 82000.0 ]  ████381
(82000.0 - 83900.0 ]  ██165
(83900.0 - 85700.0 ]  █74
(85700.0 - 87500.0 ]  █20
(87500.0 - 89400.0 ]  █9
(89400.0 - 91200.0 ]  █1
(91200.0 - 93000.0 ]  █3
(93000.0 - 94900.0 ]  █6
(94900.0 - 6.7441e6]  █10

Counts

min: 69.216 μs (0.00% GC); mean: 76.484 μs (3.37% GC); median: 71.754 μs (0.00% GC); max: 6.744 ms (98.18% GC).

julia> @benchmark sol = solve($sparseprob,KenCarp47()) # 346.400 μs (1757 allocations: 525.05 KiB)
samples: 10000; evals/sample: 1; memory estimate: 39.67 KiB; allocs estimate: 479
ns

(97300.0  - 99100.0 ]  ██████████████████████2522
(99100.0  - 100900.0]  ██████████████████████████████ 3480
(100900.0 - 102700.0]  █████████932
(102700.0 - 104600.0]  ██████673
(104600.0 - 106400.0]  ████████903
(106400.0 - 108200.0]  ████████826
(108200.0 - 110000.0]  ████391
(110000.0 - 111900.0]  ██148
(111900.0 - 113700.0]  █64
(113700.0 - 115500.0]  █28
(115500.0 - 117300.0]  █16
(117300.0 - 119200.0]  █4
(119200.0 - 121000.0]   0
(121000.0 - 122800.0]  █3
(122800.0 - 6.9148e6]  █10

Counts

min: 97.279 μs (0.00% GC); mean: 104.418 μs (2.50% GC); median: 100.061 μs (0.00% GC); max: 6.915 ms (97.43% GC).

julia> using Setfield, LinearAlgebra

julia> f = fastprob.f;

julia> newf = @set f.jac_prototype = Tridiagonal(sparseprob.f.jac_prototype);

julia> newf = @set newf.sparsity = Tridiagonal(sparseprob.f.sparsity);

julia> tridiagprob = ODEProblem(newf,x0,(0.0,5.0));

julia> @benchmark sol = solve($tridiagprob,Rosenbrock23()) # 188.000 μs (556 allocations: 66.19 KiB)
samples: 10000; evals/sample: 1; memory estimate: 60.91 KiB; allocs estimate: 501
ns

(93000.0  - 95600.0 ]  ██████████████████████████████ 4781
(95600.0  - 98100.0 ]  █████████1394
(98100.0  - 100700.0]  ████494
(100700.0 - 103200.0]  █155
(103200.0 - 105800.0]  ███████1019
(105800.0 - 108300.0]  █████████1355
(108300.0 - 110900.0]  ████581
(110900.0 - 113400.0]  ██167
(113400.0 - 116000.0]  █29
(116000.0 - 118500.0]  █6
(118500.0 - 121100.0]  █2
(121100.0 - 123600.0]  █4
(123600.0 - 126200.0]   0
(126200.0 - 128700.0]  █3
(128700.0 - 6.8546e6]  █10

Counts

min: 93.010 μs (0.00% GC); mean: 103.109 μs (3.80% GC); median: 95.750 μs (0.00% GC); max: 6.855 ms (98.14% GC).

julia> @benchmark sol = solve($tridiagprob,TRBDF2()) # 87.800 μs (338 allocations: 40.31 KiB)
samples: 10000; evals/sample: 1; memory estimate: 37.61 KiB; allocs estimate: 317
ns

(54300.0 - 55900.0 ]  ██████████████████████████████4345
(55900.0 - 57600.0 ]  ██████████1434
(57600.0 - 59300.0 ]  ███315
(59300.0 - 61000.0 ]  ████548
(61000.0 - 62700.0 ]  ██████████1411
(62700.0 - 64400.0 ]  ████████1148
(64400.0 - 66000.0 ]  ████449
(66000.0 - 67700.0 ]  ██188
(67700.0 - 69400.0 ]  █96
(69400.0 - 71100.0 ]  █37
(71100.0 - 72800.0 ]  █12
(72800.0 - 74500.0 ]  █3
(74500.0 - 76100.0 ]   0
(76100.0 - 77800.0 ]  █4
(77800.0 - 6.9987e6]  █10

Counts

min: 54.263 μs (0.00% GC); mean: 61.268 μs (4.35% GC); median: 56.349 μs (0.00% GC); max: 6.999 ms (98.13% GC).

julia> @benchmark sol = solve($tridiagprob,KenCarp47()) # 133.400 μs (482 allocations: 42.16 KiB)
samples: 10000; evals/sample: 1; memory estimate: 39.14 KiB; allocs estimate: 457
ns

(80500.0  - 82500.0 ]  ██████████████████████████████ 4414
(82500.0  - 84500.0 ]  ██████████████1977
(84500.0  - 86500.0 ]  ████558
(86500.0  - 88500.0 ]  ██████856
(88500.0  - 90500.0 ]  █████████1185
(90500.0  - 92500.0 ]  █████619
(92500.0  - 94600.0 ]  ██246
(94600.0  - 96600.0 ]  █89
(96600.0  - 98600.0 ]  █34
(98600.0  - 100600.0]  █5
(100600.0 - 102600.0]  █2
(102600.0 - 104600.0]  █2
(104600.0 - 106600.0]   0
(106600.0 - 108600.0]  █3
(108600.0 - 7.0086e6]  █10

Counts

min: 80.466 μs (0.00% GC); mean: 87.458 μs (3.05% GC); median: 82.759 μs (0.00% GC); max: 7.009 ms (97.91% GC).
``````

The other issue reporting a regression used a Skylake-X Xeon CPU, which is more or less the same as mine. Two possible causes there:

1. Their system was dual-socket, while mine is single-socket. Maybe there's an issue with multi-socket systems.
2. Starting with Julia 1.6 (the behavior comes with LLVM >= 10; 1.6 ships with LLVM 11, while 1.5 shipped with LLVM 9), LLVM uses 256-bit vectors on AVX512 systems (note: LoopVectorization will continue to use 512-bit vectors). I override this by default, starting Julia with `-C"native,-prefer-256-bit"` to turn off the preference for 256-bit vectors. I did try running the benchmarks with the default settings: that cost around 5% in performance, nothing like the regression reported in that issue, so I didn't bring it up there, as it couldn't explain the reported pre-1.6 vs. 1.6 discrepancy.

But neither of these applies to Deepa's Skylake CPU (single socket, no AVX512).

FastBroadcast.jl just relies on LLVM to optimize loops, just like base broadcasting and the old `@..`, but should be easier on LLVM.

I think someone experiencing bad performance should profile, looking for anything that takes much more time in 1.6 than in earlier versions, and report back with that example.
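A minimal sketch of that workflow with the `Profile` standard library. `workload` is a hypothetical stand-in for the slow call (in this thread, something like `solve(fastprob, Rosenbrock23())`); running the same script under 1.6 and an earlier version and comparing the flat reports is one way to spot what got slower.

``````julia
using Profile

# Hypothetical stand-in for the slow call, e.g. solve(fastprob, Rosenbrock23()).
workload(n) = sum(k -> sin(k) * cos(k), 1:n)

workload(10)            # run once so compilation is not part of the profile
Profile.clear()
@profile for _ in 1:200
    workload(100_000)
end

# Flat report sorted by sample count; frames seen in fewer than 5 samples are hidden.
Profile.print(format = :flat, sortedby = :count, mincount = 5)
``````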


Out of curiosity I ran the code on my 2017 MacBook Pro and found times comparable to Chris's.

``````
23.593 μs (267 allocations: 40.45 KiB)
28.850 μs (278 allocations: 34.07 KiB)
28.965 μs (284 allocations: 49.77 KiB)

234.572 μs (1020 allocations: 161.07 KiB)
327.884 μs (1986 allocations: 191.86 KiB)

123.919 μs (637 allocations: 54.28 KiB)
108.163 μs (467 allocations: 33.05 KiB)
147.139 μs (669 allocations: 34.77 KiB)

130.978 μs (671 allocations: 60.78 KiB)
117.362 μs (503 allocations: 39.83 KiB)
155.347 μs (705 allocations: 41.55 KiB)

164.535 μs (596 allocations: 61.69 KiB)
102.793 μs (488 allocations: 39.06 KiB)
145.441 μs (683 allocations: 41.02 KiB)
``````

``````
Julia Version 1.6.1
Commit 6aaedecc44 (2021-04-23 05:59 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin18.7.0)
CPU: Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-11.0.1 (ORCJIT, haswell)
Environment:
``````

``````julia
# https://scicomp.stackexchange.com/questions/37440/specifying-ode-solver-options-to-speed-up-compute-time

using DifferentialEquations, BenchmarkTools

mat1=[
1    -2     1     0     0     0     0     0     0     0;
0     1    -2     1     0     0     0     0     0     0;
0     0     1    -2     1     0     0     0     0     0;
0     0     0     1    -2     1     0     0     0     0;
0     0     0     0     1    -2     1     0     0     0;
0     0     0     0     0     1    -2     1     0     0;
0     0     0     0     0     0     1    -2     1     0;
0     0     0     0     0     0     0     1    -2     1;
];

mat2 = [
1    -1     0     0     0     0     0     0     0     0;
0     1    -1     0     0     0     0     0     0     0;
0     0     1    -1     0     0     0     0     0     0;
0     0     0     1    -1     0     0     0     0     0;
0     0     0     0     1    -1     0     0     0     0;
0     0     0     0     0     1    -1     0     0     0;
0     0     0     0     0     0     1    -1     0     0;
0     0     0     0     0     0     0     1    -1     0;
];

x0 = [1.0,0,0,0,0,0,0,0,0,0]
saveat = 0:0.01:5

function fun(dx,x,p,t)
dx[1,:] .= 0
dx[2:9,:] .= mat1*x + mat2*x
dx[10,:] .= 2*(x[end-1] - x[end])
end

prob = ODEProblem(fun,x0,(0.0,5.0))
sys = modelingtoolkitize(prob)
fastprob = ODEProblem(sys,x0,(0.0,5.0),jac=true)

# Explicit RK Methods
@btime sol = solve(fastprob,Tsit5()) # 16.700 μs (245 allocations: 40.28 KiB)
@btime sol = solve(fastprob,BS3()) # 19.800 μs (231 allocations: 33.70 KiB)
@btime sol = solve(fastprob,Vern7()) # 18.400 μs (266 allocations: 49.62 KiB)

# Stabilized-Explicit RK Methods
@btime sol = solve(fastprob,ROCK2()) # 173.300 μs (831 allocations: 159.59 KiB)
@btime sol = solve(fastprob,ROCK4()) # 237.100 μs (1958 allocations: 191.64 KiB)

# Implicit and Semi-Implicit Methods
@btime sol = solve(fastprob,Rosenbrock23()) # 83.200 μs (541 allocations: 53.50 KiB)
@btime sol = solve(fastprob,TRBDF2()) # 72.400 μs (297 allocations: 31.72 KiB)
@btime sol = solve(fastprob,KenCarp47()) # 110.500 μs (444 allocations: 33.02 KiB)

sparseprob = ODEProblem(sys,x0,(0.0,5.0),jac=true,sparse=true)
@btime sol = solve(sparseprob,Rosenbrock23()) # 670.000 μs (3505 allocations: 1.22 MiB)
@btime sol = solve(sparseprob,TRBDF2()) # 254.000 μs (1332 allocations: 414.91 KiB)
@btime sol = solve(sparseprob,KenCarp47()) # 346.400 μs (1757 allocations: 525.05 KiB)

using Setfield, LinearAlgebra
f = fastprob.f
newf = @set f.jac_prototype = Tridiagonal(sparseprob.f.jac_prototype)
newf = @set newf.sparsity = Tridiagonal(sparseprob.f.sparsity)
tridiagprob = ODEProblem(newf,x0,(0.0,5.0))
@btime sol = solve(tridiagprob,Rosenbrock23()) # 188.000 μs (556 allocations: 66.19 KiB)
@btime sol = solve(tridiagprob,TRBDF2()) # 87.800 μs (338 allocations: 40.31 KiB)
@btime sol = solve(tridiagprob,KenCarp47()) # 133.400 μs (482 allocations: 42.16 KiB)

versioninfo()
``````

It would be good to have a few other people run it too, to see if we can find a pattern in the CPUs or something.

And also `Base.JLOptions()`, to make sure there aren't any weird settings?

I tried the same code in Juno and I observe the following:

``````
19.200 μs (245 allocations: 40.27 KiB)
24.100 μs (231 allocations: 33.69 KiB)
22.100 μs (266 allocations: 49.61 KiB)
204.200 μs (831 allocations: 159.58 KiB)
277.700 μs (1958 allocations: 191.62 KiB)
5.259 ms (10836 allocations: 301.75 KiB)
1.501 ms (3144 allocations: 100.36 KiB)
1.985 ms (4167 allocations: 122.78 KiB)
5.275 ms (10870 allocations: 308.25 KiB)
1.515 ms (3180 allocations: 107.14 KiB)
1.998 ms (4203 allocations: 129.56 KiB)
127.900 μs (502 allocations: 60.94 KiB)
72.500 μs (318 allocations: 37.72 KiB)
107.700 μs (458 allocations: 39.25 KiB)
``````

I still observe milliseconds for the `Implicit and Semi-Implicit Methods` and for `sparseprob`. The other runtimes are roughly the same as the ones you posted.

I ran the code on my home computer setup (16 GB RAM):

``````
Julia Version 1.6.0
Commit f9720dc2eb (2021-03-24 12:55 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: Intel(R) Core(TM) i9-9900T CPU @ 2.10GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-11.0.1 (ORCJIT, skylake)
``````

The results are (REPL):

``````julia
julia> # Explicit RK Methods
julia> @btime sol = solve(fastprob,Tsit5()); # 16.700 μs (245 allocations: 40.28 KiB)
16.100 μs (245 allocations: 40.28 KiB)
julia> @btime sol = solve(fastprob,BS3()); # 19.800 μs (231 allocations: 33.70 KiB)
20.200 μs (231 allocations: 33.70 KiB)
julia> @btime sol = solve(fastprob,Vern7()); # 18.400 μs (266 allocations: 49.62 KiB)
18.100 μs (266 allocations: 49.62 KiB)

julia> # Stabilized-Explicit RK Methods
julia> @btime sol = solve(fastprob,ROCK2()); # 173.300 μs (831 allocations: 159.59 KiB)
170.200 μs (831 allocations: 159.59 KiB)
julia> @btime sol = solve(fastprob,ROCK4()); # 237.100 μs (1958 allocations: 191.64 KiB)
212.100 μs (1958 allocations: 191.64 KiB)

julia> # Implicit and Semi-Implicit Methods
julia> @btime sol = solve(fastprob,Rosenbrock23()); # 83.200 μs (541 allocations: 53.50 KiB)
82.100 μs (543 allocations: 53.55 KiB)
julia> @btime sol = solve(fastprob,TRBDF2()); # 72.400 μs (297 allocations: 31.72 KiB)
69.100 μs (297 allocations: 31.72 KiB)
julia> @btime sol = solve(fastprob,KenCarp47()); # 110.500 μs (444 allocations: 33.02 KiB)
98.600 μs (444 allocations: 33.02 KiB)

julia> # Sparse problems
julia> @btime sol = solve(sparseprob,Rosenbrock23()); # 670.000 μs (3505 allocations: 1.22 MiB)
86.900 μs (577 allocations: 60.05 KiB)
julia> @btime sol = solve(sparseprob,TRBDF2()); # 254.000 μs (1332 allocations: 414.91 KiB)
76.200 μs (333 allocations: 38.50 KiB)
julia> @btime sol = solve(sparseprob,KenCarp47()); # 346.400 μs (1757 allocations: 525.05 KiB)
100.800 μs (480 allocations: 39.80 KiB)

julia> # Tridiagonal problems
julia> @btime sol = solve(tridiagprob,Rosenbrock23()); # 188.000 μs (556 allocations: 66.19 KiB)
108.900 μs (502 allocations: 60.95 KiB)
julia> @btime sol = solve(tridiagprob,TRBDF2()); # 87.800 μs (338 allocations: 40.31 KiB)
61.900 μs (318 allocations: 37.73 KiB)
julia> @btime sol = solve(tridiagprob,KenCarp47()); # 133.400 μs (482 allocations: 42.16
88.000 μs (458 allocations: 39.27 KiB)
``````

So the issue is PyCharm? Can you share what `Base.JLOptions()` gives from PyCharm?
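A small sketch of what could be pasted back from the PyCharm session. The fields printed below are ones that command-line flags can change (optimization level, forced bounds checking, coverage and allocation tracking), each of which can slow code down; whether PyCharm's Julia plugin actually passes any such flags is not known here.

``````julia
using InteractiveUtils   # provides versioninfo(); the REPL loads it automatically

versioninfo()

opts = Base.JLOptions()
println("opt_level     = ", opts.opt_level)      # 2 by default
println("check_bounds  = ", opts.check_bounds)   # 0 default, 1 forced on, 2 forced off
println("code_coverage = ", opts.code_coverage)  # nonzero with --code-coverage
println("malloc_log    = ", opts.malloc_log)     # nonzero with --track-allocation
println("threads       = ", Threads.nthreads())
``````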

@ChrisRackauckas Sure, I'll post what `Base.JLOptions()` gives soon. I'm having some trouble opening the REPL in PyCharm.

Also, I'm not sure why the implicit and semi-implicit methods take so long in my Juno run.