Specifying ODE solver options to speed up compute time

@ChrisRackauckas

using LinearAlgebra
BLAS.get_num_threads()

returns 4.
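For reference, a quick way to both inspect and pin the BLAS thread count (a common factor in linear-algebra-heavy solves) is the following sketch; pinning to one thread is a suggestion, not something established in this thread:

```julia
using LinearAlgebra

# Inspect the current BLAS thread count
println(BLAS.get_num_threads())

# Pin it explicitly. For the small factorizations inside stiff ODE solvers,
# a single BLAS thread often avoids oversubscription against Julia's own threads.
BLAS.set_num_threads(1)
```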

And please find below the output of Base.JLOptions() when run from PyCharm.

JLOptions(
  quiet = 0,
  banner = -1,
  julia_bindir = "C:\\Users\\user\\AppData\\Local\\Programs\\Julia-1.6.1\\bin",
  julia_bin = "C:\\Users\\user\\AppData\\Local\\Programs\\Julia-1.6.1\\bin\\julia.exe",
  commands = Pair{Char, String}[],
  image_file = "C:\\Users\\user\\AppData\\Local\\Programs\\Julia-1.6.1\\lib\\julia\\sys.dll",
  cpu_target = "native",
  nthreads = 0,
  nprocs = 0,
  machine_file = "",
  project = "",
  isinteractive = 0,
  color = 0,
  historyfile = 1,
  startupfile = 0,
  compile_enabled = 1,
  code_coverage = 0,
  malloc_log = 0,
  opt_level = 2,
  debug_level = 1,
  check_bounds = 0,
  depwarn = 0,
  warn_overwrite = 0,
  can_inline = 1,
  polly = 1,
  trace_compile = "",
  fast_math = 0,
  worker = 0,
  cookie = "",
  handle_signals = 1,
  use_sysimage_native_code = 1,
  use_compiled_modules = 1,
  bindto = "",
  outputbc = "",
  outputunoptbc = "",
  outputo = "",
  outputasm = "",
  outputji = "",
  output_code_coverage = "",
  incremental = 0,
  image_file_specified = 0,
  warn_scope = 1,
  image_codegen = 0,
  rr_detach = 0
)

Those JLOptions all look normal to me.

I just ran it again from IntelliJ IDEA, which should be roughly equivalent to PyCharm. I’ve had no issues; the times are the same as in Juno and Visual Studio Code on macOS Big Sur with Julia 1.6.1. So it’s unlikely to be an issue with the IDE per se. And my JLOptions are the same as Deepa’s.

I tried in IntelliJ IDEA and I still get the same result as before :confused:


  2.889 ms (34957 allocations: 2.63 MiB)
  3.049 ms (31647 allocations: 1.73 MiB)
  3.803 ms (46812 allocations: 5.47 MiB)
  24.898 ms (268252 allocations: 10.67 MiB)
  5.854 ms (80043 allocations: 3.01 MiB)
  88.981 ms (605234 allocations: 20.99 MiB)
  28.547 ms (217583 allocations: 7.47 MiB)
  38.240 ms (288531 allocations: 10.52 MiB)
  92.201 ms (597849 allocations: 20.82 MiB)
  26.173 ms (210161 allocations: 7.29 MiB)
  34.159 ms (281243 allocations: 10.33 MiB)
  12.609 ms (156756 allocations: 4.92 MiB)
  7.034 ms (72883 allocations: 2.52 MiB)
  10.387 ms (103225 allocations: 4.15 MiB)

Hi @ChrisRackauckas and @Elrod

I have tried running the code via PyCharm and Juno on Linux. For some reason I still don’t
see the implicit solvers’ times in microseconds.


  42.446 μs (245 allocations: 40.27 KiB)
  51.050 μs (231 allocations: 33.69 KiB)
  52.102 μs (266 allocations: 49.61 KiB)
  402.665 μs (831 allocations: 159.58 KiB)
  558.716 μs (1958 allocations: 191.62 KiB)
  12.010 ms (10836 allocations: 301.75 KiB)
  3.475 ms (3144 allocations: 100.36 KiB)
  4.576 ms (4167 allocations: 122.78 KiB)
  12.077 ms (10870 allocations: 308.25 KiB)
  3.498 ms (3180 allocations: 107.14 KiB)
  4.581 ms (4203 allocations: 129.56 KiB)
  287.877 μs (502 allocations: 60.94 KiB)
  178.873 μs (318 allocations: 37.72 KiB)
  253.334 μs (458 allocations: 39.25 KiB)

I also tried via VS Code and IntelliJ IDEA on Windows, and the same issue exists.

I don’t understand why the IDE even mattered in that one case; it may be a fluke. For this, I would think RecursiveFactorization.jl’s heuristics may be involved. @YingboMa, what would be the way to benchmark that?
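One way to check whether the LU factorization itself is the bottleneck is to benchmark RecursiveFactorization.jl directly against the BLAS-backed factorization. A sketch, where the matrix size `N = 50` is an arbitrary stand-in for the Jacobian size in the toy example:

```julia
using BenchmarkTools, LinearAlgebra
import RecursiveFactorization

N = 50                  # stand-in for the Jacobian size; adjust to your problem
A = rand(N, N)

# OpenBLAS-backed LU factorization
@btime lu!(B) setup = (B = copy($A));

# RecursiveFactorization.jl's pure-Julia LU, which relies on SIMD-heavy kernels
# (and so is sensitive to the CPU's instruction sets)
@btime RecursiveFactorization.lu!(B) setup = (B = copy($A));
```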

The timings you post there are substantially better than from your previous (IntelliJ) post.
I have no idea why the editor would matter.

@Elrod @ChrisRackauckas

I’m summarizing all the details here.

The following result is from the Atom IDE installed on Linux

  42.999 μs (245 allocations: 40.27 KiB)
  49.789 μs (231 allocations: 33.69 KiB)
  54.624 μs (266 allocations: 49.61 KiB)
  396.824 μs (831 allocations: 159.58 KiB)
  552.006 μs (1958 allocations: 191.62 KiB)
  11.995 ms (10836 allocations: 301.75 KiB)
  3.532 ms (3144 allocations: 100.36 KiB)
  4.568 ms (4167 allocations: 122.78 KiB)
  12.142 ms (10870 allocations: 308.25 KiB)
  3.526 ms (3180 allocations: 107.14 KiB)
  4.610 ms (4203 allocations: 129.56 KiB)
  290.927 μs (502 allocations: 60.94 KiB)
  179.646 μs (318 allocations: 37.72 KiB)
  254.691 μs (458 allocations: 39.25 KiB)

Julia Version 1.6.1
Commit 6aaedecc44 (2021-04-23 05:59 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU           E5620  @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, westmere)
Environment:
  JULIA_NUM_THREADS = 8

Julia was downloaded from here: Generic Linux on x86.
uber-juno and julia-client are installed in Atom, and the Julia path has been specified in Juno > Settings > Julia Path: /home/xxxx/julia-1.6.1/bin/julia

While installing Juno, I faced the issue reported here: Installing juno in atom. So I have defined
LD_LIBRARY_PATH=/opt/gcc_10_2_1/usr/lib64/ in my .bashrc before starting Atom.

And the following is via Atom on Windows


  19.400 μs (245 allocations: 40.27 KiB)
  24.100 μs (231 allocations: 33.69 KiB)
  22.400 μs (266 allocations: 49.61 KiB)
  204.300 μs (831 allocations: 159.58 KiB)
  277.200 μs (1958 allocations: 191.62 KiB)
  5.309 ms (10834 allocations: 301.70 KiB)
  1.520 ms (3144 allocations: 100.36 KiB)
  2.001 ms (4167 allocations: 122.78 KiB)
  5.311 ms (10870 allocations: 308.25 KiB)
  1.526 ms (3180 allocations: 107.14 KiB)
  2.001 ms (4203 allocations: 129.56 KiB)
  129.300 μs (499 allocations: 60.88 KiB)
  73.500 μs (318 allocations: 37.72 KiB)
  109.100 μs (458 allocations: 39.25 KiB)
Julia Version 1.6.1
Commit 6aaedecc44 (2021-04-23 05:59 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake)
Environment:
  JULIA_EDITOR = "C:\Users\xxxx\AppData\Local\atom\app-1.57.0\atom.exe"  -a
  JULIA_NUM_THREADS = 2

That is a problem.
The E5620 is an 11-year-old CPU.
Westmere does not have AVX, FMA3, or AVX2.
These instruction sets are important for getting good performance.

It being twice as slow as your Skylake is to be expected.
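A quick way to confirm which microarchitecture Julia/LLVM is targeting on a given machine (matching the `westmere` vs `skylake` lines in the `versioninfo()` outputs above):

```julia
# Sys.CPU_NAME reports the detected CPU target, e.g. "westmere" or "skylake";
# versioninfo() shows the same information alongside the LLVM build.
println(Sys.CPU_NAME)
versioninfo()
```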


This makes sense, thank you. But somehow I still don’t understand why I am not able to
see the implicit solvers’ times in microseconds on Windows for this toy example that I have set up :cry:.

Because your CPU is missing all of the SIMD instructions used by the factorization tools.


Could you run the millisecond-scale code in a loop 100 times and profile it?
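A minimal sketch of what that could look like, with `slow_call` as a hypothetical stand-in for whichever millisecond-scale `solve` line is being investigated:

```julia
using Profile

# Hypothetical stand-in for the slow millisecond-scale call
slow_call() = sum(sum(inv(rand(100, 100))) for _ in 1:10)

slow_call()                     # run once first to exclude compilation time
Profile.clear()
@profile for _ in 1:100         # collect samples over many repetitions
    slow_call()
end
Profile.print()                 # tree report of where the time is spent
```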

Thank you. I’d also like to know whether that is the problem with both my runs, i.e. on both the Windows and Linux CPUs?

Could you please let me know if you are looking for this?


Hi @ChrisRackauckas,
I tried to set up my actual system in Julia.
In MATLAB I could solve the system of ~9000 stiff ODEs using the ode15s solver in 260 seconds.

In Julia, I tried to set up the same system. The following solver was used.

sparseprob = ODEProblem(sys,x0,(0.0,5.0),jac=true,sparse=true)
@btime sol = solve(sparseprob,Rosenbrock23()) # 670.000 μs (3505 allocations: 1.22 MiB)

it ran for more than 2 hours and I had to kill the job. For my actual system, mat1 and mat2 (from the toy example presented here) were generated in MATLAB and loaded from a .mat file in Julia. I’d actually like to share my .mat files along with the Julia code (15 lines of code). Unfortunately, I cannot post the complete system on this forum since it is my thesis work, which is not published yet. May I share it elsewhere, via email or in a drive?

Can any of the developers please have a look and suggest the right solver and solver settings (Jacobian, sparsity pattern, …) that I should specify as inputs to achieve better performance than MATLAB?

Thanks a lot for the wonderful support so far.

Where is the time actually being spent? In the problem generation? I would assume a huge analytical Jacobian isn’t a great idea, so not building the Jacobian may be required at that size.

sparseprob = ODEProblem(sys,x0,(0.0,5.0),sparse=true)

Also, Rosenbrock23() doesn’t make sense for an ODE of that size. You might want to follow the recommendations. I would try:

@btime sol = solve(sparseprob,Tsit5()) # if it ends up non-stiff 
@btime sol = solve(sparseprob,ROCK2()) # if it ends up semi-stiff
@btime sol = solve(sparseprob,TRBDF2()) # if it has complex eigenvalues
@btime sol = solve(sparseprob,QNDF()) # otherwise

I meant using the statistical profiler described in Profiling · The Julia Language.

@YingboMa Thanks for the clarification. Excuse me for the naive question: could you please explain how these are different?

@benchmark only shows you the total time (or, with BenchmarkHistograms, a histogram of individual run times), whereas @profile shows where in the code the time is spent.


Thanks a lot for sharing the really useful documentation.
I tried QNDF(), which I understand is the Julia translation of the ode15s method that I’ve used in MATLAB.

@btime sol = solve(sparseprob,QNDF())

I get a segmentation fault. The complete log is shared here.

Could you please have a look?

Thanks for the suggestion. I observed this in MATLAB as well: trying to specify the full Jacobian took a long time. So, in addition to the sparse representation, I specified the Jacobian sparsity pattern in the odeset options, which let it solve in ~200 seconds.
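For reference, the Julia analogue of MATLAB’s JPattern option is to pass a sparse jac_prototype to ODEFunction, so the solver can use sparse factorization and Jacobian coloring. A minimal sketch; the 2×2 toy system and its sparsity pattern here are purely illustrative, not the ~9000-equation system from this thread:

```julia
using OrdinaryDiffEq, SparseArrays

# Illustrative stiff toy system
function f!(du, u, p, t)
    du[1] = -10u[1]
    du[2] = u[1] - 0.1u[2]
end

# Analogue of MATLAB's odeset(..., 'JPattern', S): declare the Jacobian's
# nonzero structure without supplying an analytical Jacobian
jp = sparse([1.0 0.0;
             1.0 1.0])
fun  = ODEFunction(f!, jac_prototype = jp)
prob = ODEProblem(fun, [1.0, 0.0], (0.0, 5.0))
sol  = solve(prob, QNDF())
```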