You just turn `sparse=true` on in the generation call.
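For context, here is a minimal sketch of what that can look like when the Jacobian is generated through ModelingToolkit's `modelingtoolkitize` (this is my assumption of the workflow being referred to; `prob` stands for the already-built `ODEProblem` and the variable names are placeholders):

```
using DifferentialEquations, ModelingToolkit

# Trace the existing problem symbolically, then regenerate it with an
# analytical Jacobian stored as a sparse matrix (jac = true, sparse = true).
sys = modelingtoolkitize(prob)
prob_sparse = ODEProblem(sys, [], prob.tspan; jac = true, sparse = true)
sol = solve(prob_sparse, Rodas5())
```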

With `CVODE_BDF` finally working, using that instead of `Rodas5`, the runtime of my main program for `Npix = 1601` was reduced from 8 hours to 2 hours! Wow!

One last question, if I may… since I am running on a system with multiple cores, is there a straightforward way to make the `DifferentialEquations.solve` command (from my sample program above) run in a distributed way across the cores, to save time and memory? (BTW, I run the program from the IDE, not from the command line.)

(I tried reading through "Multi-processing and Distributed Computing · The Julia Language", but I'm afraid it did not enlighten me.)

Thanks again…!

Concerning your code, you can accelerate that inner loop with the following (start Julia with `julia -t N`, where `N` is the number of cores you want to use):

```
# Split the interior-point updates across the available threads
Threads.@threads for i in 2:(Npts-1)
    du[i,2] = u[i,1]
    du[i,1] = (-k*u[i,2]) + ((u[(i+1),2]-(2*u[i,2])+u[(i-1),2])/(DX^2))
end
```

If that makes a difference for overall performance, you might want to rewrite the loop to take better advantage of multi-threading, but I would test it as written first.
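If you do go down that route, one possible restructuring (just a sketch, not from the original post; the function name and argument list are hypothetical) is to split the index range into one chunk per thread, so the scheduling overhead is paid once per chunk rather than once per very cheap loop iteration:

```
using Base.Threads

# Hypothetical helper: same update as above, but with the range split
# into one explicitly spawned task per chunk of indices.
function update_interior!(du, u, k, DX, Npts)
    chunks = Iterators.partition(2:(Npts-1), cld(Npts-2, nthreads()))
    @sync for chunk in chunks
        Threads.@spawn for i in chunk
            du[i,2] = u[i,1]
            du[i,1] = (-k*u[i,2]) + ((u[(i+1),2]-(2*u[i,2])+u[(i-1),2])/(DX^2))
        end
    end
    return du
end
```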

You can also (with a single core) add `using LoopVectorization` and change that loop to:

```
# SIMD-vectorize the same loop with LoopVectorization's @avx macro
@avx for i in 2:(Npts-1)
    du[i,2] = u[i,1]
    du[i,1] = (-k*u[i,2]) + ((u[(i+1),2]-(2*u[i,2])+u[(i-1),2])/(DX^2))
end
```

That provides some additional speedup.

But those things will only make a difference if that loop accounts for a significant share of the overall runtime, which I am not sure about (I could not run your example).

The bottleneck is very likely the computation of the Jacobian. Also, you can pass linear-solver options to `CVODE_BDF`, such as `:Dense` or `:Band`, if I recall correctly.

The simplest thing to try is `CVODE_BDF(linear_solver=:GMRES)`, which would be matrix-free. Without a preconditioner you might get a speedup or you might not, but it's at least easy to check.
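As a rough sketch of what those calls look like with Sundials.jl (assuming `prob` is the already-constructed `ODEProblem`; the bandwidths in the banded example are only illustrative and must match your actual stencil):

```
using DifferentialEquations, Sundials

# Default: dense linear solver inside CVODE's Newton iterations
sol_dense = solve(prob, CVODE_BDF())

# Banded Jacobian: supply the upper/lower bandwidths of your stencil
sol_band  = solve(prob, CVODE_BDF(linear_solver = :Band, jac_upper = 1, jac_lower = 1))

# Matrix-free Krylov (GMRES) solver: no Jacobian matrix is ever formed
sol_gmres = solve(prob, CVODE_BDF(linear_solver = :GMRES))
```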

Hi again folks, I have implemented some of these changes, and the results are of the "impossibly too good to be true" variety.

For my real (very large) program, solving the coupled ODE system originally took over 3 days… using `Rodas5()`, the whole program with everything took over 82 hours!

Using `CVODE_BDF()`, the program took only 2.5 hours (148 minutes).

Then, using `CVODE_BDF(linear_solver=:GMRES)` took less than 1 hour (51 minutes), and most of that time wasn't even spent on ODE solving!

I was enormously skeptical about this impossible amount of speedup, so I compared the ODE solutions… and the biggest difference in any value between the different solvers was less than 3E-10.

The solver doesn't even seem to scale with the number of pixels anymore! Running my simpler program shown above with `CVODE_BDF(linear_solver=:GMRES)` (noting that it must include the corrected line, `maxiters = Int(1e7)`, to run successfully), going from `Npix = 801` to `1601` to `3201` only increased the runtime of the `DifferentialEquations.solve` command from 4.87 to 5.87 to 6.52 seconds!

I'm not sure how any of this is possible, but I'm glad it is! Thanks for all of your suggestions… I'm still implementing more of them as necessary…