You just turn sparse=true on in the generation call.
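For context, one way this is typically done is to regenerate the problem through modelingtoolkitize and pass sparse=true there; a minimal sketch (the toy RHS is purely illustrative, and API details vary across ModelingToolkit versions):

```julia
using ModelingToolkit, OrdinaryDiffEq

# Plain in-place ODE -- a tiny stand-in for the real system.
function f!(du, u, p, t)
    du[1] = u[2]
    du[2] = -p[1] * u[1]
end
prob = ODEProblem(f!, [1.0, 0.0], (0.0, 1.0), [4.0])

# Trace the problem symbolically, then regenerate it with an
# analytical *sparse* Jacobian: sparse=true in the generation call.
sys = modelingtoolkitize(prob)
sparseprob = ODEProblem(sys, [], (0.0, 1.0), jac = true, sparse = true)
sol = solve(sparseprob, Rodas5())
```

With a banded problem like the one above, the sparse Jacobian is what lets the stiff solver avoid dense factorizations.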
With CVODE_BDF finally working, and using that instead of Rodas5, the runtime of my main program for Npix = 1601 was reduced from 8 hours to 2 hours! Wow!
One last question, if I may… since I am running on a system with multiple cores, is there a straightforward way to make the command
DifferentialEquations.solve (from my sample program above) run in a distributed way on the multiple cores, for savings of time and memory? (BTW, I run the program from the IDE, not from command line.)
(I tried reading through “Multi-processing and Distributed Computing · The Julia Language”, but I’m afraid that it did not illuminate me.)
Concerning your code, you can accelerate that inner loop with multi-threading (start Julia with julia -t N, where N is the number of cores you want to use):
Threads.@threads for i in 2:(Npts-1)
    du[i,2] = u[i,1]
    du[i,1] = (-k*u[i,2]) + ((u[(i+1),2]-(2*u[i,2])+u[(i-1),2])/(DX^2))
end
If that makes a difference for overall performance, you might want to rewrite the loop to take better advantage of multi-threading, but I would test that first.
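For reference, here is a self-contained sketch of how that threaded loop fits into an in-place RHS; Npts, k, DX, the boundary handling, and the initial profile are stand-ins, not the original program's values:

```julia
using OrdinaryDiffEq

# Stand-in sizes mirroring the sample program above.
const Npts = 101
const DX = 1.0 / (Npts - 1)
const k = 1.0

# In-place RHS with the inner loop threaded.
# Start Julia with `julia -t N` so Threads.@threads actually uses N threads.
function wave!(du, u, p, t)
    du[1, 1] = du[1, 2] = 0.0          # fixed boundaries (illustrative)
    du[Npts, 1] = du[Npts, 2] = 0.0
    Threads.@threads for i in 2:(Npts - 1)
        du[i, 2] = u[i, 1]
        du[i, 1] = (-k * u[i, 2]) + ((u[i+1, 2] - 2 * u[i, 2] + u[i-1, 2]) / DX^2)
    end
end

u0 = zeros(Npts, 2)
u0[:, 2] .= sin.(range(0, pi; length = Npts))   # an arbitrary initial profile
prob = ODEProblem(wave!, u0, (0.0, 0.1))
sol = solve(prob, Tsit5())   # non-stiff solver here, just to exercise the RHS
```

Note that each RHS call pays a fixed cost to spawn the threaded loop, so this only helps when Npts is large enough for the loop body to dominate.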
You can also (with a single core) use LoopVectorization.jl:

using LoopVectorization

and change that loop to:
@avx for i in 2:(Npts-1)
    du[i,2] = u[i,1]
    du[i,1] = (-k*u[i,2]) + ((u[(i+1),2]-(2*u[i,2])+u[(i-1),2])/(DX^2))
end
That provides some additional speedup. But those things will only make a difference if the computation of that loop matters for the overall performance, which I am not sure about (I could not run your example).
The bottleneck is very likely the computation of the Jacobian. Also, if I recall correctly, you can pass linear-solver options such as dense or banded to CVODE_BDF.
The simplest thing to try is CVODE_BDF(linear_solver=:GMRES), which is matrix-free. Without a preconditioner you might get a speedup, or you might not, but it's at least easy to check.
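As a concrete sketch of that call (the tiny RHS here is just a stand-in for the real stiff system):

```julia
using Sundials, DifferentialEquations

# Trivial stand-in for the large stiff system.
f!(du, u, p, t) = (du .= -u)
prob = ODEProblem(f!, ones(100), (0.0, 1.0))

# Matrix-free Newton-Krylov: CVODE never forms or factors a Jacobian matrix,
# only Jacobian-vector products, so memory and cost stop scaling with the
# dense Jacobian size.
sol = solve(prob, CVODE_BDF(linear_solver = :GMRES))
```

Without a preconditioner the Krylov iterations may or may not converge quickly, so it is worth comparing timings against the default dense linear solver on your actual problem.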
Hi again folks, I have implemented some of these changes, and the results are of the “impossibly too good to be true” variety.
For my real (very large) program, solving the coupled ODE system originally took over 3 days: using Rodas5(), the whole program with everything took over 82 hours! With CVODE_BDF(), the program took only 2.5 hours (148 minutes). With CVODE_BDF(linear_solver=:GMRES), it took less than 1 hour (51 minutes), and most of that time wasn't even spent solving ODEs!
I was enormously skeptical about this impossible amount of speedup, so I compared the ODE solutions… and the biggest difference in any value between the different solvers was less than 3E-10.
The solver doesn’t even seem to scale with the number of pixels anymore! Running my simpler program shown above with CVODE_BDF(linear_solver=:GMRES) (noting that it must include the corrected line "maxiters = Int(1e7)" to run successfully), going from Npix = 801 to 3201 only increased the DifferentialEquations.solve command runtime from 4.87 to 5.87 to 6.52 seconds!
I’m not sure how any of this is possible, but I’m glad it is! Thanks for all of your suggestions… I’m still implementing more of them as necessary…