You just turn `sparse=true` on in the generation call.
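For context, here is a minimal sketch of what that can look like when the Jacobian is generated through ModelingToolkit's `modelingtoolkitize` (this is my assumption of the workflow being referred to; `prob` stands for the already-built `ODEProblem` and the variable names are placeholders):

```
using DifferentialEquations, ModelingToolkit

# Trace the existing problem symbolically, then regenerate it with an
# analytical Jacobian stored as a sparse matrix (jac = true, sparse = true).
sys = modelingtoolkitize(prob)
prob_sparse = ODEProblem(sys, [], prob.tspan; jac = true, sparse = true)
sol = solve(prob_sparse, Rodas5())
```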

With `CVODE_BDF` finally working, using that instead of `Rodas5`, the runtime of my main program for `Npix = 1601` was reduced from 8 hours to 2 hours! Wow!

One last question, if I may… since I am running on a system with multiple cores, is there a straightforward way to make the `DifferentialEquations.solve` command (from my sample program above) run in a distributed way across the cores, to save time and memory? (BTW, I run the program from the IDE, not from the command line.)

(I tried reading through "Multi-processing and Distributed Computing · The Julia Language", but I'm afraid it did not enlighten me.)

Thanks again…!

Concerning your code, you can accelerate that inner loop with the following (start Julia with `julia -t N`, where `N` is the number of cores you want to use):

```
# Split the interior-point updates across the available threads
Threads.@threads for i in 2:(Npts-1)
    du[i,2] = u[i,1]
    du[i,1] = (-k*u[i,2]) + ((u[(i+1),2]-(2*u[i,2])+u[(i-1),2])/(DX^2))
end
```

If that makes a difference for overall performance, you might want to rewrite the loop to take better advantage of multi-threading, but I would test it as written first.
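If you do go down that route, one possible restructuring (just a sketch, not from the original post; the function name and argument list are hypothetical) is to split the index range into one chunk per thread, so the scheduling overhead is paid once per chunk rather than once per very cheap loop iteration:

```
using Base.Threads

# Hypothetical helper: same update as above, but with the range split
# into one explicitly spawned task per chunk of indices.
function update_interior!(du, u, k, DX, Npts)
    chunks = Iterators.partition(2:(Npts-1), cld(Npts-2, nthreads()))
    @sync for chunk in chunks
        Threads.@spawn for i in chunk
            du[i,2] = u[i,1]
            du[i,1] = (-k*u[i,2]) + ((u[(i+1),2]-(2*u[i,2])+u[(i-1),2])/(DX^2))
        end
    end
    return du
end
```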

You can also (with a single core) add `using LoopVectorization` and change that loop to:

```
# SIMD-vectorize the same loop with LoopVectorization's @avx macro
@avx for i in 2:(Npts-1)
    du[i,2] = u[i,1]
    du[i,1] = (-k*u[i,2]) + ((u[(i+1),2]-(2*u[i,2])+u[(i-1),2])/(DX^2))
end
```

That provides some additional speedup.

But those things will only make a difference if that loop accounts for a significant share of the overall runtime, which I am not sure about (I could not run your example).

The bottleneck is very likely the computation of the Jacobian. Also, you can pass linear-solver options to `CVODE_BDF`, such as `:Dense` or `:Band`, if I recall correctly.

The simplest thing to try is `CVODE_BDF(linear_solver=:GMRES)`, which would be matrix-free. Without a preconditioner you might get a speedup or you might not, but it's at least easy to check.
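As a rough sketch of what those calls look like with Sundials.jl (assuming `prob` is the already-constructed `ODEProblem`; the bandwidths in the banded example are only illustrative and must match your actual stencil):

```
using DifferentialEquations, Sundials

# Default: dense linear solver inside CVODE's Newton iterations
sol_dense = solve(prob, CVODE_BDF())

# Banded Jacobian: supply the upper/lower bandwidths of your stencil
sol_band  = solve(prob, CVODE_BDF(linear_solver = :Band, jac_upper = 1, jac_lower = 1))

# Matrix-free Krylov (GMRES) solver: no Jacobian matrix is ever formed
sol_gmres = solve(prob, CVODE_BDF(linear_solver = :GMRES))
```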

Hi again folks, I have implemented some of these changes, and the results are of the "impossibly too good to be true" variety.

For my real (very large) program, solving the coupled ODE system originally took over 3 days… using `Rodas5()`, the whole program with everything took over 82 hours!

Using `CVODE_BDF()`, the program took only 2.5 hours (148 minutes).

Then, using `CVODE_BDF(linear_solver=:GMRES)` took less than 1 hour (51 minutes), and most of that time wasn't even spent on ODE solving!

I was enormously skeptical about this impossible amount of speedup, so I compared the ODE solutions… and the biggest difference in any value between the different solvers was less than 3E-10.

The solver doesn't even seem to scale with the number of pixels anymore! Running my simpler program shown above with `CVODE_BDF(linear_solver=:GMRES)` (noting that it must include the corrected line, `maxiters = Int(1e7)`, to run successfully), going from `Npix = 801` to `1601` to `3201` only increased the runtime of the `DifferentialEquations.solve` command from 4.87 to 5.87 to 6.52 seconds!

I'm not sure how any of this is possible, but I'm glad it is! Thanks for all of your suggestions… I'm still implementing more of them as necessary…