I am solving a system of linear differential equations (of type [dx/dt]= [Jacobian]*[ x ] ) with about 1 million simultaneous differential equations. The system is highly stiff and the jacobian is sparse and banded. I use CVODE_BDF(linear_solver=:GMRES), the accuracy is good, but for this large system it takes over 5 days to arrive at the solution. I was hoping if I could accelerate the solution by making use of multiple cores in my HPC. I enabled multi-threading and converted my jacobian to ThreadedSparseArrays (See MWE below) . Does the linear_solver=:GMRES in CVODE_BDF automatically make use of avaialable multithreading ? If not how do I define explicitly for the CVODE solver to use the multiple available threads?. Please find MWE below. (Also my HPC doesnâ€™t have GPUs so I wonâ€™t be able to explore CUDA).

using MAT, DataFrames,ThreadedSparseArrays, DifferentialEquations
using LinearAlgebra, SparseArrays, LinearSolve, Sundials
# enable multi-threading
BLAS.set_num_threads(Threads.nthreads()) ;
Jacobian = matread("Jacobian.mat")["Jacobian "]; # load the large jacobian
# Convert Jacobian to a ThreadedSparseMatrix for parallel operations
Jacobian_threaded = ThreadedSparseMatrixCSC(Jacobian)
# Define the Jacobian function
function jacobianfun(J, u, p, t)
J .= Jacobian_threaded
nothing
end
Jpattern = Jacobianmat .!= 0; # compute jacobian pattern
f(du, u, p, t) = mul!(du, Jacobian , u) ; # define the function : du = jacobianxu
f_ode = ODEFunction(f, jac = jacobianfun,jac_prototype = float.(Jpattern))
u0 = zeros(1e6,1) ;
u0[1:10000] .=1; % set initial condition for solver
prob = ODEProblem(f, u0, (0.0, 1e-3 )); # define ODE problem
#Solve the problem using CVODE_BDF from sundials package
@time sol = solve(prob,CVODE_BDF(linear_solver= :GMRES, save_everystep = false,dense= false,reltol = 1e-7,abstol= 1e-8, maxiters=Int64.(1e8));

They are horribly slow in my case (and sometimes are highly inaccurate over a large time period- see the note in the next paragraph). Ive also tried various other stiff solvers previously (such as Rodas, Kencarp, Rosenbrock etc.), but found CVODE_BDF to be by far the most stable, accurate and fast solver of all. Since CVODE works well for my case, I would like to employ parallel processing (in my HPC) to these solvers (preferably). Also my HPC doesnâ€™t have GPUs so at the moment I might not be able to explore the same.

A quick note of the system: The max values in the jacobian are of the order 1e12, the minimum nonzero values are 1e3 and i want to solve the problem till a span of 1e-3 seconds (a large time-span considering the fastest dynamics (1e12 Hz)are 9 orders faster than the slowest dynamics).

Also @ChrisRackauckas, as you recommended I will try using solvers such as LinearExponential(), and other State-Independent Solvers and let you know the results soon.

Sure. I have created a minitature version of my large problem and uploaded the files (code, data and very basic results in a word file) in this drive link: STIFF_ODE_PROB - Google Drive .
Please take a look. Also, kindly guide me on means to accelerate my code. Thanks!!

If you use iterative solvers for the exponential, you should try to monitor the convergence and adjust tolerances, etc. The defaults rarely work in practice. Then consider using a pre-conditionner.

@rveltz Yes, I do have globals all over the code. Is that a bad coding practice? I donâ€™t know. I already played with the tolerances (I set tolerances explicitly and didnâ€™t let it assume the default), but still exponential ones I found to be either highly time-consuming or inaccurate. I will try using a pre-conditioner as you suggested and let you know. But is there anything I can do to speed up the results of CVODE_BDF using parallel computing?

Check out the Julia performance tips. Generally globals, especially untyped globals, should be avoided, and one should instead pass via parameter structures: