Would it be possible to use multiple CPU cores in Solving Nonlinear Equations?

I am trying to solve a system of nonlinear equations (~25 variables). Currently I am using NonlinearSolve.jl; it works, but I want to speed up the computation. Apart from strategies introduced here, such as using StaticArrays and specifying a sparse Jacobian matrix, I am wondering whether it is feasible to leverage multiple CPU cores. At the moment, the solving process appears to utilize only a single CPU core.

It will automatically multithread on large equations, but indeed your equation is small.

You can try AutoPolyesterForwardDiff and see if you get a speedup, but that depends on whether your time is mostly spent in the Jacobian construction.

Small nonlinear systems are generally not a problem with good parallelization in the algorithms, though it could be a nice research topic.
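For reference, switching the AD backend is a one-line change in the solver call. Here is a minimal sketch assuming NonlinearSolve.jl, ADTypes.jl, and PolyesterForwardDiff.jl are installed; the toy 3-variable system and its parameters are invented purely for illustration:

```julia
using NonlinearSolve, ADTypes
using PolyesterForwardDiff  # backend package; loading it activates the extension

# Toy in-place system standing in for the real 25-variable one.
function f!(du, u, p)
    du[1] = u[1]^2 + u[2] - p[1]
    du[2] = u[2]^2 - u[3]
    du[3] = u[1] + u[3] - p[2]
    return nothing
end

prob = NonlinearProblem(f!, ones(3), [2.0, 1.5])

# Multithreaded forward-mode AD for the Jacobian; this only pays off
# when Jacobian construction dominates the runtime.
sol = solve(prob, NewtonRaphson(; autodiff = AutoPolyesterForwardDiff()))
```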


Thanks for your kind suggestion. My equations cannot be automatically differentiated, so AutoPolyesterForwardDiff is not a viable choice for me (I am using autodiff=AutoSparseFiniteDiff()). I asked this question because in MATLAB, fsolve has a UseParallel=true option to estimate gradients in parallel. But as you said, “NonlinearSolve.jl will automatically multithread on large equations”.
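The parallel gradient estimation that MATLAB's UseParallel performs can be sketched by hand with Julia threads. This is a hypothetical illustrative helper, not NonlinearSolve.jl's internal implementation: it perturbs one variable per Jacobian column, and only actually parallelizes when Julia is started with multiple threads (e.g. julia -t auto):

```julia
# Hypothetical helper: forward finite-difference Jacobian of an in-place
# function f!(du, u), computed one column per thread.
function threaded_fd_jacobian!(J, f!, u; h = sqrt(eps(eltype(u))))
    n = length(u)
    f0 = similar(u)
    f!(f0, u)                     # baseline residual
    Threads.@threads for j in 1:n
        up = copy(u)              # each thread perturbs its own copy
        up[j] += h
        fj = similar(u)
        f!(fj, up)
        J[:, j] .= (fj .- f0) ./ h
    end
    return J
end
```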

As a beginner in Julia, I think the slow computation speed is possibly because I don’t write the equations in an efficient way. I’ll look into the performance tips and see if I can improve my code.

For a 25-variable problem, sparsity is most likely going to slow down your problem rather than speed it up. Sparsity benefits kick in at around 1000 variables at a minimum; see the Ill-Conditioned Nonlinear System Work-Precision Diagrams in the SciML Benchmarks.


Thank you for your answer. That’s a really informative benchmark. In my 25-variable problem, the Jacobian is tridiagonal, so the fraction of nonzeros is (25 + 24*2)/(25*25) = 11.68%. I compared the solving times for autodiff=AutoFiniteDiff() and autodiff=AutoSparseFiniteDiff(), and found that sparsity does speed up the calculation. However, my objective function seems to suffer from type instability (and perhaps many unnecessary allocations), which may contaminate the comparison. I’ll try to fix my equation constructions.
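As a sanity check on that number, the nonzero fraction of a 25×25 tridiagonal matrix can be computed directly with the standard library:

```julia
using LinearAlgebra  # provides Tridiagonal

n = 25
# 25 diagonal entries plus 2 * 24 off-diagonal entries.
T = Tridiagonal(ones(n - 1), ones(n), ones(n - 1))
nnz = count(!iszero, Matrix(T))   # 73 structural nonzeros
density = nnz / (n * n)           # 73 / 625 = 0.1168, i.e. 11.68%
```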


Ah, for a tridiagonal Jacobian: can you supply a prototype directly as jac_prototype = <...>, where the prototype is of type Tridiagonal? The entries can be anything.

AutoSparse... will generate a generic sparse matrix, but in the tridiagonal case a better specialized factorization can be used if the prototype is given.
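Wiring the prototype through might look like the following minimal sketch, assuming NonlinearSolve.jl and ADTypes.jl are installed; the discrete-Laplacian system (whose exact solution is all ones) is invented for illustration:

```julia
using LinearAlgebra
using NonlinearSolve, ADTypes

n = 25
# Tridiagonal test system: discrete Laplacian with unit boundary forcing.
function f!(du, u, p)
    du[1] = 2u[1] - u[2] - 1
    for i in 2:n-1
        du[i] = -u[i-1] + 2u[i] - u[i+1]
    end
    du[n] = -u[n-1] + 2u[n] - 1
    return nothing
end

# Entries of the prototype are irrelevant; only its Tridiagonal structure is used.
jp = Tridiagonal(zeros(n - 1), zeros(n), zeros(n - 1))
nf = NonlinearFunction(f!; jac_prototype = jp)
prob = NonlinearProblem(nf, fill(0.5, n))
sol = solve(prob, NewtonRaphson(; autodiff = AutoFiniteDiff()))
```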

Thanks. By specifying jac_prototype as a Tridiagonal matrix and using autodiff=AutoSparseFiniteDiff(), an approximately 30% speedup is achieved for my 25-variable problem.