Simplest way to convert a program for parallel (multithreaded) runs on multiple servers/cores?

Hi again all, I am running a collection of programs doing intensive numerical calculations (e.g., ~1-2 days of runtime on a system with 64 GB of memory and CPU > 2.1 GHz).

I have an opportunity to save tremendous resources (and money) by adapting my programs to run on a locally available system with multiple servers & cores. But completely re-writing my pre-existing programs (over 5,000 lines of code) to adapt them to this parallel system would be a prohibitive amount of work, and seems impossible to do without introducing numerous errors.

(Also, I don’t know how to do it; something called “multi-threading” seems to be involved, and my only pointer so far is a link to Distributing Computing for Julia (https://docs.julialang.org/en/v1/stdlib/Distributed), which has so much information I don’t know where to start.)

I can’t be the first person with this problem… is there some straightforward way to modify a program minimally to get it to run on a parallel system like this? (And if so, could someone share some simple coding examples on how to do it?)

Thanks for any info…!

Hard to tell without any information about your application. But the simplest parallelization model is this: Multi-Threading · The Julia Language

But the first homework is to find out what exactly in your program takes most of the time, to think how that must be parallelized (or accelerated sequentially as well, many times a code can be accelerated hundreds of times before thinking about parallelization).

2 Likes

Hi Leandro, thanks for the reply.

The Multi-Threading link looks interesting, but appears like I might have to go through all of the programs and modify every single loop structure. (!?)

Anyway, the vast majority of my program run-time is spent doing one type of task: solving a PDE using the method-of-lines. In other words, solving a very large (> 1000) set of coupled ODE’s. This, for example, is a set of commands that could take over 24 hours:

prob = ODEProblem(CoupledEqns!, InitConditionsArray, tspan, ParamsTuple)

@time SolvTheProb = DifferentialEquations.solve(prob, Rodas5(), dense=true, 
                    reltol=1e-12, abstol=1e-12, maxiters = 1e7, progress=true)

For example, InitConditionsArray may be an array of 3201 elements, leading to a “sparsely coupled” system of differential equations, where each Nth equation depends also upon the values before it (N-1) and the one after it (N+1).

Paralellizing coupled ODEs isn’t the easiest thing. But if you provide a minimal working example I am sure you will get good advice here.

(Without knowing anything about the problem, 1e-12 seems a too strict tolerance, given the precision of floating point arithmetics.).

2 Likes

An ODE of size 1000 is not big. You said that the jacobian is sparse, did you try giving the jacobian sparsity pattern, or provide an analytical jacobian?

Have your read the ODE tips ?

Hi Leandro, a copy of a simplified version is below. As written, the program takes ~3-4 minutes to run with Npix = 201. But the runtime (and memory used) goes up at least as fast as Npix^2, and the program takes days to run with Npix = 3201 (for example). Almost all of the runtime is taken by the final command below, DifferentialEquations.solve:

# Minimal working example to test Multi-threading...
#
using DifferentialEquations
# using OrdinaryDiffEq, DiffEqBase, DiffEqOperators, BandedMatrices
#
Npix = 201
tspan = [1.0,5.0]
ODEparams = tuple(0.04,1.1)
xArr = [(-4.0 + (0.04*(i-1))) for i in 1:Npix]
#
InitCondsArr = Array{Float64}(undef, Npix, 2)
InitCondsArr[1:Npix,2] .= sin.(xArr)
InitCondsArr[1:Npix,1] .= cos.(xArr)
#
function EqnsToSolve!(du,u,p,t)
    Npts = size(u)[1]
    DX = p[1]
    k = p[2]
# Deal with left-most x-value:
    du[1,2] = u[1,1]
    du[1,1] = (-k*u[1,2]) +
              (((-1*u[1,2])+(1*u[(1+1),2]))/(DX^2))
# Deal with right-most x-value:
    du[Npts,2] = u[Npts,1]
    du[Npts,1] = ((-k*u[Npts,2])/2) +
                    (-1)*((((-3/2)*u[Npts,1])+((2)*u[(Npts-1),1]) +
                    ((-1/2)*u[(Npts-2),1]))/(-DX))
# All other values in the x-grid:
    for i in 2:(Npts-1)
        du[i,2] = u[i,1]
        du[i,1] = (-k*u[i,2]) +
                    ((u[(i+1),2]-(2*u[i,2])+u[(i-1),2])/(DX^2))
    end
end
#
ProbToSolve = ODEProblem(EqnsToSolve!, InitCondsArr, tspan, ODEparams)
#
@time SolveTheProb = DifferentialEquations.solve(ProbToSolve, Rodas5(),
    dense=true, reltol=1e-12, abstol=1e-12, maxiters = 1e7, progress=true)

rveltz: I looked through the page you linked me to, but it is extremely complicated. I read about the Jacobian procedures, but my program explicitly specifies how the equations for each value of x are linked together (instead of doing some kind of banded matrices approach), so I’m not sure how those procedures apply here. (Also, those examples had constant coefficients for the differential equations here – like my constants k and DX. But in general, I may have non-constant coefficients that depend on x and t.)

I’m hoping to learn how to “multi-thread” this for a computer system with multiple cores, threads, servers. (Or otherwise rapidly speed up the “solve” step.)
Any detailed suggestions welcome!

First, you are basically solving a linear Laplace equation.

You could look at ModelingToolkit.jl and https://github.com/SciML/ModelingToolkit.jl/blob/master/test/pde.jl

That’s the point… Perhaps you read too fast Automatic-Sparsity-Detection

I may have non-constant coefficients that depend on x and t .)

How would this prevent from computing a differential? :wink:


You can try CVODE_BDF as a solver, it should quite fast. It will estimate the jacobian for you.

2 Likes

rveltz, I took a closer look at your Automatic-Sparsity-Detection link, and attempting to use “Automatic Derivation of Jacobian Functions”, I got an expression too large error for any Npix greater than around 280.

For example, I modified my sample of working code above, by adding the following:

ProbToSolve = ODEProblem(EqnsToSolve!, InitCondsArr, tspan, ODEparams)
AutoJac = modelingtoolkitize(ProbToSolve)
jac_Auto = eval(ModelingToolkit.generate_jacobian(AutoJac)[2])
#
#      Above line gives the error: "syntax: expression too large"
#
f_Auto = ODEFunction(EqnsToSolve!, jac=jac_Auto)
prob_jac_Auto = ODEProblem(f_Auto, InitCondsArr, tspan, ODEparams)
@time sol_jac_Auto = solve(prob_jac_Auto)

I’m hoping to use values of Npix up to at least 3200, so it looks like this doesn’t work!

Also, I am not able to the CVODE_BDF solver, since by using the commands:

using Sundials
ProbToSolve = ODEProblem(EqnsToSolve!, InitCondsArr, tspan, ODEparams)
@time SolveTheProb = DifferentialEquations.solve(ProbToSolve, CVODE_BDF(),
    dense=true, reltol=1e-12, abstol=1e-12, maxiters = 1e7, progress=true)

I get the error:

MethodError: no method matching Sundials.DEOptions(::DataStructures.BinaryHeap{
Float64,Base.Order.ForwardOrdering},::DataStructures.BinaryHeap{Float64,
Base.Order.ForwardOrdering}, ::Bool, ::Bool, ::Bool, ::Bool, ::Bool, ::Bool, 
::CallbackSet{Tuple{},Tuple{}}, ::Float64, ::Float64, ::Bool, ::Bool, ::Bool, 
::Bool, ::String, ::typeof(DiffEqBase.ODE_DEFAULT_PROG_MESSAGE), ::Float64)

So I’m 0 for 2 so far.

Strange for CVODE_BDF. Maybe @ChrisRackauckas has an idea

Are you not using Float64s? It’s a C++ program so it’s a bit more restrictive.

I think I’m using Float64… A copy of the whole test program is below. It fails both on JuliaPro 1.4.2-1, and JuliaPro 1.5.3-1. On two different computer systems. Easing the tolerances to reltol=1e-6, abstol=1e-6 did not help. (BTW I have never been able to successfully use CVODE_BDF or lsoda.)

using DifferentialEquations
using OrdinaryDiffEq, DiffEqBase, DiffEqOperators, BandedMatrices
using ModelingToolkit
using Sundials # for CVODE_BDF as a solver!
#
Npix = 81
tspan = [1.0,5.0]
ODEparams = tuple(0.04,1.1)
xArr = [(-4.0 + (0.04*(i-1))) for i in 1:Npix]
#
InitCondsArr = Array{Float64}(undef, Npix, 2)
InitCondsArr[1:Npix,2] .= sin.(xArr)
InitCondsArr[1:Npix,1] .= cos.(xArr)
#
function EqnsToSolve!(du,u,p,t)
    Npts = size(u)[1]
    DX = p[1]
    k = p[2]
# Deal with left-most x-value:
    du[1,2] = u[1,1]
    du[1,1] = (-k*u[1,2]) +
              (((-1*u[1,2])+(1*u[(1+1),2]))/(DX^2))
# Deal with right-most x-value:
    du[Npts,2] = u[Npts,1]
    du[Npts,1] = ((-k*u[Npts,2])/2) +
                    (-1)*((((-3/2)*u[Npts,1])+((2)*u[(Npts-1),1]) +
                    ((-1/2)*u[(Npts-2),1]))/(-DX))
# All other values in the x-grid:
    for i in 2:(Npts-1)
        du[i,2] = u[i,1]
        du[i,1] = (-k*u[i,2]) +
                    ((u[(i+1),2]-(2*u[i,2])+u[(i-1),2])/(DX^2))
    end
end
#
ProbToSolve = ODEProblem(EqnsToSolve!, InitCondsArr, tspan, ODEparams)
#
@time SolveTheProb = DifferentialEquations.solve(ProbToSolve, CVODE_BDF(),
    dense=true, reltol=1e-12, abstol=1e-12, maxiters = 1e7, progress=true)
#

Thanks for any suggestions…

Try not JuliaPro?

1 Like

(?)
I thought I was using JuliaPro.

I have no idea what the problem is, but I confirm that the code above returns an error in 1.5.3 (no JuliaPro, no IDE):

julia> include("./cosmoprof.jl")
ERROR: MethodError: no method matching Sundials.DEOptions(::DataStructures.BinaryHeap{Float64,DataStructures.LessThan}, ::DataStructures.BinaryHeap{Float64,DataStructures.LessThan}, ::Bool, ::Bool, ::Bool, ::Bool, ::Bool, ::Bool, ::CallbackSet{Tuple{},Tuple{}}, ::Float64, ::Float64, ::Bool, ::Bool, ::Bool, ::Bool, ::String, ::typeof(DiffEqBase.ODE_DEFAULT_PROG_MESSAGE), ::Float64)
Closest candidates are:
  Sundials.DEOptions(::SType, ::TstopType, ::Bool, ::Bool, ::Bool, ::Bool, ::Bool, ::Bool, ::CType, ::abstolType, ::reltolType, ::Bool, ::Bool, ::Bool, ::Bool, ::String, ::F5, ::Int64) where {SType, TstopType, CType, reltolType, abstolType, F5} at /home/leandro/.julia/packages/Sundials/2KP0s/src/common_interface/integrator_types.jl:2
__init(::ODEProblem{Array{Float64,2},Tuple{Float64,Float64},true,Tuple{Float64,Float64},ODEFunction{true,typeof(EqnsToSolve!),LinearAlgebra.UniformScaling{Bool},Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing},Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}},DiffEqBase.StandardODEProblem}, ::CVODE_BDF{:Newton,:Dense,Nothing,Nothing}, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}; verbose::Bool, callback::Nothing, abstol::Float64, reltol::Float64, saveat::Array{Float64,1}, tstops::Array{Float64,1}, maxiters::Float64, dt::Nothing, dtmin::Float64, dtmax::Float64, timeseries_errors::Bool, dense_errors::Bool, save_everystep::Bool, save_on::Bool, save_start::Bool, save_end::Bool, dense::Bool, progress::Bool, progress_name::String, progress_message::typeof(DiffEqBase.ODE_DEFAULT_PROG_MESSAGE), save_timeseries::Nothing, advance_to_tstop::Bool, stop_at_next_tstop::Bool, userdata::Nothing, alias_u0::Bool, kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/leandro/.julia/packages/Sundials/2KP0s/src/common_interface/solve.jl:309
(::DiffEqBase.var"#__init##kw")(::NamedTuple{(:dense, :reltol, :abstol, :maxiters, :progress),Tuple{Bool,Float64,Float64,Float64,Bool}}, ::typeof(DiffEqBase.__init), ::ODEProblem{Array{Float64,2},Tuple{Float64,Float64},true,Tuple{Float64,Float64},ODEFunction{true,typeof(EqnsToSolve!),LinearAlgebra.UniformScaling{Bool},Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing},Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}},DiffEqBase.StandardODEProblem}, ::CVODE_BDF{:Newton,:Dense,Nothing,Nothing}, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}) at /home/leandro/.julia/packages/Sundials/2KP0s/src/common_interface/solve.jl:42

The error says maxiters isn’t an Int. Did you try Int(1e7)?

2 Likes

Just tried maxiters = Int(1e7), it worked! (For now.) I’ll test running the real (very long) program with the CVODE_BDF solver now. Thanks!

BTW, does using CVODE_BDF mean that I don’t have to bother with implementing Automatic Derivation of Jacobian Functions – as I described above, where I got an error: "syntax: expression too large" for any Npix > 280? Or would somehow fixing that provide even more speedup & memory savings?

I’m referring to the post above where I tried to implement:

ProbToSolve = ODEProblem(EqnsToSolve!, InitCondsArr, tspan, ODEparams)
AutoJac = modelingtoolkitize(ProbToSolve)
jac_Auto = eval(ModelingToolkit.generate_jacobian(AutoJac)[2])

Julia cannot compile huge functions. You’ll want to make it sparse in that case.

Ok, so that’s a limitation we can’t get around…

Regarding sparse Jacobians, I read through the Automatic Sparsity Detection section of “Solving Large Stiff Equations · DifferentialEquations.jl” as much as I could, but I got lost when it started talking about brusselators, and getting arrays of random numbers (for some reason) for arrays called input and output. Any tips on how to apply Sparsity to my example program here, I’d be very grateful.