Solving ODE on GPU from Python with DifferentialEquations.jl

Hi everyone,

I am new in Julia. My goal is to solve stiff ODE systems (N>1000) representing chemical reaction networks up to steady-state conditions. Since the Python Scipy implementation is too slow for the purpose, I just started learning Julia as DifferentialEquations.jl allows GPU acceleration. I wrote the following piece of code where I call Julia from a Python method:

def integrate_jl(self, y0):
        from julia.api import Julia
        jl = Julia()
        from julia import Main
        Main.y0 = y0  # (Ny,1)  vector of Float64
        Main.v = self.v_dense  # (Ny,NR)  Matrix of Int8
        Main.vf = self.v_forward_dense  # (NR,Ny)  Matrix of Int8
        Main.vb = self.v_backward_dense  # (NR,Ny)  matrix of Int8
        Main.kd, Main.kr = self.kd, self.kr  # (NR,1)  vector of Float64
        Main.gas_mask = self.gas_mask  # (Ny, 1)  vector of Bool
        Main.eval("""
        using CUDA
        using DifferentialEquations
        CUDA.allowscalar(true)
        y0 = CuArray{Float64}(y0)
        v = CuArray{Int8}(v)
        kd = CuArray{Float64}(kd)
        kr = CuArray{Float64}(kr)
        vf = CuArray{Int8}(vf)
        vb = CuArray{Int8}(vb)
        gas_mask = CuArray{Bool}(gas_mask)
        p = (v = v, kd = kd, kr = kr, gas_mask = gas_mask, vf =  vf, vb = vb)
        function ode_pfr!(du, u, p, t)
            rate = p.kd .* vec(prod((u .^ p.vf')', dims=2)) .- p.kr .* vec(prod((u .^ p.vb')', dims=2))
            du = vec(p.v * net_rate)
            du[p.gas_mask] .= 0.0
        end
        prob = SteadyStateProblem(ode_pfr!, y0, p)
        sol = solve(prob, DynamicSS(QNDF()))
        CUDA.allowscalar(false)
        """)
        ... 

However, I keep getting the following 'ArgumentError: cannot take the CPU address of a CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}'.

I managed to make it work on the CPU, but on the GPU I am struggling a lot.

2 Likes

What line is throwing the error? What is the stack trace?

I can see that first du needs to be du .=, but they won’t solve your error

2 Likes

The line throwing the error is

sol = solve(prob, DynamicSS(QNDF()))

The stack trace I get from PyJulia is (after splitting the Julia code in multiple Main.eval calls):

File ".../my_module.py", line 275, in integrate_jl
    Main.eval("""
  File "/home/smorandi/miniconda3/envs/my_env/lib/python3.11/site-packages/julia/core.py", line 633, in eval
    ans = self._call(src)
          ^^^^^^^^^^^^^^^
  File "/home/smorandi/miniconda3/envs/my_env/lib/python3.11/site-packages/julia/core.py", line 561, in _call
    self.check_exception(src)
  File "/home/smorandi/miniconda3/envs/my_env/lib/python3.11/site-packages/julia/core.py", line 615, in check_exception
    raise JuliaError(u'Exception \'{}\' occurred while calling julia code:\n{}'
julia.core.JuliaError: Exception 'ArgumentError: cannot take the CPU address of a CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}' occurred while calling julia code:

        sol = solve(prob, DynamicSS(QNDF()))

Thanks Chris for pointing out the missing dot in the du definition, as you said this does not solve the problem.

I just changed the stiff solver from QNDF to KenCarp4 and now works! The error source seems to be the solver algorithm.

2 Likes

I am also getting the same error when I try to solve a linear ODE
(of type [dx/dt] = [Jacobian].[ x ]) with a huge number of simultaneous equations ~1million equations. My problem is highly stiff, so I use CVODE_BDF from sundials (the jacobian is also sparse in my problem).

Now to accelerate this process I opted for GPU. Here is a minimum working code for the same. ( This is the same as example from DiffEqGPU package: Getting Started with GPU-Accelerated Differential Equations in Julia · DiffEqGPU.jl). But I want to use CVODE_BDF or QNDFwith CUDA (because my problem is huge and stiff also these solvers work best for my problem) . @SantiagoMorandi ,@ChrisRackauckas , Is there any way we could leverage GPU acceleration for solvers such as CVODE_BDF or QNDF? Kindly guide.

using DifferentialEquations, CUDA, LinearAlgebra
using Sundials, Krylov, LinearSolve
u0 = cu(rand(1000))
A = cu(randn(1000, 1000))
f(du, u, p, t) = mul!(du, A, u)
prob = ODEProblem(f, u0, (0.0f0, 1.0f0)) # Float32 is better on GPUs!
sol =solve(prob, Tsit5()); # this works
sol = solve(prob, QNDF(linsolve= KrylovJL_GMRES())) # doesnt work
sol = solve(prob, CVODE_BDF((linear_solver= :GMRES)) # doesnt work

Error- Stacktrace

ERROR: MethodError: no method matching Sundials.NVector(::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer})

Closest candidates are:
  Sundials.NVector(::Vector{Float64})
   @ Sundials C:\Users\Balaji\.julia\packages\Sundials\vGumE\src\nvector_wrapper.jl:16
  Sundials.NVector(::Ptr{Sundials._generic_N_Vector})
   @ Sundials C:\Users\Balaji\.julia\packages\Sundials\vGumE\src\nvector_wrapper.jl:24

Stacktrace:
  [1] __init(prob::ODEProblem{…}, alg::CVODE_BDF{…}, timeseries::Vector{…}, ts::Vector{…}, ks::Vector{…}; verbose::Bool, callback::Nothing, abstol::Float64, reltol::Float64, saveat::Vector{…}, d_discontinuities::Vector{…}, tstops::Vector{…}, maxiters::Int64, dt::Nothing, dtmin::Float64, dtmax::Float64, timeseries_errors::Bool, dense_errors::Bool, save_everystep::Bool, save_idxs::Nothing, save_on::Bool, save_start::Bool, save_end::Bool, dense::Bool, progress::Bool, progress_steps::Int64, progress_name::String, progress_message::typeof(DiffEqBase.ODE_DEFAULT_PROG_MESSAGE), progress_id::Symbol, save_timeseries::Nothing, advance_to_tstop::Bool, stop_at_next_tstop::Bool, userdata::Nothing, alias_u0::Bool, kwargs::@Kwargs{})       
    @ Sundials C:\Users\Balaji\.julia\packages\Sundials\vGumE\src\common_interface\solve.jl:200
  [2] __init(prob::ODEProblem{…}, alg::CVODE_BDF{…}, timeseries::Vector{…}, ts::Vector{…}, ks::Vector{…})
    @ Sundials C:\Users\Balaji\.julia\packages\Sundials\vGumE\src\common_interface\solve.jl:78
  [3] __solve(prob::ODEProblem{…}, alg::CVODE_BDF{…}, timeseries::Vector{…}, ts::Vector{…}, ks::Vector{…}, recompile::Type{…}; calculate_error::Bool, kwargs::@Kwargs{})
    @ Sundials C:\Users\Balaji\.julia\packages\Sundials\vGumE\src\common_interface\solve.jl:15
  [4] __solve(prob::ODEProblem{…}, alg::CVODE_BDF{…}, timeseries::Vector{…}, ts::Vector{…}, ks::Vector{…}, recompile::Type{…})
    @ Sundials C:\Users\Balaji\.julia\packages\Sundials\vGumE\src\common_interface\solve.jl:3
  [5] __solve(prob::ODEProblem{…}, alg::CVODE_BDF{…}, timeseries::Vector{…}, ts::Vector{…}, ks::Vector{…})
    @ Sundials C:\Users\Balaji\.julia\packages\Sundials\vGumE\src\common_interface\solve.jl:3
  [6] __solve(prob::ODEProblem{…}, alg::CVODE_BDF{…}, timeseries::Vector{…}, ts::Vector{…})
    @ Sundials C:\Users\Balaji\.julia\packages\Sundials\vGumE\src\common_interface\solve.jl:3
  [7] __solve(prob::ODEProblem{…}, alg::CVODE_BDF{…}, timeseries::Vector{…})
    @ Sundials C:\Users\Balaji\.julia\packages\Sundials\vGumE\src\common_interface\solve.jl:3
  [8] __solve(prob::ODEProblem{…}, alg::CVODE_BDF{…})
    @ Sundials C:\Users\Balaji\.julia\packages\Sundials\vGumE\src\common_interface\solve.jl:3
  [9] solve_call(_prob::ODEProblem{…}, args::CVODE_BDF{…}; merge_callbacks::Bool, kwargshandle::KeywordArgError, kwargs::@Kwargs{})
    @ DiffEqBase C:\Users\Balaji\.julia\packages\DiffEqBase\s433k\src\solve.jl:559
 [10] solve_call(_prob::ODEProblem{…}, args::CVODE_BDF{…})
    @ DiffEqBase C:\Users\Balaji\.julia\packages\DiffEqBase\s433k\src\solve.jl:529
 [11] solve_up(prob::ODEProblem{…}, sensealg::Nothing, u0::CuArray{…}, p::SciMLBase.NullParameters, args::CVODE_BDF{…}; kwargs::@Kwargs{})
    @ DiffEqBase C:\Users\Balaji\.julia\packages\DiffEqBase\s433k\src\solve.jl:1020
 [12] solve_up(prob::ODEProblem{…}, sensealg::Nothing, u0::CuArray{…}, p::SciMLBase.NullParameters, args::CVODE_BDF{…})
    @ DiffEqBase C:\Users\Balaji\.julia\packages\DiffEqBase\s433k\src\solve.jl:993
 [13] solve(prob::ODEProblem{…}, args::CVODE_BDF{…}; sensealg::Nothing, u0::Nothing, p::Nothing, wrap::Val{…}, kwargs::@Kwargs{})
    @ DiffEqBase C:\Users\Balaji\.julia\packages\DiffEqBase\s433k\src\solve.jl:930
 [14] solve(prob::ODEProblem{…}, args::CVODE_BDF{…})
    @ DiffEqBase C:\Users\Balaji\.julia\packages\DiffEqBase\s433k\src\solve.jl:920
 [15] top-level scope
    @ REPL[10]:1
 [16] top-level scope
    @ C:\Users\Balaji\.julia\packages\CUDA\rXson\src\initialization.jl:208
Some type information was truncated. Use `show(err)` to see complete types.

Sundials CVODE_BDF won’t support Julia GPU code. It’s in theory possible to set that up but it would be a lot more work than it’s worth.

It looks like there is indeed some bug with FBDF for GPUs. For now use:

sol = solve(prob, TRBDF2(linsolve= KrylovJL_GMRES()))

or

sol = solve(prob, Rodas5P(linsolve= KrylovJL_GMRES()))

which should do well with large stiff GPU problems.

As for the FBDF bug, it should be rather simple to fix. @Elrod would you happen to know why

@.. broadcast=false u_history[:, 1]=vuprev

falls back to scalar indexing? Directly doing the sequence:

julia> u0 = cu(rand(1000));

julia> A = cu(randn(1000, 1000));

julia> b = @view A[:, 1];

julia> b .= u0;

seems fine on GPU

2 Likes

It calls

julia> @macroexpand @.. broadcast=false u_history[:, 1]=vuprev
:((FastBroadcast.fast_materialize!)(static(false), static(false), (Base.maybeview)(u_history, :, 1), vuprev))

which should call

assuming vuprev is an AbstractArray.

In which case, what does copyto!(@view(u_history[:,1]), vuprev) do?
Does it use scalar indexing? If so, sounds like a missing method.

1 Like
u0 = cu(rand(1000));
A = cu(randn(1000, 1000));
b = @view A[:, 1];
copyto!(b,u0);

works. Am I missing something?

1 Like

Which FastBroadcast version?

I’ll test it with CUDA in a few hours.

1 Like
(@v1.10) pkg> st -m CUDA FastBroadcast
Status `~/.julia/environments/v1.10/Manifest.toml`
  [052768ef] CUDA v5.4.2
  [7034ab61] FastBroadcast v0.3.4

julia> using CUDA, FastBroadcast

julia> A = cu(randn(1000, 1000));

julia> u0 = cu(rand(1000));

julia> CUDA.allowscalar(false);

julia> @.. broadcast=false A[:,1] = u0;

julia> u0 == @view(A[:,1])
true

Do you have a minimal reproducer?

1 Like

@ChrisRackauckas

julia> sol = solve(prob, FBDF())
ERROR: Scalar indexing is disallowed.
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] errorscalar(op::String)
    @ GPUArraysCore ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:155
  [3] _assertscalar(op::String, behavior::GPUArraysCore.ScalarIndexing)
    @ GPUArraysCore ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:128
  [4] assertscalar(op::String)
    @ GPUArraysCore ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:116
  [5] getindex
    @ ~/.julia/packages/GPUArrays/8Y80U/src/host/indexing.jl:48 [inlined]
  [6] copyto_unaliased!(deststyle::IndexLinear, dest::SubArray{Float32, 1, Matrix{Float32}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}, srcstyle::IndexLinear, src::CuArray{Float32, 1, CUDA.DeviceMemory})
    @ Base ./abstractarray.jl:1088
  [7] copyto!
    @ ./abstractarray.jl:1068 [inlined]

So, copyto! is being called, but…

[6] copyto_unaliased!(deststyle::IndexLinear, dest::SubArray{Float32, 1, Matrix{Float32}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}, srcstyle::IndexLinear, src::CuArray{Float32, 1, CUDA.DeviceMemory})
    @ Base ./abstractarray.jl:1088

the destination is a subarray of a regular CPU-Matrix, not of a CuArray`.
Seems like a bug in FBDF allocating the wrong type of container.

1 Like