DifferentialEquations and GPU

Hello,

According to the following blog post, http://www.stochasticlifestyle.com/solving-systems-stochastic-pdes-using-gpus-julia/, it is very easy to turn on GPU parallelism with DifferentialEquations.jl: the only thing one needs to do is replace ordinary arrays with CLArrays or CuArrays. However, the following code shows that the GPU version is orders of magnitude slower:

using DifferentialEquations
using CuArrays


function func(du, u, p, t)
    @inbounds @. du = u
    return nothing
end


tspan = (0., 10.)
alg = BS3()


# ******************************************************************************
println("CPU:")

u0 = fill(1., 500)
prob = ODEProblem(func, u0, tspan)

@time sol = solve(prob, alg, saveat=0.01)
@time sol = solve(prob, alg, saveat=0.01)
@time sol = solve(prob, alg, saveat=0.01)


# ******************************************************************************
println("GPU:")

# CuArrays.allowscalar(false)

u0_gpu = CuArray(convert(Array{Float32, 1}, u0))
prob_gpu = ODEProblem(func, u0_gpu, tspan)

@time sol_gpu = solve(prob_gpu, alg, saveat=0.01)
@time sol_gpu = solve(prob_gpu, alg, saveat=0.01)
@time sol_gpu = solve(prob_gpu, alg, saveat=0.01)

The output on my machine with an Nvidia GTX 1070 is the following:

CPU:
  3.627708 seconds (14.59 M allocations: 756.548 MiB, 5.28% gc time)
  0.002929 seconds (1.11 k allocations: 4.082 MiB)
  0.003089 seconds (1.11 k allocations: 4.082 MiB)
GPU:
 32.020281 seconds (43.64 M allocations: 2.092 GiB, 1.71% gc time)
 24.563494 seconds (20.42 M allocations: 951.305 MiB, 0.49% gc time)
 27.890973 seconds (20.42 M allocations: 951.711 MiB, 0.48% gc time)

Can you please explain what I am doing wrong?

Thank you.

Did you run with that line uncommented? If that errors, there’s a scalar operation that isn’t supported by CuArrays.jl (yet), and it will completely ruin performance.
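
For reference, a minimal sketch of what that check looks like (exact error text varies by CuArrays version):

using CuArrays

CuArrays.allowscalar(false)     # make scalar indexing throw instead of silently falling back to the CPU

x = CuArray(ones(Float32, 10))
x .+ 1f0    # fine: broadcasting runs as a single GPU kernel
x[1]        # errors under allowscalar(false), because scalar getindex would process elements one by one on the CPU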

Yes, it gives an error.

Well, then I will try to reformulate the question. Are there any examples which show how DifferentialEquations can be sped up using a GPU? So far I have found only the above blog post, which discusses this possibility.

Maybe not completely what you’re asking for, but see

Not to my knowledge. A few points though:

  1. It will be difficult to speed up real problems on the GPU, even big ones, without writing a custom kernel somewhere (see the sketch after this list). Differential equation solvers are typically iterative in nature, so the speedup has to come from the individual array operations themselves, and that is hard to achieve on a GPU unless the operations involved are very large and very computationally intensive.
  2. For your example, the GPU probably won’t beat the CPU by much even if func is written as a custom kernel. You’re pretty much just testing the L2 cache bandwidth of a single CPU core vs the L2 cache on a GPU.
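
As a rough illustration of the custom-kernel point above (a minimal sketch assuming CUDAnative.jl is installed; this trivial RHS doesn’t actually need it), a hand-written GPU kernel for du .= u could look like:

using CUDAnative, CuArrays

# one GPU thread per element: du[i] = u[i]
function copy_kernel!(du, u)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(du)
        @inbounds du[i] = u[i]
    end
    return nothing
end

u  = CuArray(ones(Float32, 500_000))
du = similar(u)
threads = 256
blocks  = cld(length(u), threads)
@cuda threads=threads blocks=blocks copy_kernel!(du, u)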

Thank you all!

You’re timing the worst-case scenario for GPUs, which is where all of your kernel calls are trivial. Make your ODE something with a matrix multiplication, or something else with a lot of linear algebra, and you’ll see a different result. BLAS1 computations (element-wise or broadcast operations) are generally not enough to warrant GPU usage.
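
To make that concrete, a hypothetical sketch (not timed here) of an RHS dominated by a matrix-vector product, which gives the GPU substantial work per solver step:

using OrdinaryDiffEq, LinearAlgebra
using CuArrays

N = 2_000
A = CuArray(randn(Float32, N, N))
u0 = CuArray(randn(Float32, N))

# du = A*u: one large BLAS2 kernel per call instead of a trivial element-wise copy
function matmul_rhs!(du, u, p, t)
    mul!(du, p, u)      # in-place matrix-vector product, runs through CUBLAS for CuArrays
    return nothing
end

prob = ODEProblem(matmul_rhs!, u0, (0f0, 1f0), A)
sol = solve(prob, BS3(), saveat=1f-1)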

We’re working on this. We will have a GSoC project that optimizes GPU code and gets it working with methods for stiff ODEs, focusing on solvers for PDEs.


Hey @ChrisRackauckas,

can we now solve this issue?


CPU:
  2.448770 seconds (8.22 M allocations: 409.291 MiB, 7.44% gc time)
  0.002015 seconds (1.09 k allocations: 4.070 MiB)
  0.001831 seconds (1.09 k allocations: 4.070 MiB)
GPU:
 26.472584 seconds (41.66 M allocations: 2.079 GiB, 6.59% gc time)
  0.221883 seconds (140.06 k allocations: 15.828 MiB)
  0.269016 seconds (170.76 k allocations: 16.724 MiB, 4.14% gc time)

You had time as Float64, and the kernels weren’t filled (the arrays were too small to keep the GPU busy). Here’s an example:

using OrdinaryDiffEq
using CuArrays
CuArrays.allowscalar(false)

function func(du, u, p, t)
    @inbounds @. du = u
    return nothing
end


tspan = (0f0, 10f0)
alg = BS3()
u0 = fill(1f0, Int(5e5))

# ******************************************************************************
println("CPU:")
prob = ODEProblem(func, u0, tspan)

@time sol = solve(prob, alg, saveat=1f-2)
@time sol = solve(prob, alg, saveat=1f-2)
@time sol = solve(prob, alg, saveat=1f-2)


# ******************************************************************************
println("GPU:")

# CuArrays.allowscalar(false)

u0_gpu = CuArray(convert(Array{Float32, 1}, u0))
prob_gpu = ODEProblem(func, u0_gpu, tspan)

sol_gpu = nothing; GC.gc(true); CuArrays.reclaim()
@time sol_gpu = solve(prob_gpu, alg, saveat=1f-2)
sol_gpu = nothing; GC.gc(true); CuArrays.reclaim()
@time sol_gpu = solve(prob_gpu, alg, saveat=1f-2)
sol_gpu = nothing; GC.gc(true); CuArrays.reclaim()
@time sol_gpu = solve(prob_gpu, alg, saveat=1f-2)

CPU:
  3.288380 seconds (8.77 M allocations: 2.316 GiB, 12.26% gc time)
  0.912710 seconds (2.10 k allocations: 1.891 GiB)
  0.973323 seconds (2.10 k allocations: 1.891 GiB, 8.27% gc time)
GPU:
  9.955818 seconds (27.20 M allocations: 1.347 GiB, 3.39% gc time)
  0.462713 seconds (113.43 k allocations: 6.869 MiB)
  0.452901 seconds (113.44 k allocations: 6.878 MiB)

Now it’s solved, thanks!

Hello Chris,

Thank you very much for making it work.

Can you please comment on this line:

sol_gpu = nothing; GC.gc(true); CuArrays.reclaim()

What exactly happens here, and do I need to do the same in my actual code before each call to solve?

Oh, that was just for clearing memory when testing.
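
Roughly, line by line (regular code doesn’t need any of this):

sol_gpu = nothing      # drop the reference to the previous GPU-resident solution
GC.gc(true)            # force a full garbage collection so those CuArrays get finalized
CuArrays.reclaim()     # return the now-unused memory held by the CuArrays pool to the CUDA driver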

I notice that you use OrdinaryDiffEq instead of DifferentialEquations. Does that make any difference? Does that mean DifferentialEquations can’t be used this way?

Hi,
thanks for asking.

DifferentialEquations is a much heavier package, since it includes code for different kinds of differential equations (including stochastic/delay differential equations) as well as automatic solver choices.
If you are just interested in solving ordinary differential equations, OrdinaryDiffEq is the way to go, since it loads faster. @ChrisRackauckas, as one of the major contributors to the differential equations ecosystem, probably just intuitively used the faster-loading package that provides everything needed here.
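
For example (a minimal sketch; only the first line differs):

using OrdinaryDiffEq            # lighter: just the ODE solvers used in this thread
# using DifferentialEquations   # metapackage: re-exports the same solvers plus SDE/DDE support and more

prob = ODEProblem((du, u, p, t) -> (du .= u; nothing), fill(1f0, 10), (0f0, 1f0))
sol = solve(prob, BS3())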

I hope that helps.


Yes, there’s no difference. I was just using the exact library instead of the metapackage.
