We are currently analyzing the performance of ensemble simulations on CPUs and GPUs. In our case, the ensemble simulation with EnsembleGPUArray keeps the GPU busy for only about 6 s. Nevertheless, the overall calculation time with EnsembleGPUArray is around 115 s, as shown below. It seems that preparing the computations for the GPU and copying the data from host to device and back take a large share of the time.
CPU Ensemble Calculations
187.014356 seconds (109.21 M allocations: 14.871 GiB, 2.22% gc time, 0.00% compilation time)
180.962341 seconds (85.79 M allocations: 13.464 GiB, 2.74% gc time)
GPU Ensemble Calculations
190.710367 seconds (130.19 M allocations: 231.045 GiB, 31.04% gc time, 3.68% compilation time)
115.335076 seconds (31.41 M allocations: 225.337 GiB, 15.32% gc time)
Is it possible to improve the performance for EnsembleGPUArray since the GPU is only stressed for a small fraction of the overall calculation time? Are we missing something?
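As a rough sanity check on the transfer volume, the size of the saved output can be estimated from the values used in the script below (100_000 trajectories, 12 states, one save point per second on the interval 0 to 400). This is only a sketch: it assumes the results are stored as Float32 and ignores all intermediate buffers, so it is a lower bound and not a measurement.

```julia
# Back-of-the-envelope size of the saved ensemble output.
# Assumption: every saved value is stored as Float32 (4 bytes).
n_traj   = 100_000                        # trajectories in the ensemble
n_states = 12                             # length of x0
n_saves  = length(0.0f0:1.0f0:400.0f0)    # save points for saveat = 1 on [0, 400]

bytes = n_traj * n_states * n_saves * sizeof(Float32)
gib   = bytes / 2^30

println("save points per trajectory: ", n_saves)
println("saved output: ", bytes, " bytes (", round(gib, digits = 2), " GiB)")
```

Under these assumptions the saved output alone is just under 2 GiB (twice that if it ends up as Float64), which is far less than the roughly 225 GiB of allocations reported for the GPU runs.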
The code that runs the ensemble simulations is shown below. The function ROMO is a vehicle model and is omitted here for readability. The GPU used is an NVIDIA GeForce RTX 3080.
using OrdinaryDiffEq
using DifferentialEquations
using Distributed
using DiffEqGPU
using CUDA
using Plots
function ROMO(dx,x,p,t)
...
end
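Threads.nthreads()  # threads available to EnsembleThreads; ENV["JULIA_NUM_THREADS"] throws a KeyError if the variable is unset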
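# Initial state (12 states), time span, and save interval (one save point per second)
x0 = Float32[0.0; 20.0; 0.0; 0.0; 0.0; 0.0; 0.0; 0.0; 100; 100; 0.0; 0.0]
tspan = (0.0, 400.0)
savetime = 1.0f0
# Nominal parameters
p = [1013.0f0, 1130.0f0, 0.8f0]
prob = ODEProblem(ROMO,x0,tspan,p)
# Each trajectory scales every parameter by a random factor in [1, 1.2)
prob_func = (prob,i,repeat) -> remake(prob,p=p+0.2*rand(Float32,3).*p)
monteprob = EnsembleProblem(prob, prob_func = prob_func, safetycopy=false)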
println("CPU Ensemble Calculations")
@time sol = solve(monteprob,RK4(),EnsembleThreads(),trajectories=100_000,saveat=savetime, dt=0.001);
sol=0
GC.gc()
@time sol = solve(monteprob,RK4(),EnsembleThreads(),trajectories=100_000,saveat=savetime, dt=0.001);
sol=0
GC.gc()
println("GPU Ensemble Calculations")
@time sol = solve(monteprob,RK4(),EnsembleGPUArray(),trajectories=100_000,saveat=savetime, dt=0.001);
sol=0
GC.gc()
@time sol = solve(monteprob,RK4(),EnsembleGPUArray(),trajectories=100_000,saveat=savetime, dt=0.001);
sol=0
GC.gc()
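One detail in the script that may be worth checking is the element type of the perturbed parameters. The literal 0.2 in prob_func is a Float64, so the expression promotes the Float32 parameter vector to Float64. The sketch below only demonstrates this promotion with Base Julia. Whether it contributes to the timings above has not been measured, and the variable names p_mixed and p_single are introduced here purely for illustration.

```julia
# Element-type check for the parameter perturbation used in prob_func.
p = Float32[1013.0, 1130.0, 0.8]

# As written in the script: the Float64 literal 0.2 promotes the result.
p_mixed = p + 0.2 * rand(Float32, 3) .* p

# Variant with a Float32 literal: the result stays in single precision.
p_single = p + 0.2f0 * rand(Float32, 3) .* p

println("with 0.2   : ", eltype(p_mixed))
println("with 0.2f0 : ", eltype(p_single))
```

If single precision is intended throughout, using 0.2f0 here, and Float32 values for tspan and dt as well, would keep the problem definition consistent. The timings reported above were obtained with the script exactly as shown.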