Hi,
I’m looking to see if there is any way I could solve an ensemble of ODEs similar to using the low-level API of EnsembleGPUKernel in DiffEqGPU, but where the ensemble trajectories are pulled from a multi-dimensional CuArray of inputs already on the GPU, and the final solution needs to be written back to said CuArray directly.
Big picture: I am using a CFD code (Oceananigans.jl) that runs on GPUs using KernelAbstractions and MPI, and stores its solution in individual CuArrays (when running on NVIDIA GPUs) for the velocities, pressure, and tracers, and then uses its own RK3 loop over all its kernelized algorithms. What I want is to add very stiff finite-rate chemical kinetics to the simulation by Strang splitting the fluid transport and reactions, and then integrate the stiff reactions using a separate ODE solver, ideally from a package like DifferentialEquations.jl so I can find the optimal solver and tolerances.
Small picture: I have this ODEProblem where, on a CPU, or when offloading data stored on CPU to GPU, I can copy the individual species data from each spatial point as a new ensemble in prob_func:
using OrdinaryDiffEq
using DiffEqGPU
using Metal
# using CUDA
# using BenchmarkTools: @btime
using StaticArrays: SVector
GPU = Metal.MetalBackend()
const FP = Float32
# Chemistry defined here, particulars not important
include("carbonate_system.jl")
CO₂ = 10.0
HCO₃ = 1670.0
CO₃ = 315.0
OH = 20.0
BOH₃ = 297.0
BOH₄ = 119.0
c0 = SVector{6, FP}(CO₂, HCO₃, CO₃, OH, BOH₃, BOH₄)
T = 25.0 # degrees Celsius
S = 35.0 # salinity in g/kg
p = SVector{2, FP}(T, S)
tspan = SVector{2, FP}(0.0, 1.0)
prob = ODEProblem(carbonate_rhs, c0, tspan, p)
sol = solve(prob, Rodas5(), reltol = 1e-6, abstol = 1e-10, save_everystep = false)
nx, ny, nz = (32, 32, 32)
ntraj = nx * ny * nz
i2s = CartesianIndices((nx, ny, nz))
P1 = rand(nx, ny, nz) .* T; # values from 0 to 25
C1 = (0.2 .* rand(nx, ny, nz) .+ 0.9) * CO₂; # values from 0.9 to 1.1 x baseline
C2 = (0.2 .* rand(nx, ny, nz) .+ 0.9) * HCO₃;
C3 = (0.2 .* rand(nx, ny, nz) .+ 0.9) * CO₃;
C4 = (0.2 .* rand(nx, ny, nz) .+ 0.9) * OH;
C5 = (0.2 .* rand(nx, ny, nz) .+ 0.9) * BOH₃;
C6 = (0.2 .* rand(nx, ny, nz) .+ 0.9) * BOH₄;
function prob_func(prob, i, repeat)
ijk = i2s[i]
C_n = SVector{6, FP}(C1[ijk], C2[ijk], C3[ijk], C4[ijk], C5[ijk], C6[ijk])
p = SVector{2, FP}(P1[ijk], S) # S is constant
return remake(prob; u0 = C_n, p)
end
# for testing purposes, use different output arrays
C1′ = zeros(FP, nx, ny, nz);
C2′ = zeros(FP, nx, ny, nz);
C3′ = zeros(FP, nx, ny, nz);
C4′ = zeros(FP, nx, ny, nz);
C5′ = zeros(FP, nx, ny, nz);
C6′ = zeros(FP, nx, ny, nz);
function output_func(sol, i)
ijk = i2s[i]
C_np1 = sol.u[end]
C1′[ijk] = C_np1[1]
C2′[ijk] = C_np1[2]
C3′[ijk] = C_np1[3]
C4′[ijk] = C_np1[4]
C5′[ijk] = C_np1[5]
C6′[ijk] = C_np1[6]
return (nothing, false)
end
vprob = EnsembleProblem(prob; output_func, prob_func, safetycopy = false)
vsol = solve(vprob, GPURodas4(), EnsembleGPUKernel(GPU), trajectories = ntraj,
reltol = 1f-6, abstol = 1f-10, save_everystep = false)
Is there any way to adapt this example for when my “ensemble” data is already on the GPU as CuArrays, with the obvious problem being I can’t simply index into the CuArrays? Or am I stuck with re-writing my chemical system to look something like the Brusselator example (GPU-Acceleration of a Stiff Nonlinear Partial Differential Equation · DiffEqGPU.jl), where I broadcast the single-point ODE func onto a big 4D solution array (nx, ny, nz, nspecies), passed as a Ref? I’d like to avoid that if possible, as I’m not sure if I’d be able to keep memory allocations reasonable.