Can you solve a NeuralODEMM on GPU?

I’m trying to solve a neural ODE with a mass matrix on the GPU using CUDA.jl. Is this possible, or are the required solvers simply not GPU-compatible? I can define a NeuralODEMM with a Chain whose parameters are CuArrays, but I get ERROR: LoadError: scalar getindex is disallowed as soon as I apply the model to an initial condition.

Thank you!

MWE, based on https://github.com/SciML/DiffEqFlux.jl/blob/master/test/neural_ode_mm.jl and https://diffeqflux.sciml.ai/dev/GPUs/:

using Flux, DiffEqFlux, GalacticOptim, OrdinaryDiffEq
using CUDA; CUDA.allowscalar(false)

# generate training data from the Robertson problem (index-1 DAE via a singular mass matrix)
function f(du,u,p,t)
    y₁,y₂,y₃ = u
    k₁,k₂,k₃ = p
    du[1] = -k₁*y₁ + k₃*y₂*y₃
    du[2] =  k₁*y₁ - k₃*y₂*y₃ - k₂*y₂^2
    du[3] =  y₁ + y₂ + y₃ - 1  # algebraic constraint: the components sum to one
    nothing
end

# initial condition
u₀ = [1.0, 0, 0]
# singular mass matrix: the zero third row marks du[3] as an algebraic equation
M = [1. 0  0
     0  1. 0
     0  0  0]
tspan = (0.0,1.0)
p = [0.04, 3e7, 1e4]
func = ODEFunction(f, mass_matrix=M)
prob = ODEProblem(func, u₀, tspan, p)
sol = solve(prob, Rodas5(), saveat=0.1)

# define model: the NN learns the two differential equations,
# the constraint function supplies the algebraic one
dudt2 = Chain(Dense(3, 64, tanh), Dense(64, 2)) |> gpu
ndae = NeuralODEMM(dudt2,
                   (u,p,t) -> [u[1] + u[2] + u[3] - 1],
                   tspan,
                   M,
                   Rodas5(autodiff=false),
                   saveat=0.1)
u0_gpu = u₀ |> gpu
display(typeof(ndae.p))        # CuArray{Float32,1}, as expected
display(typeof(ndae(u0_gpu)))  # throws ERROR: LoadError: scalar getindex is disallowed
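
For what it’s worth, my guess is that the scalar indexing comes from the constraint closure, since u[1] + u[2] + u[3] indexes the CuArray element by element and builds a plain CPU Array. Below is a sketch of what I imagine a GPU-friendly variant might look like; the sum(u; dims=1) constraint, the Float32 mass matrix, and the names constraint/M32/ndae_gpu are just my guesses, and I haven’t confirmed that Rodas5 works with GPU arrays at all.

# untested guess at a GPU-friendly setup: avoid scalar indexing in the constraint.
# sum(u; dims=1) reduces on the GPU and returns a 1-element CuArray, so the vcat
# with the network output should stay on the device
constraint(u, p, t) = sum(u; dims=1) .- 1f0

# Float32 mass matrix to match the Float32 network parameters (not sure this matters)
M32 = Float32.(M)

ndae_gpu = NeuralODEMM(dudt2,
                       constraint,
                       tspan,
                       M32,
                       Rodas5(autodiff=false),
                       saveat=0.1)
display(typeof(ndae_gpu(u0_gpu)))

Is something along those lines the right direction, or is the implicit solver itself (the W-matrix factorization inside Rodas5) the actual blocker on the GPU?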