I am trying to perform Bayesian analysis on a Turing model with an ecological ODE system embedded; consider it a complex variant of a Lotka-Volterra model. The code currently runs fine without errors, but sampling takes multiple days: inference with 7000 draws for a model with 10 parameters takes close to 3 days. So I am planning to run it on GPUs.
Can someone help me with how to code this appropriately? I couldn't find any tutorials on performing inference using GPUs. Are there any tutorials available online? Some details follow:
I use the default NUTS sampler and AutoVern7(Rodas5()) ODE solver.
None of the functions I use are custom-made. For instance, no custom prior distributions, no custom likelihood functions, etc.
Following are the packages I need to use:
using DifferentialEquations, Interpolations, XLSX, DataFrames
using Distributions, MCMCChains, Turing
using ForwardDiff, Preferences
I couldn't find any easy tutorials or documentation on how to go forward with this in Turing. Can someone please help me?
I can share more details of the code and the implementation process if needed. To reiterate: the current CPU code runs fine; the only drawback is that it takes a long time to execute.
What did you try? If you just follow DiffEqGPU's tutorials inside of a Turing.jl module, for example, it should just work. There's no combined tutorial, but if you put a tutorial on GPU solving of ODEs together with a Turing.jl tutorial, you get both together.
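As a rough illustration (names, priors, and data are just placeholders, and this sketch stays on the CPU): the skeleton is the standard Turing ODE-fitting pattern, and the inner solve call is the piece you would swap for the GPU-based solve from the DiffEqGPU tutorials.

using Turing, OrdinaryDiffEq, Distributions, LinearAlgebra

# Toy Lotka-Volterra stand-in for the real ecological ODE
function lv!(du, u, p, t)
    α, β, γ, δ = p
    du[1] = α * u[1] - β * u[1] * u[2]
    du[2] = δ * u[1] * u[2] - γ * u[2]
end

prob = ODEProblem(lv!, [1.0, 1.0], (0.0, 10.0), [1.5, 1.0, 3.0, 1.0])

@model function fit_lv(data, prob, ts)
    σ ~ InverseGamma(2, 3)
    α ~ truncated(Normal(1.5, 0.5); lower = 0)
    β ~ truncated(Normal(1.0, 0.5); lower = 0)
    γ ~ truncated(Normal(3.0, 0.5); lower = 0)
    δ ~ truncated(Normal(1.0, 0.5); lower = 0)

    # This inner `solve` is the piece you would swap for the GPU-based solve
    # shown in the DiffEqGPU tutorials.
    sol = solve(remake(prob; p = [α, β, γ, δ]), Tsit5(); saveat = ts)

    for i in eachindex(ts)
        data[:, i] ~ MvNormal(sol.u[i], σ^2 * I)
    end
end

# chain = sample(fit_lv(data, prob, ts), NUTS(0.65), 1000)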
Is this an issue with the performance of the code (i.e. the log-likelihood and its gradient are expensive) or with exploration of the posterior? For the former, try benchmarking it with TuringBenchmarking.jl. For the latter, a good suggestion is Pathfinder (see the Turing usage page of the Pathfinder.jl docs), which lets you start your chains closer to the typical set and potentially with a good initial mass matrix.
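For the benchmarking part, something along these lines should work (the demo model is just a placeholder for your own conditioned model; check the TuringBenchmarking.jl docs for the exact options):

using Turing, TuringBenchmarking

@model function demo(x)
    μ ~ Normal(0, 1)
    x ~ Normal(μ, 1)
end

# Benchmark log-density and gradient evaluations across AD backends
# (swap `demo(1.5)` for your own conditioned model)
results = benchmark_model(
    demo(1.5);
    check = true,                    # check that the backends agree
    adbackends = [:forwarddiff, :reversediff, :reversediff_compiled],
)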
General question:
How complex is the model you are considering? How long does it take to run?
Honestly, I don't know. The ODE system is pretty straightforward, and the raw data behaves exactly as I theorised and expected. Homing in on the parameters of the ODE in Turing, however, is taking a long time.
As a brief outline: my ODE represents a basic system of 16 transition rates amongst 6 states (all first order, dy_i/dt). Each rate is parameterised with 4 variables giving a logistic shape; 2 of them are in log space (in the log(0.001) to log(0.1) range) and exponentiated before being fed into the ODE solver. With a bit of non-centering, this gives around 128 parameters to be calibrated. The response is the index of the state, as a categorical-type variable.
With around 2000 rows of observations, 2 chains, 1000 iterations, NUTS(0.65), and the Rosenbrock solver (tolerance at 1e-5, maxiters at 10^5), on a MacBook M3 Max, it takes around 30-40 hours to run.
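Roughly, each rate looks something like this (the names, ranges, and exact logistic form below are invented for illustration, not my actual code):

using Distributions

# One transition rate with a logistic shape in time; two of its four
# parameters live in log space (illustrative names and ranges only)
log_lo = rand(Uniform(log(0.001), log(0.1)))   # lower level, sampled in log space
log_hi = rand(Uniform(log(0.001), log(0.1)))   # upper level, sampled in log space
k      = rand(Normal(0, 1))                    # steepness
t_mid  = rand(Normal(5, 2))                    # midpoint in time

rate(t) = exp(log_lo) + (exp(log_hi) - exp(log_lo)) / (1 + exp(-k * (t - t_mid)))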
I've added some boundaries to early-stop the solver when it wanders into edge cases, which helped, but not by much.
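Something like the following illustrates the idea (a toy ODE, made-up bounds, and a termination callback as one way to do the early stop, with the tolerance/maxiters settings mentioned above):

using OrdinaryDiffEq

# Toy 6-state ODE standing in for the real system
f!(du, u, p, t) = (du .= -p .* u)
prob = ODEProblem(f!, fill(1.0, 6), (0.0, 10.0), fill(0.1, 6))

# Stop integrating as soon as any state leaves a plausible range
# (the bounds here are made up)
out_of_bounds(u, t, integrator) = any(x -> x < -1e-6 || x > 1e3, u)
stop!(integrator) = terminate!(integrator)
cb = DiscreteCallback(out_of_bounds, stop!)

sol = solve(prob, Rodas5();
            abstol = 1e-5, reltol = 1e-5, maxiters = 10^5, callback = cb)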
Tried DiffCache on the u0 and p too, but that actually slows it down even more.
I've only been able to use the ForwardDiff engine so far; ReverseDiff and Mooncake both gave errors about some gradient that I can't recall off the top of my head. The code does some quick normalisation of the ODE outputs to ensure the states' probabilities sum to 1, so maybe that is what breaks the gradient.
With 128 parameters ForwardDiff is unfortunately always going to be slow. I can’t guarantee that there will be a quick fix, but the “correct” first step would certainly be to use reverse-mode AD, or in this case, to make reverse-mode AD work, so I would definitely suggest posting an issue on the Mooncake.jl repo with the error and an MWE.
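If you can get one of the reverse-mode backends past those errors, selecting it is just a sampler keyword; something like this (a toy model, and assuming a recent Turing version that accepts ADTypes backends):

using Turing, ADTypes
import Mooncake

@model function demo(x)
    μ ~ Normal(0, 1)
    x ~ Normal(μ, 1)
end

# Reverse-mode AD is selected via the sampler's `adtype` keyword;
# AutoReverseDiff(; compile = true) is the other usual option.
chain = sample(demo(1.5), NUTS(0.65; adtype = AutoMooncake(; config = nothing)), 1000)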
When using Mooncake:
TaskFailedException
    nested task error: TaskFailedException
        nested task error: MooncakeRuleCompilationError: an error occurred while Mooncake was compiling a rule to differentiate something. If the `caused by` error message below does not make it clear to you how the problem can be fixed, please open an issue at github.com/chalk-lab/Mooncake.jl describing your problem.
When using ReverseDiff:
TaskFailedException
    nested task error: TaskFailedException
        nested task error: setindex!() with non-isbitstype eltype is not supported by StaticArrays. Consider using SizedArray.
I'll try and get an MWE later, but I suspect this normalisation block inside the Turing model is the culprit for the Mooncake error:
# --- Build normalised probability grid ---
P_grid = Matrix{T}(undef, n_states, n_t)
@inbounds for m in 1:n_t
    v = ode_sol.u[m]
    s = zero(T)
    for k in 1:n_states
        x = v[k]
        # numerical safety: avoid log(0) later and negative drift
        x = ifelse(x < T(1e-12), T(1e-12), x)
        P_grid[k, m] = x
        s += x
    end
    inv_s = one(T) / s
    for k in 1:n_states
        P_grid[k, m] *= inv_s
    end
end

# Precompute log-prob grid
logP = log.(P_grid)
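For what it's worth, a mutation-free way of writing the same normalisation (which reverse-mode AD tools generally cope with better) would look roughly like this:

# Same normalisation without in-place writes (assumes, as above, that
# ode_sol.u is a vector of length-n_states state vectors)
U = reduce(hcat, ode_sol.u)        # n_states × n_t matrix
U = max.(U, 1e-12)                 # numerical floor, replacing the ifelse branch
P_grid = U ./ sum(U; dims = 1)     # each column now sums to 1
logP = log.(P_grid)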
As for Enzyme - I’m using Julia 1.12, and I think Enzyme has a warning that it has only been tested up to 1.11 for now.