# Solving stiff DAE with Float32 inputs for parallel simulations

Hello all!

I’m using DifferentialEquations.jl to solve a stiff DAE. I’ve tried both the mass-matrix formulation (with Rodas4) and the fully implicit formulation (with IDA) at tolerances of 1e-8, and the results agree (although IDA tends to be faster).
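For context, here is a minimal sketch of the mass-matrix setup, using the classic Robertson DAE as a stand-in since I can’t share the actual model (all names and numbers below are from that textbook example, not my system):

``````
using DifferentialEquations, LinearAlgebra

# Robertson DAE: two differential equations plus one algebraic constraint,
# expressed with a singular mass matrix (the zero entry marks the constraint).
function rober!(du, u, p, t)
    y₁, y₂, y₃ = u
    k₁, k₂, k₃ = p
    du[1] = -k₁ * y₁ + k₃ * y₂ * y₃
    du[2] =  k₁ * y₁ - k₂ * y₂^2 - k₃ * y₂ * y₃
    du[3] =  y₁ + y₂ + y₃ - 1          # algebraic: mass conservation
end

M    = Diagonal([1.0, 1.0, 0.0])
f    = ODEFunction(rober!, mass_matrix = M)
prob = ODEProblem(f, [1.0, 0.0, 0.0], (0.0, 1.0e5), (0.04, 3.0e7, 1.0e4))
sol  = solve(prob, Rodas4(), reltol = 1e-8, abstol = 1e-8)
``````
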

This model has a small set of tuning parameters that I would like to optimize. Parallel ensemble simulations seemed like the best fit, especially since I have an NVIDIA GPU. From what I’ve read, GPUs perform better with Float32 numbers. Additionally, as I understand it, the problem must now be in mass-matrix form, because ensemble simulations require solvers written in pure Julia (so farewell, Sundials.jl).

However, once I change the inputs to Float32 and attempt to solve the system, no solver is able to converge, i.e. I get the InitialFailure retcode every time. All I did was change the float type. I’ve tried a few things:

• Different solvers: Rosenbrock23, Rodas4, Rodas4P, Rodas4P2, etc.
• Relaxing the tolerances (to 1e-4, which is still reasonable)
• `modelingtoolkitize` for analytical Jacobians and time gradients (this works with Float64 and yields the same expected results)
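For reference, the Float32 change was nothing more elaborate than converting the inputs. A self-contained toy version of the conversion (using a trivial decay equation as a stand-in, since the real model isn’t shown here) looks like:

``````
using DifferentialEquations

# Toy stand-in for the real model: simple exponential decay.
decay!(du, u, p, t) = (du[1] = -p[1] * u[1])

# Converting every input to Float32 keeps the element types consistent:
u0    = Float32.([1.0])
tspan = Float32.((0.0, 1.0))
p     = Float32.([0.5])

prob = ODEProblem(decay!, u0, tspan, p)
sol  = solve(prob, Rosenbrock23(), abstol = 1.0f-4, reltol = 1.0f-4)
eltype(sol.u[1])   # Float32
``````
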

Same results. So I thought I’d stick with Float64 and try the ensemble simulations all the same:

``````
using DifferentialEquations, ModelingToolkit, CUDA, DiffEqGPU, LinearAlgebra

f       = ODEFunction(model, mass_matrix=Diagonal([1,1,1,1,1,1,0,0]))
prob    = ODEProblem(f, x₀, tspan, p)
sys     = modelingtoolkitize(prob)
f       = ODEFunction(sys, jac=true, tgrad=true, mass_matrix=Diagonal([1,1,1,1,1,1,0,0]))
prob_j  = ODEProblem(f, x₀, tspan, p)
ens     = EnsembleProblem(prob_j)
sol     = solve(ens, Rodas4(), EnsembleGPUArray(CUDA.CUDABackend()), trajectories=10, saveat=1.0f0)
``````

Which yields the following error:

``````
ERROR: GPU compilation of MethodInstance for DiffEqGPU.gpu_gpu_kernel(::KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}}, ::ModelingToolkit.var"#f#531"{RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(:ˍ₋arg1, :ˍ₋arg2, :t), ModelingToolkit.var"#_RGF_ModTag", ModelingToolkit.var"#_RGF_ModTag", (0x77563588, 0x499ec9b7, 0x3b31503d, 0xdff47b82, 0xf6f5cef8), Expr}, RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(:ˍ₋out, :ˍ₋arg1, :ˍ₋arg2, :t), ModelingToolkit.var"#_RGF_ModTag", ModelingToolkit.var"#_RGF_ModTag", (0x41231819, 0x0205e81c, 0xb065ed56, 0x20ebc83e, 0xf4e4193f), Expr}}, ::CUDA.CuDeviceMatrix{Float64, 1}, ::CUDA.CuDeviceMatrix{Float64, 1}, ::CUDA.CuDeviceMatrix{NTuple{11, Float64}, 1}, ::Float64) failed
KernelError: passing and using non-bitstype argument

Argument 3 to your kernel function is of type ModelingToolkit.var"#f#531"{RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(:ˍ₋arg1, :ˍ₋arg2, :t), ModelingToolkit.var"#_RGF_ModTag", ModelingToolkit.var"#_RGF_ModTag", (0x77563588, 0x499ec9b7, 0x3b31503d, 0xdff47b82, 0xf6f5cef8), Expr}, RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(:ˍ₋out, :ˍ₋arg1, :ˍ₋arg2, :t), ModelingToolkit.var"#_RGF_ModTag", ModelingToolkit.var"#_RGF_ModTag", (0x41231819, 0x0205e81c, 0xb065ed56, 0x20ebc83e, 0xf4e4193f), Expr}}, which is not isbits:
.f_oop is of type RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(:ˍ₋arg1, :ˍ₋arg2, :t), ModelingToolkit.var"#_RGF_ModTag", ModelingToolkit.var"#_RGF_ModTag", (0x77563588, 0x499ec9b7, 0x3b31503d, 0xdff47b82, 0xf6f5cef8), Expr} which is not isbits.
.body is of type Expr which is not isbits.
.head is of type Symbol which is not isbits.
.args is of type Vector{Any} which is not isbits.
.f_iip is of type RuntimeGeneratedFunctions.RuntimeGeneratedFunction{(:ˍ₋out, :ˍ₋arg1, :ˍ₋arg2, :t), ModelingToolkit.var"#_RGF_ModTag", ModelingToolkit.var"#_RGF_ModTag", (0x41231819, 0x0205e81c, 0xb065ed56, 0x20ebc83e, 0xf4e4193f), Expr} which is not isbits.
.body is of type Expr which is not isbits.
.head is of type Symbol which is not isbits.
.args is of type Vector{Any} which is not isbits.
``````

I’m at a loss, so any help is greatly appreciated! Unfortunately, I cannot provide an MWE. These are the package versions:

CUDA v4.3.0
DiffEqGPU v2.2.1
DifferentialEquations v7.7.0
ModelingToolkit v8.57.0

Oh, I remembered something: I posted what still seems to be an open issue regarding `modelingtoolkitize` and its interaction with DiffEqGPU. So that’s one thing. However, the following solution suggested by Chris also yields an error:

``````
f       = ODEFunction(model, mass_matrix=Diagonal([1,1,1,1,1,1,0,0]))
prob    = ODEProblem(f, x₀, tspan, p)
sys     = modelingtoolkitize(prob)
sys_f   = eval(generate_function(sys)[2])
sys_jac = eval(generate_jacobian(sys)[2])
prob    = ODEProblem(f, x₀, tspan, p)
ens     = EnsembleProblem(prob, safetycopy=false)
sol     = solve(ens, Rodas4(), EnsembleGPUArray(CUDA.CUDABackend()), trajectories=10, saveat=1.0f0)
``````

``````
ERROR: InvalidIRError: compiling MethodInstance for DiffEqGPU.gpu_Wt_kernel(::KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}}, ::Main.Fowm.var"#3#4", ::CUDA.CuDeviceArray{Float64, 3, 1}, ::CUDA.CuDeviceMatrix{Float64, 1}, ::CUDA.CuDeviceMatrix{NTuple{11, Float64}, 1}, ::Float64, ::Float64) resulted in invalid LLVM IR
Reason: unsupported call to an unknown function (call to gpu_gc_pool_alloc)
Stacktrace:
[1] macro expansion
@ C:\Users\ricardof\.julia\packages\SymbolicUtils\H684H\src\code.jl:395
[2] macro expansion
@ C:\Users\ricardof\.julia\packages\Symbolics\3jLt1\src\build_function.jl:520
[3] #3
@ C:\Users\ricardof\.julia\packages\SymbolicUtils\H684H\src\code.jl:352
[4] macro expansion
@ C:\Users\ricardof\.julia\packages\DiffEqGPU\N3FAn\src\DiffEqGPU.jl:130
[5] gpu_Wt_kernel
@ C:\Users\ricardof\.julia\packages\KernelAbstractions\LVKmi\src\macros.jl:81
[6] gpu_Wt_kernel
@ .\none:0
Reason: unsupported dynamic function invocation (call to *)
Stacktrace:
[1] macro expansion
@ C:\Users\ricardof\.julia\packages\SymbolicUtils\H684H\src\code.jl:395
[2] macro expansion
@ C:\Users\ricardof\.julia\packages\Symbolics\3jLt1\src\build_function.jl:520
[3] #3
@ C:\Users\ricardof\.julia\packages\SymbolicUtils\H684H\src\code.jl:352
[4] macro expansion
@ C:\Users\ricardof\.julia\packages\DiffEqGPU\N3FAn\src\DiffEqGPU.jl:130
[5] gpu_Wt_kernel
@ C:\Users\ricardof\.julia\packages\KernelAbstractions\LVKmi\src\macros.jl:81
[6] gpu_Wt_kernel
@ .\none:0
Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erronous code with Cthulhu.jl
``````

It’s not surprising that `Float32` fails. If your problem is stiff enough to need a stiff solver, your Jacobian is poorly conditioned enough that `Float32` will be at the edge of viability.


Oh, funny that it’s mentioned. The solution was merged yesterday:

and then the Symbolics part I just merged:

So I just tagged a few versions and that issue is now solved.

That’s exactly it. Check the condition number of your Jacobian. Remember that:

``````
julia> eps(Float32)
1.1920929f-7
``````

In other words, with Float32 it’s not possible to solve to 8 digits of accuracy: a tolerance of 1e-8 is below the relative tolerance that Float32 can even represent well.

The condition number of your Jacobian indicates both whether something is stiff and whether it has numerical issues. This is why, in general, stiff problems don’t do well with Float32. Without being able to run your model I cannot say anything more specific, but what you’re describing is at least somewhat expected.
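As a sketch of what I mean (toy numbers, not your actual Jacobian):

``````
using LinearAlgebra

# A tolerance of 1e-8 is below eps(Float32) ≈ 1.19e-7, so it vanishes
# entirely in Float32 arithmetic:
1.0f0 + 1.0f-8 == 1.0f0                # true

# A stand-in Jacobian with widely separated time scales, as in a stiff system:
J = Matrix(Diagonal([1.0, 1.0e8]))
κ = cond(J)                            # 1.0e8

# Roughly log10(κ) digits are lost in the linear solves; Float32 only has
# about 7 digits to begin with, so at κ = 1e8 essentially none survive.
log10(κ)                               # 8.0
``````
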

I guess we’ve come full circle, eh? Glad my curiosity sparked a contribution!

Thank you @Oscar_Smith and @ChrisRackauckas for the replies! Indeed, this world of Float32s and GPUs is a little beyond my expertise. I guess I’ll try other options.

Cheers!
