Differential Equations running out of GPU memory

Tommy_Fischer · July 1, 2023, 10:07am

Hi all, I posted earlier about minimising memory usage on the GPU when modelling a system here. Since then I’ve tidied up my code a bit and have managed to reduce memory usage, but it still runs out of memory when I try to scan over a range of parameters.

I’m setting save_on = false, which according to the documentation should stop it from saving intermediate states, and using a callback to copy the solution to the CPU at certain points in time. My understanding is that if there is enough memory to get through a few timesteps, running for an arbitrarily long time shouldn’t use any more, since no intermediate timesteps are being saved.

What actually happens is that for large grid sizes, it runs fine for a long time (100-200 seconds of simulated time), and then runs out of memory and throws an error.

I’m basically wondering if there is another way to ensure that the solver isn’t saving any intermediate information, or if there’s a way to see what is taking up all of this memory, so that I can potentially use a callback to manually clear the GPU memory during solving.

MWE:

using FFTW, CUDA, DifferentialEquations, LinearAlgebra, Plots

function kfunc_opt!(dψ,ψ)
    mul!(dψ,Pf,ψ)
    dψ .*= k2
    Pi!*dψ
    return nothing
end

function GPE!(dψ,ψ,var,t) # GPE Equation 
    kfunc_opt!(dψ,ψ)
    @. dψ = -(im + γ)*(0.5*dψ + (V_0 + abs2(ψ) - 1)*ψ)
end

function GPU_Solve(save_array, EQ!, ψ, tspan)

    savepoints = tspan[2:end]
    # Fire the callback only when the solver lands exactly on a save point (forced by tstops below)
    condition(u, t, integrator) = t ∈ savepoints

    # Copy the current state from the GPU to the CPU and store it
    function affect!(integrator)
        push!(save_array, Array(integrator.u))
    end

    push!(save_array, Array(ψ))   # store the initial state
    cb = DiscreteCallback(condition, affect!)

    prob = ODEProblem(EQ!, ψ, (tspan[1], tspan[end]))
    solve(prob, callback=cb, tstops = savepoints, save_on=false)

end

L = 8
M = 60

x = LinRange(-L,L,M) |> cu;
dx = x[2] - x[1]
kx = fftfreq(M,2π/dx) |> collect |> cu;
dkx = kx[2] - kx[1]

k2 =  kx.^2 .+ kx'.^2 .+ reshape(kx,(1,1,M)).^2;
V_0 = 0.3*[i^2 + j^2 + k^2 for i in x, j in x, k in x] |> cu;

const Pf = Float32(dx^3/(2π)^1.5)*plan_fft((cu(rand(M,M,M) + im*rand(M,M,M))));
const Pi! = Float32(M^3*dkx^3/(2π)^1.5)*plan_ifft!((cu(rand(M,M,M) + im*rand(M,M,M))));

γ = 0.05
tspan = LinRange(0.0,10,50); 

CUDA.memory_status()
res_GS = []
GPU_Solve(res_GS,GPE!,(cu(randn(M,M,M) + im*randn(M,M,M))),tspan);

begin
    t = 3 # Change this to look at different times
    heatmap(abs2.(res_GS[t][:,:,30])) |> display
end

ChrisRackauckas · July 1, 2023, 2:23pm

What solver do you need here? If you need something for stiff ODEs, do you have a sparse Jacobian? This comes to exactly the same conclusion as the earlier thread, Minimising DifferentialEquations GPU memory allocation.

Tommy_Fischer · July 1, 2023, 11:21pm

I typically use Tsit5() or Vern6(), but have also looked into using some of the low-memory, fixed-timestep algorithms. I believe Vern6 doesn’t use a Jacobian, and I would need a dense Jacobian anyway. What I’m really asking here is how to stop it from accumulating GPU memory while solving, rather than how to reduce the memory used by each timestep.

ChrisRackauckas · July 2, 2023, 12:14am

If save_on = false, then it won’t save anything, so the only memory is from the caches for time stepping.

Tommy_Fischer · July 2, 2023, 12:55am

Should that be allocating much memory? I was running it on a machine with 40 GB of GPU memory and it eventually ran out. And if so, is there any way to reduce the memory used by the caches?

ChrisRackauckas · July 2, 2023, 7:26am

How big is the state vector? Tsit5, for example, needs 12 concurrent copies of it: see src/caches/low_order_rk_caches.jl on master in SciML/OrdinaryDiffEq.jl on GitHub. Pen-and-paper math: what does that come out to in memory?

Tommy_Fischer · July 3, 2023, 2:43am

The vector is 256^3, using ComplexF32, so 8 bytes each. So it would just be 8 * 256^3 * 12 = 1.61 GB per timestep? I’ve got some states saved, and this lines up with how much space they take up.
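The estimate above can be reproduced in a few lines. This is only a sketch of the arithmetic: the variable names are made up for illustration, and the factor of 12 is the Tsit5 copy count quoted earlier in the thread. As far as I understand it, these caches are allocated once when the integrator is set up rather than once per timestep.

```julia
# Back-of-the-envelope estimate of the solver cache size discussed above.
n_elements   = 256^3               # grid points in the state vector
bytes_per_el = sizeof(ComplexF32)  # 8 bytes per element
n_copies     = 12                  # copies quoted for Tsit5 in this thread

state_bytes = n_elements * bytes_per_el   # one copy of the state
cache_bytes = state_bytes * n_copies      # all copies together
cache_gb    = cache_bytes / 1e9           # decimal gigabytes

println("one state copy: ", state_bytes / 1e6, " MB")
println("solver caches:  ", round(cache_gb; digits = 2), " GB")
```

On a 40 GB card this leaves a lot of headroom, which is why steadily growing usage points at something other than the solver caches themselves.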

ChrisRackauckas · July 3, 2023, 9:59am

Have you checked that your f calls are fully non-allocating? Try adding a CUDA.gc call inside of the f.

Tommy_Fischer · July 4, 2023, 1:10am

I’m pretty sure my f call is non-allocating: using BenchmarkTools and @btime, it was something like 14 KB of allocations. Just to check, did you mean GC.gc()? I can’t find any documentation for CUDA.gc(). So I would just put a GC.gc(true) in the function to make sure the garbage collector is clearing memory?
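One way to check the allocation question is sketched below on the CPU with a toy right-hand side. The names rhs! and check_allocs are hypothetical, the equation is a simplified stand-in rather than the GPE from the MWE, and parameters are passed through p instead of non-const globals, which is an assumption about what might help, not a diagnosis. For GPU arrays, CUDA.@time reports device-side allocations separately from CPU ones.

```julia
# Toy in-place right-hand side; parameters arrive through `p`, not globals.
function rhs!(du, u, p, t)
    @. du = -(im + p.γ) * (abs2(u) - 1) * u
    return nothing
end

u  = rand(ComplexF32, 32, 32, 32)
du = similar(u)
p  = (γ = 0.05f0,)

# Measure inside a function so global-scope effects are not counted.
check_allocs(du, u, p) = @allocated rhs!(du, u, p, 0.0f0)

check_allocs(du, u, p)                      # first call includes compilation
println("bytes allocated per call: ", check_allocs(du, u, p))
```

If the reported number is nonzero for the real right-hand side, every solver stage adds garbage that has to be collected before the memory can be reused.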

ChrisRackauckas · July 14, 2023, 12:52pm

@maleadt, are GC.gc(true) calls thrown around the best thing here?

maleadt · July 14, 2023, 1:04pm

That shouldn’t be required, as we try to collect memory before going OOM. But it may help, as would calling CUDA.reclaim(). But again, both shouldn’t be required, so an MWE that goes OOM without these calls would make a good issue on the CUDA.jl repository.
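A minimal sketch of where those two calls could go, assuming the save callback from the MWE is the place to do it. The helper name saving_callback_with_cleanup is hypothetical; GC.gc(true) and CUDA.reclaim() are the calls mentioned above, and whether they actually cure the OOM is untested here.

```julia
using CUDA, DifferentialEquations

# Hypothetical helper: the save callback from the MWE, plus a cleanup step
# that runs every time a state is copied back to the CPU.
function saving_callback_with_cleanup(save_array, savepoints)
    condition(u, t, integrator) = t ∈ savepoints

    function affect!(integrator)
        push!(save_array, Array(integrator.u))  # GPU -> CPU copy of the state
        GC.gc(true)      # full collection, so unreachable CuArrays are freed
        CUDA.reclaim()   # hand cached memory in CUDA.jl's pool back to the driver
    end

    # save_positions = (false, false): do not store anything in the solution
    return DiscreteCallback(condition, affect!; save_positions = (false, false))
end
```

It would replace the cb built inside GPU_Solve, for example cb = saving_callback_with_cleanup(save_array, savepoints). Calling CUDA.memory_status() before and after a run is one way to see whether the cleanup changes anything.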

photor · January 6, 2024, 1:20pm

How about RK4? Will it use fewer copies than Tsit5? I mean with adaptive time stepping.

ChrisRackauckas · January 7, 2024, 1:08am

If what you need is to meet low RAM requirements, don’t use RK4. Use one of the optimized low-storage methods documented on the ODE Solvers page of the DifferentialEquations.jl docs.

photor · January 7, 2024, 2:59am

But it seems none of the low-storage methods support adaptive time stepping?

ChrisRackauckas · January 7, 2024, 3:15am

Yeah. If you need that, then BS3 might be a better option
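A small sketch of what that could look like, using a toy decay equation on the CPU as a stand-in for the GPE right-hand side (decay! and the array size are made up for illustration). It uses the integrator interface with all saving switched off and reads the final state from integrator.u, so nothing has to be stored in the solution object.

```julia
using OrdinaryDiffEq

# Toy stand-in for the real right-hand side: du/dt = -u, written in place.
function decay!(du, u, p, t)
    du .= .-u
    return nothing
end

u0   = ones(ComplexF32, 16)
prob = ODEProblem(decay!, u0, (0.0f0, 1.0f0))

# Adaptive BS3 with every saving option turned off.
integrator = init(prob, BS3(); save_on = false, save_start = false, save_end = false)
solve!(integrator)

println("final time:  ", integrator.t)
println("final value: ", real(integrator.u[1]))  # should be close to exp(-1)
```

For the GPU problem, u0 would be the CuArray state and decay! the real GPE!; the solver options stay the same.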

photor · January 7, 2024, 12:06pm

BS3 may consume less memory. I am also playing with the GPE with a state vector of 256^3. On my machine with 6 GB of GPU memory, RK4 works normally, but Tsit5 unfortunately does not.

photor · January 7, 2024, 2:53pm

It seems that Tsit5 can also work if I use save_start=false in addition to save_on=false. Weird, @ChrisRackauckas.
