Minimising DifferentialEquations GPU memory allocation

I’m simulating the Gross-Pitaevskii equation on a three-dimensional grid of M^3 points, and am running into GPU memory issues for large M. In short, I’m wondering if there is anything more I can do to reduce how much memory solve() allocates?

Currently I’m using CUDA.jl so that the dynamics are solved on the GPU, and I’m using a callback to intermittently save data back to the CPU (see the MWE below). I’ve managed to push M to ~300, but anything much bigger than that runs out of memory very quickly.

    using FFTW, CUDA, DifferentialEquations, LinearAlgebra, Plots

    function kfunc_opt!(dψ,ψ)
        mul!(dψ, Pf, ψ)     # forward FFT of ψ into dψ
        dψ .*= k2           # multiply by k² in Fourier space
        Pi!*dψ              # in-place inverse FFT back to position space
        return nothing
    end

    function GPE!(dψ,ψ,var,t) # GPE Equation 
        kfunc_opt!(dψ,ψ)
        @. dψ = -(im + γ)*(0.5*dψ + (V_0 + abs2(ψ) - 1)*ψ)
    end

    function GPU_Solve(EQ!, ψ, tspan)

        savepoints = tspan[2:end]
        condition(u, t, integrator) = t ∈ savepoints

        # Preallocate CPU storage and put the initial state in the first slot
        ret = zeros(ComplexF64, M, M, M, length(tspan))
        ret[:,:,:,1] .= Array(ψ)
        i = 1                               # index of the last saved slice

        # At each savepoint, copy the current GPU state back to the CPU array
        function affect!(integrator)
            i += 1
            ret[:,:,:,i] .= Array(integrator.u)
        end
        cb = DiscreteCallback(condition, affect!)

        prob = ODEProblem(EQ!, ψ, (tspan[1], tspan[end]))
        solve(prob, callback = cb, tstops = savepoints,
              save_start = false, save_everystep = false, save_end = false)

        return ret
    end

    L = 8
    M = 60

    # Build the grids on the CPU (scalar indexing of a CuArray is slow / disallowed),
    # then move only the 3D arrays to the GPU
    x = LinRange(-L, L, M)
    dx = x[2] - x[1]
    kx = fftfreq(M, 2π/dx) |> collect
    dkx = kx[2] - kx[1]

    k2  = cu(kx.^2 .+ kx'.^2 .+ reshape(kx,(1,1,M)).^2);
    V_0 = cu(0.3*[i^2 + j^2 + k^2 for i in x, j in x, k in x]);

    # Scaled FFT plans; declared const since they are globals used inside kfunc_opt!
    const Pf  = Float32(dx^3/(2π)^1.5) * plan_fft(cu(rand(M,M,M) + im*rand(M,M,M)));
    const Pi! = Float32(M^3*dkx^3/(2π)^1.5) * plan_ifft!(cu(rand(M,M,M) + im*rand(M,M,M)));

    γ = 1                       # damping parameter used in GPE!
    tspan = LinRange(0.0, 3, 30);

    CUDA.memory_status()
    res_GS = GPU_Solve(GPE!, cu(randn(M,M,M) + im*randn(M,M,M)), tspan);

    begin
        t = 7                   # change this to look at different times
        heatmap(abs2.(res_GS[:,:,30,t]))
    end

If any other info is required, or if there are any other glaring issues with my code, please let me know. Cheers.

Is the equation stiff? If you don’t need the Jacobian then you can specialize on that, and if you do need the Jacobian you should declare its sparsity pattern.
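
For reference, declaring a sparsity pattern would look roughly like the sketch below, assuming ψ is the GPU initial state from the MWE. The diagonal prototype is purely illustrative (the real pattern depends on the discretised Laplacian), and I haven't checked how a CPU sparse prototype interacts with a GPU state.

    using SparseArrays

    # Give an implicit solver a sparse Jacobian prototype instead of letting it
    # allocate a dense (M^3 × M^3) matrix. The diagonal pattern is a placeholder.
    jac_proto   = spdiagm(0 => ones(ComplexF32, M^3))
    f_sparse    = ODEFunction(GPE!; jac_prototype = jac_proto)
    prob_sparse = ODEProblem(f_sparse, ψ, (tspan[1], tspan[end]))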

Don’t you have globals? Pf and Pi!

My bad, I’ve fixed the MWE and made sure it’s actually running properly. Let me know if it’s still a bit unclear.

The biggest allocation is still going to be the Jacobian.
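
To put a rough number on that (my own back-of-the-envelope): a dense Jacobian for an M^3-dimensional complex state has (M^3)^2 entries, which is hopeless at the sizes being discussed:

    M = 300
    n = M^3                                # number of state variables
    gib = n^2 * sizeof(ComplexF32) / 2^30  # dense complex Jacobian size in GiB
    # ≈ 5.4e6 GiB at M = 300; even M = 60 would need ≈ 350 GiB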

I’m not 100% sure whether it’s stiff or not; the main (dimensionless) equation of motion is:

$$
\frac{\partial \psi (\textbf{r},t)}{\partial t} = -\frac{i}{h} \left[ \frac{\nabla^2}{2} + V_0(\textbf{r}) + | \psi |^2 \right] \psi
$$

where V_0 is just a number at each point. Would this be stiff because of the nonlinear |\psi|^2 \psi term?

Is that a typo? Doesn’t make sense to multiply by Pi! without doing anything with it.

Nah, not a typo: Pi!*dψ does an in-place inverse Fourier transform on dψ.
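
For anyone following along, here is a minimal standalone sketch of that behaviour (small array and illustrative names, with a scaled in-place plan analogous to Pi!):

    using CUDA, FFTW

    A  = cu(rand(ComplexF32, 8, 8, 8))
    P! = 2.0f0 * plan_ifft!(similar(A))   # scaled in-place plan, like Pi! in the MWE

    B = P! * A      # transforms (and rescales) A in place...
    B === A         # ...and returns the same array, so no extra buffer is allocated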

Just try choosing an explicit RK method like Tsit5.
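
In the MWE above that would just mean passing the algorithm inside GPU_Solve, something like:

    # inside GPU_Solve: an explicit method such as Tsit5 never forms a Jacobian,
    # so the remaining large GPU allocations are the solver's own stage caches
    solve(prob, Tsit5(),
          callback = cb, tstops = savepoints,
          save_start = false, save_everystep = false, save_end = false)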