I’m simulating the Gross-Pitaevskii Equation for a three-dimensional array with M^3 values, and am running into some issues with GPU memory for large M values. In short, I’m wondering if there is anything more I can do to reduce how much memory solve() is allocating?

Currently I’m using CUDA.jl so that the dynamics are solved on the GPU, and am using a callback to intermittently save data back to CPU directly (see MWE below). I’ve managed to push N to ~ 300, but anything much bigger than that and it runs out of memory very quickly.

```
using FFTW, CUDA, DifferentialEquations, LinearAlgebra, Plots
function kfunc_opt!(dψ,ψ)
mul!(dψ,Pf,ψ)
dψ .*= k2
Pi!*dψ
return nothing
end
function GPE!(dψ,ψ,var,t) # GPE Equation
kfunc_opt!(dψ,ψ)
@. dψ = -(im + γ)*(0.5*dψ + (V_0 + abs2(ψ) - 1)*ψ)
end
function GPU_Solve(EQ!, ψ, tspan)
savepoints = tspan[2:end]
condition(u, t, integrator) = t ∈ savepoints
function affect!(integrator)
i += 1
ret[:,:,:,i] .= Array(integrator.u)
end
ret = zeros(M,M,M,length(tspan)) .|> complex
ret[:,:,:,1] .= Array(ψ)
cb = DiscreteCallback(condition, affect!)
i = 1
prob = ODEProblem(EQ!,ψ,(tspan[1],tspan[end]))
solve(prob, callback=cb, tstops = savepoints, save_start = false, save_everystep = false, save_end = false)
return ret
end
L = 8
M = 60
x = LinRange(-L,L,M) |> cu;
dx = x[2] - x[1]
kx = fftfreq(M,2π/dx) |> collect |> cu;
dkx = kx[2] - kx[1]
k2 = kx.^2 .+ kx'.^2 .+ reshape(kx,(1,1,M)).^2;
V_0 = 0.3*[i^2 + j^2 + k^2 for i in x, j in x, k in x] |> cu;
const Pf = Float32(dx^3/(2π)^1.5)*plan_fft(cu((rand(M,M,M) + im*rand(M,M,M))));
const Pi! = Float32(M^3*dkx^3/(2π)^1.5)*plan_ifft!(cu((rand(M,M,M) + im*rand(M,M,M))));
γ = 1
tspan = LinRange(0.0,3,30);
CUDA.memory_status()
res_GS = GPU_Solve(GPE!,cu((randn(M,M,M) + im*randn(M,M,M))),tspan);
begin
t = 7 # Change this to look at different times
heatmap(abs2.(res_GS[:,:,30,t]))
end
```

If any other info is required, or if there are any other glaring issues with my code please let me know. Cheers.