Large memory consumption when using Mooncake via DifferentiationInterface for Gaussian process optimisation

Nikos_Gianniotis · April 14, 2025, 7:29pm

I just tried the following code, adapted from an example in the AbstractGPs.jl repository. It can be run by copy-pasting it, provided the necessary packages are installed:

using AbstractGPs, DifferentiationInterface, Optim, StatsFuns, Random

import Mooncake
import Zygote

let 
    rng = MersenneTwister(1)

    x = randn(rng, 10_000)
    y = randn(rng, 10_000)

    f = GP(Matern52Kernel())
    noise_var = 0.1
    fx = f(x, noise_var)

    function loss_function(x, y)
        function negativelogmarginallikelihood(params)
            kernel =
                softplus(params[1]) * (Matern52Kernel() ∘ ScaleTransform(softplus(params[2])))
            f = GP(kernel)
            fx = f(x, noise_var)
            return -logpdf(fx, y)
        end
        return negativelogmarginallikelihood
    end
    
    θ0 = randn(rng, 2)


    # Mooncake (active): runs out of memory on the full data set
    opt = Optim.optimize(loss_function(x[1:4],y[1:4]), θ0, LBFGS(), autodiff=AutoMooncake(config=nothing)) # warmup
    opt = Optim.optimize(loss_function(x,y), θ0, LBFGS(), autodiff=AutoMooncake(config=nothing))
    
    # Zygote: uncomment these two lines to use it instead; it also runs out of memory
    #opt = Optim.optimize(loss_function(x[1:4],y[1:4]), θ0, LBFGS(), autodiff=AutoZygote()) # warmup
    #opt = Optim.optimize(loss_function(x,y), θ0, LBFGS(), autodiff=AutoZygote())
end
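To see how memory grows with the number of data points without going through Optim, one could measure a single gradient call directly. The following is a minimal sketch, assuming DifferentiationInterface's `prepare_gradient`/`gradient` API and the same loss as above; `make_loss` and `gradient_allocations` are hypothetical helper names, and the sizes are arbitrary small values chosen to stay well within memory.

```julia
using AbstractGPs, DifferentiationInterface, StatsFuns, Random
import Mooncake

# Same negative log marginal likelihood as in the post, for fixed data (x, y).
function make_loss(x, y; noise_var=0.1)
    return function (params)
        kernel = softplus(params[1]) *
                 (Matern52Kernel() ∘ ScaleTransform(softplus(params[2])))
        return -logpdf(GP(kernel)(x, noise_var), y)
    end
end

# Total bytes allocated by one reverse-mode gradient of the loss, for each data size n.
function gradient_allocations(sizes; backend=AutoMooncake(config=nothing))
    rng = MersenneTwister(1)
    θ = randn(rng, 2)
    results = Pair{Int,Int}[]
    for n in sizes
        loss = make_loss(randn(rng, n), randn(rng, n))
        prep = prepare_gradient(loss, backend, θ)  # one-off preparation for this loss
        gradient(loss, prep, backend, θ)           # warm-up call, not measured
        bytes = @allocated gradient(loss, prep, backend, θ)
        println("n = $n: ", round(bytes / 2^20; digits=1), " MiB allocated per gradient")
        push!(results, n => bytes)
    end
    return results
end

gradient_allocations((250, 500, 1_000))
```

Note that `@allocated` reports the total bytes allocated during the call, not the peak resident memory. If the figure roughly quadruples each time `n` doubles, the cost is dominated by n×n intermediates. Passing `backend=AutoZygote()` (after `import Zygote`) would give the same comparison for Zygote.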

With the full 10_000 data points, my 32 GB machine runs out of memory with either backend and the Julia process is terminated.
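For a sense of scale, here is a rough back-of-the-envelope check. It assumes that the dominant memory cost is dense n×n `Float64` intermediates (for example the kernel matrix and its Cholesky factor, plus whatever the AD backend keeps for the reverse pass) and that the machine has 32 GiB of RAM. `dense_matrix_bytes` and `extrapolate_bytes` are hypothetical helpers.

```julia
# Bytes needed for one dense n×n matrix of Float64 (8 bytes per entry).
dense_matrix_bytes(n) = 8 * n^2

# Hypothetical helper: scale an allocation measured at n_measured up to n_target,
# assuming it is dominated by n×n intermediates and therefore grows as n^2.
extrapolate_bytes(bytes_measured, n_measured, n_target) =
    bytes_measured * (n_target / n_measured)^2

n = 10_000
ram = 32 * 2^30                    # assumed 32 GiB of RAM
per_matrix = dense_matrix_bytes(n) # 800_000_000 bytes
println("one $(n)×$(n) Float64 matrix: ", per_matrix / 1e6, " MB")
println("matrices of that size fitting in 32 GiB: ", fld(ram, per_matrix))
```

This prints 800.0 MB and 42. Only a few dozen such matrices need to be alive at once to exhaust the machine. Under the same n² assumption, `extrapolate_bytes` turns an allocation measured at a small `n` into a rough estimate for n = 10_000.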
