Hello everyone,
I am working on a model called the Gaussian process latent variable model (GPLVM). The GPLVM is a dimensionality reduction method which, given a high-dimensional dataset `Y`, returns low-dimensional projections `X`.
In order to optimise the free parameters, I use Optim.jl in conjunction with DifferentiationInterface.jl and Mooncake.jl, which automatically calculate the gradients for me. While I am happy with this setup, I have noticed very high memory consumption: the `top` utility informs me that more than 20% of my 32GB of memory is in use, and this is for only a modest number of data items. While in principle I don't mind this, if I increase the number of data items, more memory is consumed and the system kills my julia session.
My actual code is quite lengthy. For the purposes of this question, I have cut it down significantly, but it still exceeds the length of a typical MWE. The code is organised in three functions:
- Function `gplvm` is called by the user. It sets up the optimiser options, sets up the gradient, calls the optimiser and finally returns the low-dimensional coordinates `X` inferred by the GPLVM model.
- Function `negativemarginallikelihood` is the objective function to be minimised with respect to the coordinates `X`, the parameters `θ` of the Gaussian process kernel and the noise variance `σ²`. If you are not familiar with the GPLVM, but know about Gaussian processes, then this resembles very strongly the negative marginal log-likelihood of a regular Gaussian process regression model (the expression is written out just after this list).
- Function `unpack_gplvm` is an auxiliary function that converts the vector of free, unconstrained parameters `p` into the parameters `X`, `θ` and `σ²`.
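Concretely, with $K_X$ denoting the kernel matrix built from the latent points `X` (and ignoring the small jitter added to its diagonal), the objective implemented below is

$$
-\log p(Y \mid X, \theta, \sigma^2) \;=\; \sum_{d=1}^{D} \frac{1}{2}\Big[\, N\log(2\pi) \;+\; \log\big|K_X + \sigma^2 I\big| \;+\; Y_{d,:}\,\big(K_X + \sigma^2 I\big)^{-1} Y_{d,:}^{\top} \Big],
$$

i.e. the sum over the $D$ output dimensions of the usual Gaussian process negative marginal log-likelihood, which is exactly what the loop over `logpdf(mvn, Y[d, :])` in the code computes.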
using DifferentiationInterface
using Distances
using Distributions
using JLD2
using LinearAlgebra
import Mooncake
using Optim
using Random
"""
Y are the D×N high-dimensional data points
iterations is the number of iterations of the optimisation algorithm
Q is the dimensionality of the latent space
"""
function gplvm(Y; iterations = 1, Q = 2)
    # Get number of data items
    D, N = size(Y)
    # Allocate once zero vector necessary for marginal likelihood
    # calculation of zero-mean Gaussian process
    zerovector = zeros(N)
    # Initialise parameters randomly:
    # first Q*N elements are the N latent Q-dimensional projections X
    # next 2 elements are kernel parameters - take log here because unpack function uses exp to ensure positivity
    # last parameter is the noise variance - take log here because unpack function uses exp to ensure positivity
    rng = MersenneTwister(1234)
    initialsol = [randn(rng, Q*N)*0.1; log(1.0); log(1.0); log(1.0)]
    # pre-allocate N×N covariance matrix K
    K = zeros(N, N)
    # setup optimiser options
    opt = Optim.Options(iterations = iterations, show_trace = true, show_every = 1)
    # use DifferentiationInterface to get gradients
    # To use Mooncake: keep the lines below and comment out the Enzyme block further down
    backend = AutoMooncake(config = nothing)
    prep = prepare_gradient(negativemarginallikelihood, backend, initialsol, Constant(Y), Constant(zerovector), Cache(K), Constant(D), Constant(Q), Constant(N))
    gradhelper!(grad, p) = DifferentiationInterface.gradient!(negativemarginallikelihood, grad, prep, backend, p, Constant(Y), Constant(zerovector), Cache(K), Constant(D), Constant(Q), Constant(N))
    helper(p) = negativemarginallikelihood(p, Y, zerovector, K, D, Q, N)
    # To use Enzyme instead: comment out the Mooncake block above, comment in the lines below
    # (and additionally `import Enzyme` at the top of the file)
    # backend = AutoEnzyme()
    # helper(p) = negativemarginallikelihood(p, Y, zerovector, K, D, Q, N)
    # prep = prepare_gradient(helper, backend, initialsol)
    # gradhelper!(grad, p) = DifferentiationInterface.gradient!(helper, grad, prep, backend, p)
    # call actual optimisation
    finalsolution = optimize(helper, gradhelper!, initialsol, ConjugateGradient(), opt).minimizer
    # obtain optimised latent
    X = unpack_gplvm(finalsolution, Q, N)[1]
    # return projections
    return X
end
# Negative marginal likelihood function of GPLVM.
# We want to minimise this.
function negativemarginallikelihood(p, Y, zerovector, K, D, Q, N)
    # extract parameters from vector p
    X, θ, σ² = unpack_gplvm(p, Q, N)
    # calculate pairwise squared Euclidean distances.
    # Obviously, a more efficient implementation is possible
    # (see the sketch after this listing).
    for n in 1:N
        for m in 1:N
            @views K[n, m] = sum((X[:, n] - X[:, m]).^2)
        end
    end
    # overwrite K entries with covariance matrix elements
    for n in eachindex(K)
        K[n] = θ[1] * exp(-0.5 * K[n] / θ[2])
    end
    # add jitter on diagonal
    for n in 1:N
        K[n, n] += 1e-6
    end
    # accumulate here log likelihood over D dimensions
    accloglikel = zero(eltype(p))
    # instantiate multivariate normal distribution
    mvn = MvNormal(zerovector, K + σ²*I)
    # iterate over D dimensions
    for d in 1:D
        # calculate log likelihood of d-th dimension
        accloglikel += @views logpdf(mvn, Y[d, :])
    end
    # return negative log marginal likelihood
    -1.0 * accloglikel
end
# Given parameters flattened in p, unpack them into X, θ and σ²
function unpack_gplvm(p, Q, N)
    MARK = 0
    # First Q*N elements are the N latent Q-dimensional projections X
    X = reshape(p[MARK+1:MARK+Q*N], Q, N); MARK += Q*N
    # Next two elements are kernel parameters
    θ = exp.(p[MARK+1:MARK+2]); MARK += 2
    # The last parameter is the noise variance
    σ² = exp(p[MARK+1]); MARK += 1
    return X, θ, σ²
end
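(A side note on the comment inside `negativemarginallikelihood` that a more efficient distance computation is possible: since Distances.jl is already loaded, the double loop could presumably be replaced by an in-place pairwise call along the lines of the sketch below. This is only a sketch; I have not checked how it interacts with the Cache context or with the AD backends.)
# Sketch only: fill K in place with the pairwise squared Euclidean
# distances between the columns of X, replacing the double loop above
pairwise!(K, SqEuclidean(), X, dims = 2)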
To execute the code and observe the high memory consumption, we can simply call it with randomly generated data:
Y = randn(12, 1000) # 1000 data items with 12 features
X = gplvm(Y; iterations = 1) # warmup
X = gplvm(Y; iterations = 100) # note memory consumption during execution
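To get a rough idea from within Julia itself, the call can also be timed with `@time`, which reports the number and total size of allocations (this only tracks GC-managed memory, so it will not necessarily match what `top` shows):
@time X = gplvm(Y; iterations = 100) # reports allocations alongside the runtime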
If I run this:
Y = randn(12, 10_000) # 10000 data items with 12 features
X = gplvm(Y; iterations = 100)
the system will kill the julia session.
I strongly believe that the high memory consumption is related to the use of DifferentiationInterface.jl and Mooncake.jl. I thought that the use of contexts would help, but unfortunately it didn't. I have gone through the documentation, but I can't tell whether I am missing something. Does anyone have any advice on how I could reduce my memory footprint? Thanks for tolerating this very long question.
(For what it's worth, I have made the above code available here: GitHub - ngiann/GPLVM.jl.)
Perhaps useful: I am using Julia version 1.11.4 on Ubuntu 22.04.5 LTS.
Update: I have updated the code so that Enzyme can be commented in and Mooncake commented out. This is in response to a comment below.
Update: I have introduced a new function that uses the gradient-free NelderMead optimiser:
"""
Same as gplvm but uses the gradient-free NelderMead optimiser.
"""
function gplvm_gradient_free(Y; iterations = 1, Q = 2)
    # Get number of data items
    D, N = size(Y)
    # Allocate once zero vector necessary for marginal likelihood
    # calculation of zero-mean Gaussian process
    zerovector = zeros(N)
    # Initialise parameters randomly:
    # first Q*N elements are the N latent Q-dimensional projections X
    # next 2 elements are kernel parameters - take log here because unpack function uses exp to ensure positivity
    # last parameter is the noise variance - take log here because unpack function uses exp to ensure positivity
    rng = MersenneTwister(1234)
    initialsol = [randn(rng, Q*N)*0.1; log(1.0); log(1.0); log(1.0)]
    # pre-allocate N×N covariance matrix K
    K = zeros(N, N)
    # setup optimiser options
    opt = Optim.Options(iterations = iterations, show_trace = true, show_every = 1)
    # objective function to be optimised
    helper(p) = negativemarginallikelihood(p, Y, zerovector, K, D, Q, N)
    # call actual optimisation
    finalsolution = optimize(helper, initialsol, NelderMead(), opt).minimizer
    # obtain optimised latent
    X = unpack_gplvm(finalsolution, Q, N)[1]
    # return projections
    return X
end
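For reference, this variant is called in the same way as `gplvm` above:
X = gplvm_gradient_free(Y; iterations = 100)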