I have run into the same problem that others have hit before me. Unfortunately, I can’t solve it, even though the question has been asked in (slightly) different guises (e.g. link1, link2).
The problem: I have a function that pre-allocates a buffer for efficiency. The function works as it should, but I cannot auto-differentiate it easily. I discovered PreallocationTools.jl, but evidently I am not using it correctly: when I attempt to use it, the gradient computation makes far too many memory allocations.
Side note: I plan to use ForwardDiff.jl for the auto-differentiation, though reverse-mode differentiation would probably make more sense for my case.
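For reference, this is the basic DiffCache/get_tmp pattern from the PreallocationTools.jl docs that I am trying to follow (a toy example with made-up numbers, not my actual model):

```julia
using ForwardDiff, PreallocationTools

# Wrap a pre-allocated buffer in a DiffCache. get_tmp then returns the plain
# buffer for Float64 inputs, and a Dual-compatible buffer when the input
# carries ForwardDiff.Dual numbers.
cache = DiffCache(zeros(10))

function f(u, cache)
    tmp = get_tmp(cache, u)  # buffer whose eltype matches u
    tmp .= 2 .* u            # write into the pre-allocated buffer
    sum(abs2, tmp)
end

f(ones(10), cache)                                # plain evaluation works
ForwardDiff.gradient(u -> f(u, cache), ones(10))  # and so does the gradient
```

My MWE below follows this pattern as far as I can tell, which is why the poor performance confuses me.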
I have created an MWE below that includes two functions:
- `setup_loglikelihood` sets up and returns a function that evaluates a mock log-likelihood. This function suffices if we are only interested in evaluating the log-likelihood, but not in auto-differentiating it.
- `setup_loglikelihood_tool` sets up and returns a function that evaluates the same mock log-likelihood. It attempts to use PreallocationTools.jl, but something seems to be terribly wrong given its poor performance.
using LinearAlgebra, ForwardDiff, PreallocationTools

# This function does not make any considerations for auto-differentiation
function setup_loglikelihood(X, Y)
    # X are the Q × N inputs
    Q, N = size(X)
    # Y are the D × N targets/outputs
    D = size(Y, 1)
    # pre-allocate array for the predictions W*X,
    # where W are the model parameters of size D × Q
    pred_storage = zeros(D, N)
    function loglikelihood(param)
        W = reshape(param, D, Q)
        mul!(pred_storage, W, X)
        -sum(abs2.(Y - pred_storage))
    end
    return loglikelihood
end
# This function uses a cache created with PreallocationTools.jl, but does something wrong
function setup_loglikelihood_tool(X, Y)
    # X are the Q × N inputs
    Q, N = size(X)
    # Y are the D × N targets/outputs
    D = size(Y, 1)
    function loglikelihood!(pred_cache, param)
        # fetch a buffer whose eltype matches param (Float64 or Dual)
        pred_storage = get_tmp(pred_cache, param)
        W = reshape(param, D, Q)
        mul!(pred_storage, W, X)
        -sum(abs2.(Y - pred_storage))
    end
    return loglikelihood!
end
Using the code above, I try the following:
# create mock data and parameters
X = randn(4, 10)  # mock inputs (Q = 4, N = 10)
Y = randn(5, 10)  # mock outputs (D = 5, N = 10)
W = randn(5, 4)   # mock parameters of size D × Q

# get log-likelihood function that does not account for auto-differentiation
logl = setup_loglikelihood(X, Y)
logl(vec(W))  # evaluates with no problems
# this throws an error as expected: pred_storage is a Float64 array
# and cannot hold Dual numbers
ForwardDiff.gradient(logl, vec(W))

# get log-likelihood function that attempts to account for auto-differentiation
logl! = setup_loglikelihood_tool(X, Y)
C = DiffCache(similar(Y))  # create the cache
logl!(C, vec(W))  # returns the same result as logl(vec(W)) above
# takes **very** long and produces **far too** many memory allocations
@time ForwardDiff.gradient(w -> logl!(C, w), vec(W))
Evidently, I am not using PreallocationTools.jl correctly. What am I doing wrong?