Hello Guys!
This question is about a function that I will post below but it is also about a more generic question which is … How can we efficiently deal with a function f
that is called many times and every function call involves allocating some arrays (that are needed in order to get the result of the function). I am assuming that the dimension of the arrays is fixed so at every call of f
the size of all the auxiliar arrays is known.
I decided to generate a type
( mutable struct
now) called opt
that has a placeholder for all the arrays that are needed for the function f
.
Every time I call f
I modify “inplance” all the arrays of opt
instead of generating them and erasing them inside the function.
Nevertheless the function seems to allocate a lot of memory every time it is called and I can’t see why.
The function that I benchmark is
function partial_fit!(rbm::RBM, X::Matrix, lr::Real, opt::CDK)
compute_grad!(rbm, X, opt)
update_params!(rbm, opt, lr)
end
All the real computation (and allocation) comes from
function compute_grad!(rbm::RBM, X::Matrix, opt::CDK)
T = eltype(rbm.vis_bias)
batch_size = size(X)[2]
# Perform gibbs sampling to compute the negative phase
for k in 1:opt.K
if k==1
opt.H .= sigmoid.(rbm.W * X .+ rbm.hid_bias)
opt.V_hat .= sigmoid.(rbm.W'* opt.H .+ rbm.vis_bias) .> rand(T,rbm.n_vis, batch_size)
opt.H_hat .= sigmoid.(rbm.W * opt.V_hat .+ rbm.hid_bias)
else
opt.V_hat .= sigmoid.(rbm.W'* opt.H_hat .+ rbm.vis_bias) .> rand(T,rbm.n_vis, batch_size)
opt.H_hat .= sigmoid.(rbm.W * opt.V_hat .+ rbm.hid_bias)
end
end
opt.grad_W = (opt.H * X' .- opt.H_hat * opt.V_hat')./ batch_size;
opt.grad_vis_bias = vec(sum((X .- opt.V_hat), 2))./ batch_size;
opt.grad_hid_bias = vec(sum((opt.H .- opt.H_hat), 2))./ batch_size;
opt.rec_error = sqrt(sum((X.-opt.V_hat).^2))
end
The results of the benchmark are:
BenchmarkTools.Trial:
memory estimate: 10.81 MiB
allocs estimate: 103
--------------
minimum time: 16.863 ms (0.00% GC)
median time: 18.426 ms (0.00% GC)
mean time: 18.836 ms (5.53% GC)
maximum time: 23.076 ms (9.37% GC)
--------------
samples: 265
evals/sample: 1
time tolerance: 5.00%
memory tolerance: 1.00%
In order to try to improve the code I decided to rewrite compute_grad
. I wrote compute_grad_with_dot!
where everytime I had an array A = somethig
I rewrote it as A .= somethig
, allocating all the needed memory inside opt
function partial_fit_with_dot!(rbm::RBM, X::Matrix, lr::Real, opt::CDK)
compute_grad_with_dot!(rbm, X, opt)
update_params!(rbm, opt, lr)
end
function compute_grad_with_dot!(rbm::RBM, X::Matrix, opt::CDK)
T = eltype(rbm.vis_bias)
batch_size = size(X)[2]
# Perform gibbs sampling to compute the negative phase
for k in 1:opt.K
opt.V_sampling .= rand(T, rbm.n_vis, batch_size)
if k==1
opt.H .= sigmoid.(rbm.W * X .+ rbm.hid_bias)
opt.V_hat .= sigmoid.(rbm.W'* opt.H .+ rbm.vis_bias) .> opt.V_sampling
opt.H_hat .= sigmoid.(rbm.W * opt.V_hat .+ rbm.hid_bias)
else
opt.V_hat .= sigmoid.(rbm.W'* opt.H_hat .+ rbm.vis_bias) .> opt.V_sampling
opt.H_hat .= sigmoid.(rbm.W * opt.V_hat .+ rbm.hid_bias)
end
end
opt.grad_W .= (opt.H * X' .- opt.H_hat * opt.V_hat')./ batch_size;
opt.grad_vis_bias .= vec(sum((X .- opt.V_hat), 2))./ batch_size;
opt.grad_hid_bias .= vec(sum((opt.H .- opt.H_hat), 2))./ batch_size;
opt.rec_error = sqrt(sum((X .- opt.V_hat).^2))
end
The results are pretty much to the ones before in terms of memory.
@benchmark partial_fit_with_dot!(rbm, X_train[:,1:500], 0.1, cdk)
BenchmarkTools.Trial:
memory estimate: 10.75 MiB
allocs estimate: 87
--------------
minimum time: 17.265 ms (0.00% GC)
median time: 20.427 ms (0.00% GC)
mean time: 20.815 ms (5.42% GC)
maximum time: 37.003 ms (0.00% GC)
--------------
samples: 239
evals/sample: 1
time tolerance: 5.00%
memory tolerance: 1.00%
The full code can be found here:
https://github.com/davidbp/learn_julia/blob/master/RBM/RBM_test_MNIST_tunned_separate_opt.ipynb
What is the most effective way to deal with this situations?
Thank you for your time guys.