I’m writing a function to fit a spatio-temporal model for clustering. At each iteration it has to compute spatial cohesions on subsets of units, and I saw that this takes a lot of the execution time, which is spent in garbage collection. Or at least this is what I interpreted from this flame graph, which represents a “medium size” fit with 30k iterations on 60 units and 12 time instants.
So i was wondering which was the tidiest way to improve this function execution, for example preallocating in some ways all the variables used in the computation (sdim
, sbar1
, etc) to recycle their use at every call of the function.
I think I could just define them before the iteration loop and pass them as arguments to the function, but I thought that there could be a better and tidier approach to do that (as I found for example here which however was in the case of vectors).
Here is the portion of code I was working one:
using Statistics
using SpecialFunctions
const logpi = log(π)
function G2a(a::Real, lg::Bool)
out = logpi + lgamma(a) + lgamma(a - 0.5)
return lg ? out : exp(out)
end
function cohesion3(s1::AbstractVector{Float64}, s2::AbstractVector{Float64}, mu_0::AbstractVector{Float64}, k0::Real, v0::Real, Psi::Matrix{Float64}; lg::Bool, M::Real=1.0)::Float64
sdim = length(s1)
# Compute sample means
sbar1 = mean(s1)
sbar2 = mean(s2)
# Compute deviations from the sample mean
S1, S2, S3, S4 = 0.0, 0.0, 0.0, 0.0
@inbounds for i in 1:sdim
s_sbar1 = s1[i] - sbar1
s_sbar2 = s2[i] - sbar2
S1 += s_sbar1 * s_sbar1
S4 += s_sbar2 * s_sbar2
S2 += s_sbar1 * s_sbar2
end
S3 = copy(S2) # to avoid repeating computations
# Updated parameters for cohesion 3
kn = k0 + sdim
vn = v0 + sdim
auxvec1_1 = sbar1 - mu_0[1]
auxvec1_2 = sbar2 - mu_0[2]
auxmat1_1 = auxvec1_1^2
auxmat1_2 = auxvec1_1 * auxvec1_2
auxmat1_3 = copy(auxmat1_2)
auxmat1_4 = auxvec1_2^2
auxconst1 = k0 * sdim
auxconst2 = k0 + sdim
Psi_n_1 = Psi[1] + S1 + auxconst1 / (auxconst2) * auxmat1_1
Psi_n_2 = Psi[2] + S2 + auxconst1 / (auxconst2) * auxmat1_2
Psi_n_3 = Psi[3] + S3 + auxconst1 / (auxconst2) * auxmat1_3
Psi_n_4 = Psi[4] + S4 + auxconst1 / (auxconst2) * auxmat1_4
detPsi_n = Psi_n_1 * Psi_n_4 - Psi_n_2 * Psi_n_3
detPsi = Psi[1] * Psi[4] - Psi[2] * Psi[3]
out = -sdim * logpi + G2a(0.5 * vn, true) - G2a(0.5 * v0, true) + 0.5 * v0 * log(detPsi) - 0.5 * vn * log(detPsi_n) + log(k0) - log(kn)
return lg ? out : exp(out)
end
# simple test example
n = 20
s1 = rand(n)
s2 = rand(n)
mu_0 = [1.,2.]
k0 = 0.5
v0 = 0.5
Psi = [1. 0.2; 0.2 2.]
@timev cohesion3(s1, s2, mu_0, k0, v0, Psi, lg=false)