I’m writing a function to fit a spatio-temporal model for clustering. At each iteration it has to compute spatial cohesions on subsets of units, and I saw that this takes a lot of the execution time, which is spent in garbage collection. Or at least this is what I interpreted from this flame graph, which represents a “medium size” fit with 30k iterations on 60 units and 12 time instants.
So i was wondering which was the tidiest way to improve this function execution, for example preallocating in some ways all the variables used in the computation (sdim, sbar1, etc) to recycle their use at every call of the function.
I think I could just define them before the iteration loop and pass them as arguments to the function, but I thought that there could be a better and tidier approach to do that (as I found for example here which however was in the case of vectors).
Here is the portion of code I was working one:
using Statistics
using SpecialFunctions
const logpi = log(π)
function G2a(a::Real, lg::Bool)
out = logpi + lgamma(a) + lgamma(a - 0.5)
return lg ? out : exp(out)
end
function cohesion3(s1::AbstractVector{Float64}, s2::AbstractVector{Float64}, mu_0::AbstractVector{Float64}, k0::Real, v0::Real, Psi::Matrix{Float64}; lg::Bool, M::Real=1.0)::Float64
sdim = length(s1)
# Compute sample means
sbar1 = mean(s1)
sbar2 = mean(s2)
# Compute deviations from the sample mean
S1, S2, S3, S4 = 0.0, 0.0, 0.0, 0.0
@inbounds for i in 1:sdim
s_sbar1 = s1[i] - sbar1
s_sbar2 = s2[i] - sbar2
S1 += s_sbar1 * s_sbar1
S4 += s_sbar2 * s_sbar2
S2 += s_sbar1 * s_sbar2
end
S3 = copy(S2) # to avoid repeating computations
# Updated parameters for cohesion 3
kn = k0 + sdim
vn = v0 + sdim
auxvec1_1 = sbar1 - mu_0[1]
auxvec1_2 = sbar2 - mu_0[2]
auxmat1_1 = auxvec1_1^2
auxmat1_2 = auxvec1_1 * auxvec1_2
auxmat1_3 = copy(auxmat1_2)
auxmat1_4 = auxvec1_2^2
auxconst1 = k0 * sdim
auxconst2 = k0 + sdim
Psi_n_1 = Psi[1] + S1 + auxconst1 / (auxconst2) * auxmat1_1
Psi_n_2 = Psi[2] + S2 + auxconst1 / (auxconst2) * auxmat1_2
Psi_n_3 = Psi[3] + S3 + auxconst1 / (auxconst2) * auxmat1_3
Psi_n_4 = Psi[4] + S4 + auxconst1 / (auxconst2) * auxmat1_4
detPsi_n = Psi_n_1 * Psi_n_4 - Psi_n_2 * Psi_n_3
detPsi = Psi[1] * Psi[4] - Psi[2] * Psi[3]
out = -sdim * logpi + G2a(0.5 * vn, true) - G2a(0.5 * v0, true) + 0.5 * v0 * log(detPsi) - 0.5 * vn * log(detPsi_n) + log(k0) - log(kn)
return lg ? out : exp(out)
end
# simple test example
n = 20
s1 = rand(n)
s2 = rand(n)
mu_0 = [1.,2.]
k0 = 0.5
v0 = 0.5
Psi = [1. 0.2; 0.2 2.]
@timev cohesion3(s1, s2, mu_0, k0, v0, Psi, lg=false)
