Same code run multiple times gives wildly different timings

I have some code that heavily uses @tturbo from LoopVectorization for matmul-like operations. The main idea is that I use Julia’s dispatch to pass settings to the algorithm. For example, I call fit!(option), which calls multiple internal functions and passes the option along. Eventually, these small functions are specialized on the concrete type of option, so I have functions like this:

finalize_posteriors!(G, norm::AV, reg::Nothing)
finalize_posteriors!(G, norm::AV, reg::InvGamma)

ELBO(G, p, mu, var, data, reg::Nothing)
ELBO(G, p, mu, var, data, reg::InvGamma)

…where reg is that option. These pairs of functions are extremely similar; they’re essentially the same function.
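Here’s a minimal sketch of the pattern (simplified, with placeholder bodies rather than my actual implementations; the full code is linked below):

abstract type AbstractRegularization end
const MaybeReg = Union{AbstractRegularization, Nothing}

struct InvGamma <: AbstractRegularization
    α::Real
    β::Real
end

# The user-facing function just forwards the option...
fit!(G, norm, reg::MaybeReg) = finalize_posteriors!(G, norm, reg)

# ...and the small internal functions specialize on its concrete type.
finalize_posteriors!(G, norm, reg::Nothing) = (G ./= norm'; nothing)
finalize_posteriors!(G, norm, reg::InvGamma) = finalize_posteriors!(G, norm, nothing)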

Problem

For some reason, the code called with reg = InvGamma(...) can be very slow. I run the same file multiple times:

julia --threads auto compute.jl

The file first runs the code with reg = nothing and then with reg = InvGamma(...).

  • The first part very consistently runs in about 25-28 seconds
  • The second part behaves erratically. Here are some timings in min:sec format: 00:33, 00:43, 02:16, 01:28, 00:34, 02:34, 00:41, 00:32, 00:32, 01:04, 01:42
    • What’s up with these insane 1min+ timings? The first part runs in about 00:25 in all of these runs.
    • All 4 CPU cores are being used in all of these runs

Needless to say, I’m running both parts of the code with the same data, same settings, same everything except for the reg parameter.

Code

The only part of the code that the reg parameter really affects is this:

finalize_variances!(var::AV, norm::AV, reg::Nothing) = @turbo var .= var ./ norm .+ 1e-6

function finalize_variances!(var::AV, norm::AV, reg::InvGamma)
    α, β = reg.α, reg.β
    @turbo @. var = (2β + var) / (2α + norm)
    nothing
end

All other functions that dispatch on the reg parameter look like this:

M_weights!(p, G, norm, reg::Nothing) = p .= norm ./ sum(norm)
@inline M_weights!(p, G, norm, reg::InvGamma) = M_weights!(p, G, norm, nothing)

In other words, the reg::Nothing method does the actual work, while the reg::InvGamma one simply calls the reg::Nothing version.

The full code can be found here: https://gist.github.com/ForceBru/8e7f6940c3013e15f62a6c660a618c85. You can simply copy-paste and run it: it’ll automatically install all packages.

Question

What could be the issue here? Why could the reg = InvGamma(...) version be randomly much slower than the reg = nothing version?

In general, using this pattern seems slow:

abstract type AbstractParam end
const MaybeParam = Union{AbstractParam, Nothing}

struct MyParam <: AbstractParam opt::Real end

# Library-specific API
f1(x, par::Nothing) = 0.0
f1(x, par::MyParam) = par.opt

# User-facing API dispatches on lib-specific API
f(x, par::MaybeParam) = 5 + f1(x, par)

Now benchmark it like this:

using BenchmarkTools

some_param = MyParam(100.0)

@info "Compile"
@show f(1, nothing)
@show f(1, some_param)

@info "Benchmark"
display(@benchmark $f(1, nothing))
println()
display(@benchmark $f(1, $some_param))

I made sure to interpolate everything when using @benchmark, so it shouldn’t be measuring access to global scope. The measurements look like this:

~/test $ julia-1.7 struct_access.jl
[ Info: Compile
f(1, nothing) = 5.0
f(1, some_param) = 105.0
[ Info: Benchmark
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  0.040 ns … 9.050 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     0.045 ns             ┊ GC (median):    0.00%
 Time  (mean ± σ):   0.047 ns ± 0.090 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

          ▁   ▂   ▁   ▆   █
  ▂▁▁▁▅▁▁▁█▁▁▁█▁▁▁█▁▁▁█▁▁▁█▁▁▁▃▁▁▁▄▁▁▁█▁▁▁▂▁▁▁▂▁▁▁▃▁▁▁▅▁▁▁▃ ▃
  0.04 ns        Histogram: frequency by time      0.054 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
BenchmarkTools.Trial: 10000 samples with 996 evaluations.
 Range (min … max):  23.579 ns …  1.058 μs  ┊ GC (min … max): 0.00% … 96.82%
 Time  (median):     24.962 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   26.913 ns ± 15.875 ns  ┊ GC (mean ± σ):  0.97% ±  1.68%

  ▄█▅▇▄▂ ▃▇ ▃▆▁ ▂      ▃▃▁▁▁    ▁▂▂                           ▂
  █████████████▇██▇█▆▆▇███████▆█████▇██▇▇▆▄▅▅▃▂▃▂▅▅▄▅▄▄▅▃▅▄▅▅ █
  23.6 ns      Histogram: log(frequency) by time      43.6 ns <

 Memory estimate: 16 bytes, allocs estimate: 1.

Apparently, f(1, some_param) allocates memory and is at least 23.579 / 9.050 = 2.605 times slower than f(1, nothing) (best case of f(1, some_param) compared to worst case of f(1, nothing)).

Benchmarking attribute access like @benchmark $some_param.opt gives Time (mean ± σ): 2.045 ns ± 0.455 ns, the same as the timings for variable access like @benchmark $some_param. So, these 20+ nanoseconds I’m measuring here aren’t just attribute access times.
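For reference, the field-access measurement I’m describing is simply this (same setup as the benchmark script above):

display(@benchmark $some_param.opt)  # ~2 ns for me, same as plain variable access
display(@benchmark $some_param)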


EDIT 1: looks like there’s type instability in the call f(1, some_param):

julia> @code_warntype f(1, some_param)
MethodInstance for f(::Int64, ::MyParam)
  from f(x, par::MaybeParam) in Main at test/struct_access.jl:13
Arguments
  #self#::Core.Const(f)
  x::Int64
  par::MyParam
Body::Any
1 ─ %1 = Main.f1(x, par)::Real
β”‚   %2 = (5 + %1)::Any
└──      return %2

Not sure why this is the case: %1 is Real, the literal 5 is also Real, then why is (5 + %1)::Any??

The problem is very likely that Real is an abstract type, and you use it here:

struct InvGamma <: AbstractRegularization
	α::Real
	β::Real
end

Use, instead:

struct InvGamma{T<:Real} <: AbstractRegularization
	α::T
	β::T
end
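One quick way to see the difference is to check whether the field type is concrete (a standalone snippet with hypothetical names InvGammaAbstract / InvGammaConcrete, not from the code above):

abstract type AbstractRegularization end

# Original declaration: the field type Real is abstract, so values get boxed.
struct InvGammaAbstract <: AbstractRegularization
    α::Real
    β::Real
end

# Parametric declaration: the field type is concrete once T is known.
struct InvGammaConcrete{T<:Real} <: AbstractRegularization
    α::T
    β::T
end

isconcretetype(fieldtype(InvGammaAbstract, :α))          # false
isconcretetype(fieldtype(InvGammaConcrete{Float64}, :α)) # true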

I think I’ve just noticed this by examining the output of @code_warntype. Let me test whether this is the case…

EDIT: wow, immediate speedup (from mean 26ns to mean 2.3ns) and zero allocations after using the generic type like this:

struct MyParam{T<:Real} <: AbstractParam
    opt::T
end

Well, I’ll go fix a whoooole bunch of my code using this now, thank you!


Note that with the pattern above you can’t have parameters with different types. In some cases you may need something more generic, such as:

julia> struct A{T1<:Real,T2<:Real}
           α::T1
           β::T2
       end

julia> A(1,1.0)
A{Int64, Float64}(1, 1.0)

so you can initialize each parameter with a different subtype of Real.
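Another common alternative (hedged: just a general pattern, not something from the code in this thread) is to keep a single type parameter and promote mixed inputs in an outer constructor:

julia> struct B{T<:Real}
           α::T
           β::T
       end

julia> B(α::Real, β::Real) = B(promote(α, β)...)  # promote mixed inputs to a common type
B

julia> B(1, 1.0)
B{Float64}(1.0, 1.0)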

Unfortunately, that doesn’t seem to change much in the original code (the one in my post, as opposed to the comment)…

I’m still getting random slowdowns in the part with InvGamma, even after changing the struct to use generics. I just got one more such result: 00:52 with nothing as the parameter (I was loading a page in the browser at the same time, which biased the timings) and 02:16 for the InvGamma method (the page had already loaded when this benchmark started, yet it’s still slower than the run with a webpage loading in the background!). Ran it once again and got 27 seconds and 26 seconds (which is what the timings should look like, IMO). Ran it once more and got 00:29 and 01:54.

I’m using ProgressMeter to track progress. For the first part of the code (with nothing as the parameter) the progress bar shows consistently fast progress, while for the second part (with InvGamma) the progress starts off fast, then slows to a crawl and stays that way, and the ETA starts steadily increasing instead of decreasing.

This:

ELBO_hist::Vector{<:Real}

may cause the same kind of issue.

I cannot test the code right now, but you probably have another issue of this kind, or some non-constant global somewhere, causing type instabilities.
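For instance, a non-constant global alone is enough to trigger this kind of instability (a generic illustration, not taken from your code):

scale = 2.0                  # non-constant global: inferred as Any
g_bad(x) = scale * x         # @code_warntype g_bad(1.0) reports Body::Any

const scale_const = 2.0      # const global: the compiler knows its type
g_good(x) = scale_const * x  # inferred as Float64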

I was (and still am) very confused by the varying benchmark results for Real. I tested by replacing every Real with Float64:

import Pkg
Pkg.activate(temp=true, io=devnull)
V_OLD, V_NEW = "0.12.99", "0.12.102"
Pkg.add(name="LoopVectorization", version=V_NEW, io=devnull)
Pkg.add(["Distributions", "ProgressMeter", "BenchmarkTools"], io=devnull)
Pkg.status()

module EM

import Distributions: Dirichlet
using LoopVectorization

const AV = AbstractVector{T} where T
const AM = AbstractMatrix{T} where T

abstract type AbstractRegularization end
const MaybeReg = Union{AbstractRegularization, Nothing}

struct InvGamma <: AbstractRegularization
	α::Float64
	β::Float64
end

function init_G(K, N, a::Float64)
	distr = Dirichlet(K, a)
	rand(distr, N)
end

normal(x::Float64, μ::Float64, var::Float64) = exp(-(x-μ)^2 / (2var)) / sqrt(2π * var)

function ELBO(G, p, mu, var, data, reg::Nothing)::Float64
	K, N = size(G)

	ret = 0.0
	@tturbo for n ∈ 1:N, k ∈ 1:K
        q = G[k, n]
		ret += q * (
			log(p[k]) - (log(2π) + log(var[k]) + (data[n] - mu[k])^2 / var[k]) / 2
			- log(q + 1e-100)
		)
	end
	ret
end

@inline ELBO(G, p, mu, var, data, reg::InvGamma) = ELBO(G, p, mu, var, data, nothing)

# ===== E step =====
function finalize_posteriors!(G, norm::AV, reg::Nothing)
	K, N = size(G)
	@tturbo for n ∈ 1:N, k ∈ 1:K
        G[k, n] /= norm[n]
	end
end

@inline finalize_posteriors!(G, norm::AV, reg::InvGamma) =
	finalize_posteriors!(G, norm, nothing)

function step_E!(G, norm::AV{<:Float64}, p, mu, var, data, reg::MaybeReg)
	K, N = size(G)
	@assert length(norm) == N
	norm .= 0
    @tturbo for n ∈ 1:N, k ∈ 1:K
        G[k, n] = p[k] * exp(-(data[n] - mu[k])^2 / (2var[k])) / sqrt(2π * var[k])
		norm[n] += G[k, n]
	end

	finalize_posteriors!(G, norm, reg)
end

# ===== M step =====
M_weights!(p, G, norm, reg::Nothing) = p .= norm ./ sum(norm)
@inline M_weights!(p, G, norm, reg::InvGamma) = M_weights!(p, G, norm, nothing)

function M_means!(mu, G, norm, data, reg::Nothing)
	K, N = size(G)

	mu .= 0
	@tturbo for n ∈ 1:N,k ∈ 1:K
        mu[k] += G[k, n] / norm[k] * data[n]
	end
end

@inline M_means!(mu, G, norm, data, reg::InvGamma) =
	M_means!(mu, G, norm, data, nothing)

finalize_variances!(var::AV, norm::AV, reg::Nothing) = @turbo var .= var ./ norm .+ 1e-6

function finalize_variances!(var::AV, norm::AV, reg::InvGamma)
    α, β = reg.α, reg.β
	@turbo @. var = (2β + var) / (2α + norm)
	nothing
end

function M_variances!(var, G, norm, data, mu, reg::MaybeReg)
	K, N = size(G)

	var .= 0
	@tturbo for n ∈ 1:N, k ∈ 1:K
        var[k] += G[k, n] * (data[n] - mu[k])^2
	end

	finalize_variances!(var, norm, reg)
end

function step_M!(G, norm, p, mu, var, data, reg::MaybeReg)
	K, N = size(G)
	@assert K < N
	evidences = @view norm[1:K]

	evidences .= 0
    @tturbo for n ∈ 1:N, k ∈ 1:K
        evidences[k] += G[k, n]
	end

	M_weights!(p, G, evidences, reg)
	M_means!(mu, G, evidences, data, reg)
	M_variances!(var, G, evidences, data, mu, reg)
end

end # module

using ProgressMeter, Random, BenchmarkTools # , Revise
import Distributions: UnivariateGMM, Categorical
const AV = AbstractVector{T} where T
const AM = AbstractMatrix{T} where T


function sample_GMM(p::AV, mu::AV, var::AV; N::Integer)
	distr = UnivariateGMM(mu, sqrt.(var), Categorical(p))
	rand(distr, N)
end

mutable struct GaussianMixture{T<:Float64}
	K::Integer

	G::Union{Matrix{T}, Nothing}
	norm::Union{Vector{T}, Nothing}

	p::Vector{T}
	mu::Vector{T}
	var::Vector{T}

	ELBO_hist::Vector{<:Float64}
end

function GaussianMixture{T}(K::Integer) where T<:Float64
	@assert K > 0

	GaussianMixture{T}(
		K, nothing, nothing,
		zeros(T, K), zeros(T, K), zeros(T, K),
		Float64[]
	)
end

@inline GaussianMixture(K::Integer) = GaussianMixture{Float64}(K)

function fit!(
	gmm::GaussianMixture{T}, data::AV{<:Float64};
	maxiter::Integer=10_000,
	reg::EM.MaybeReg=nothing, a::Float64=100., reinit::Bool=false
) where T<:Float64
	K, N = gmm.K, length(data)
	has_reinit = false
	if gmm.G === nothing || size(gmm.G) ≠ (K, N) || reinit
		gmm.G = EM.init_G(K, N, a)
		gmm.norm = zeros(T, N)
		has_reinit = true
	end
	@assert size(gmm.G) == (K, N)

	if has_reinit
		EM.step_M!(gmm.G, gmm.norm, gmm.p, gmm.mu, gmm.var, data, reg)
	else
		# Keep old parameters
		EM.step_E!(gmm.G, gmm.norm, gmm.p, gmm.mu, gmm.var, data, reg)
	end

	gmm.ELBO_hist = zeros(T, maxiter)
	for i ∈ 1:maxiter
		EM.step_E!(gmm.G, gmm.norm, gmm.p, gmm.mu, gmm.var, data, reg)
		EM.step_M!(gmm.G, gmm.norm, gmm.p, gmm.mu, gmm.var, data, reg)

		gmm.ELBO_hist[i] = EM.ELBO(gmm.G, gmm.p, gmm.mu, gmm.var, data, reg)
	end

	(; p=gmm.p, mu=gmm.mu, var=gmm.var)
end

function fit_rolling!(
    gmm::GaussianMixture{T}, data::AV{<:Float64}, win_size::Integer;
    kwargs...
) where T<:Float64
    the_range = win_size:length(data)
    P = Matrix{T}(undef, gmm.K, length(the_range))
    M, V = similar(P), similar(P)
    @showprogress for (i, off) ∈ enumerate(the_range)
		win = @view data[off-the_range[1]+1:off]
		p, mu, var = fit!(gmm, win; kwargs...)
		P[:, i] .= p
		M[:, i] .= mu
		V[:, i] .= var
	end
	P, M, V
end

@info "Generating data..."
const data = sample_GMM([.6, .4], [0, 0], [.3, .2]; N=500)

const K, WIN = 2, 300
@info "Parameters" K WIN

@info "Fitting... (no regularization)"
@benchmark fit_rolling!(mix, $data, $WIN) setup=(Random.seed!(85); mix = GaussianMixture(K)) 

@info "Fitting... (with regularization)"
@benchmark fit_rolling!(mix, $data, $WIN; reg=reg) setup=(Random.seed!(85); mix = GaussianMixture(K); reg = EM.InvGamma(.1, .1))

I see [output screenshot not reproduced here], which looks good to me.

Real is abstract, so the variables will be boxed, allocate stuff, call GC, etc. Float64 is concrete.

However, the ELBO_hist::Vector{<:Real} part is shared between the two methods (with nothing and with InvGamma). It should thus be affecting both methods, but it only ever affects the InvGamma one. I changed it to ELBO_hist::Vector{E}, where E comes from mutable struct GaussianMixture{T<:Real, E<:Real}, but it didn’t change anything: I just got another 00:26 vs 02:33 result. The previous run (with the exact same code) was 00:28 vs 00:24. The code doesn’t change, but the timings change drastically… I also removed the ::Real return type annotation from ELBO, but this too changed nothing.

Post the latest version here, so people can check

Latest version: https://gist.github.com/ForceBru/8e7f6940c3013e15f62a6c660a618c85#file-version_1-jl

I wanted to say that I changed every single occurrence of Real to Float64 and haven’t seen any slowdowns since, but I just got two pairs of results like (00:27 vs 01:05) and (00:26 vs 00:57), so the issue is still here…

I ran your last version, with and without ProgressMeter, and I have also seen the issue.

julia> run()
[ Info: Generating data...
┌ Info: Parameters
│   K = 2
└   WIN = 300
[ Info: Fitting... (no regularization)
 14.594674 seconds (1.21 k allocations: 15.386 MiB)
[ Info: Fitting... (with regularization)
 14.872701 seconds (1.21 k allocations: 15.386 MiB, 0.03% gc time)
([0.19056693095854077 0.18744056756284755 … 0.0037090872473333693 0.003739761735901768; 0.8094330690414592 0.8125594324371523 … 0.9962909127526666 0.9962602382640982], [-0.3449061202525738 -0.3621885005206003 … 1.8250227802140728 1.825179465909706; 0.06946818114343493 0.07443589954563182 … 7.373542843016062e-5 0.005900212958603517], [0.24913434575216062 0.2452425954818671 … 0.2906502136162451 0.28939102550911766; 0.1787790293682965 0.1761821076225255 … 0.2589479157183338 0.25495666889234986])

julia> run()
[ Info: Generating data...
┌ Info: Parameters
│   K = 2
└   WIN = 300
[ Info: Fitting... (no regularization)
 14.723044 seconds (1.21 k allocations: 15.386 MiB)
[ Info: Fitting... (with regularization)
106.329024 seconds (1.21 k allocations: 15.386 MiB)
([0.5173114601980645 0.5540628677603205 … 1.0 1.0; 0.4826885398019354 0.44593713223967935 … 2.5e-323 2.5e-323], [-0.2146663966176833 -0.19650003220418027 … -3.836599417869093e-5 -0.0022272304834010403; 0.17703132778788733 0.18088244207116588 … 0.06414018207707493 0.06708473703642075], [0.1889233689623695 0.19111090734234365 … 0.2686002482989721 0.26958119497234573; 0.23385040531048062 0.23875331502504335 … 1.0 1.0])


I cannot see anything obviously faulty in the code, and there is no obvious type instability either.

I ran my latest version in VS Code 4 times without a runaway. In native Julia, however, it did show up in the first and second runs (timing screenshots not reproduced here).


I did some profiling with ProfileView and found that in one of the slow runs only one thread was executing code that I wrote. The other three were executing code in the ThreadingUtilities package.

In that screenshot (not reproduced here), the left chunk is my code (or something that I recognize as my code), whereas everything to the right comes from ThreadingUtilities.

Threads 2-4 only contain the part to the right (everything to the right of the leftmost red bar). Maybe this is by design, not sure.

Looks like it’s by design, since that code runs LoopVectorization.TURBO from LoopVectorization/tSQDi/src/codegen/lower_threads.jl:10 (the “tower” to the right). The two rightmost blocks are the operators + and < from int.jl (Base Julia, I assume). Which means that about 50% of the time is spent in these operators???
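For reference, the profiling setup was roughly this (a sketch reusing names from the script above; how ProfileView splits the flame graph by thread depends on its version):

using Profile, ProfileView

Profile.clear()
mix = GaussianMixture(2)
@profile fit_rolling!(mix, data, 300; reg=EM.InvGamma(0.1, 0.1))
ProfileView.view()  # inspect the flame graph(s) per thread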


It is not easy to reproduce the issue, but it happens. I tested with JULIA_EXCLUSIVE=1, with no effect. I get sort-of-random slowdowns in some runs, as observed by the OP. My guess is that it’s the GC kicking in from time to time; if I could inspect it in detail, I would track where allocations happen to see if I find something.
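One quick way to test the GC hypothesis (a sketch of what I’d try, not something I have run here) is to time the regularized call with GC statistics and compare against the unregularized one:

# @timed reports elapsed time, allocated bytes and GC time for the call:
stats = @timed fit_rolling!(GaussianMixture(2), data, 300; reg=EM.InvGamma(0.1, 0.1))
@show stats.time stats.bytes stats.gctime

# Alternatively, start Julia with --track-allocation=user and inspect the
# generated .mem files to see which lines allocate.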

I’m running the code below with:

include("forcebru.jl")
run()
Full code:
import Pkg
Pkg.activate(temp=true, io=devnull)
V_OLD, V_NEW = "0.12.99", "0.12.102"
Pkg.add(name="LoopVectorization", version=V_NEW, io=devnull)
Pkg.add(["Distributions", "ProgressMeter"], io=devnull)
Pkg.status()

module EM

import Distributions: Dirichlet
using LoopVectorization

const AV = AbstractVector{T} where T
const AM = AbstractMatrix{T} where T

abstract type AbstractRegularization end
const MaybeReg = Union{AbstractRegularization, Nothing}

struct InvGamma{T1<:Real, T2<:Real} <: AbstractRegularization
	α::T1
	β::T2
end

function init_G(K, N, a::T) where T<:Real
	distr = Dirichlet(K, a)
	rand(distr, N)
end

function ELBO(G, p, mu, var, data, reg::Nothing)
	K, N = size(G)

	ret = 0.0
	@tturbo for n ∈ 1:N, k ∈ 1:K
		q = G[k, n]
		ret += q * (
			log(p[k]) - (log(2π) + log(var[k]) + (data[n] - mu[k])^2 / var[k]) / 2
			- log(q + 1e-100)
		)
	end
	ret
end

@inline ELBO(G, p, mu, var, data, reg::InvGamma) = ELBO(G, p, mu, var, data, nothing)

# ===== E step =====
function finalize_posteriors!(G, norm::AV, reg::Nothing)
	K, N = size(G)
	@tturbo for n ∈ 1:N, k ∈ 1:K
        G[k, n] /= norm[n]
	end
end

@inline finalize_posteriors!(G, norm::AV, reg::InvGamma) =
	finalize_posteriors!(G, norm, nothing)

function step_E!(G, norm::AV{<:Real}, p, mu, var, data, reg::MaybeReg)
	K, N = size(G)
	@assert length(norm) == N
	norm .= 0
	@tturbo for n ∈ 1:N, k ∈ 1:K
		G[k, n] = p[k] * exp(-(data[n] - mu[k])^2 / (2var[k])) / sqrt(2π * var[k])
		norm[n] += G[k, n]
	end

	finalize_posteriors!(G, norm, reg)
end

# ===== M step =====
M_weights!(p, G, norm, reg::Nothing) = p .= norm ./ sum(norm)
@inline M_weights!(p, G, norm, reg::InvGamma) = M_weights!(p, G, norm, nothing)

function M_means!(mu, G, norm, data, reg::Nothing)
	K, N = size(G)

	mu .= 0
	@tturbo for n ∈ 1:N,k ∈ 1:K
    	mu[k] += G[k, n] / norm[k] * data[n]
	end
end

@inline M_means!(mu, G, norm, data, reg::InvGamma) =
	M_means!(mu, G, norm, data, nothing)

finalize_variances!(var::AV, norm::AV, reg::Nothing) = @turbo var .= var ./ norm .+ 1e-6

function finalize_variances!(var::AV, norm::AV, reg::InvGamma)
    α, β = reg.α, reg.β
	@turbo @. var = (2β + var) / (2α + norm)
	nothing
end

function M_variances!(var, G, norm, data, mu, reg::MaybeReg)
	K, N = size(G)

	var .= 0
	@tturbo for n ∈ 1:N, k ∈ 1:K
        var[k] += G[k, n] * (data[n] - mu[k])^2
	end

	finalize_variances!(var, norm, reg)
end

function step_M!(G, norm, p, mu, var, data, reg::MaybeReg)
	K, N = size(G)
	@assert K < N
	evidences = @view norm[1:K]

	evidences .= 0
	@tturbo for n ∈ 1:N, k ∈ 1:K
        evidences[k] += G[k, n]
	end

	M_weights!(p, G, evidences, reg)
	M_means!(mu, G, evidences, data, reg)
	M_variances!(var, G, evidences, data, mu, reg)
end

end # module

using ProgressMeter
import Distributions: UnivariateGMM, Categorical
const AV = AbstractVector{T} where T
const AM = AbstractMatrix{T} where T


function sample_GMM(p::AV, mu::AV, var::AV; N::Integer)
	distr = UnivariateGMM(mu, sqrt.(var), Categorical(p))
	rand(distr, N)
end

mutable struct GaussianMixture{T<:Real, E<:Real}
	K::Integer

	G::Union{Matrix{T}, Nothing}
	norm::Union{Vector{T}, Nothing}

	p::Vector{T}
	mu::Vector{T}
	var::Vector{T}

	ELBO_hist::Vector{E}
end

function GaussianMixture{T, E}(K::Integer) where {T<:Real, E<:Real}
	@assert K > 0

	GaussianMixture{T, E}(
		K, nothing, nothing,
		zeros(T, K), zeros(T, K), zeros(T, K),
		E[]
	)
end

@inline GaussianMixture(K::Integer) = GaussianMixture{Float64, Float64}(K)

function fit!(
	gmm::GaussianMixture{T}, data::AV{<:Real};
	maxiter::Integer=10_000,
	reg::EM.MaybeReg=nothing, a::T2=100, reinit::Bool=false
) where {T<:Real, T2<:Real}
	K, N = gmm.K, length(data)
	has_reinit = false
	if gmm.G === nothing || size(gmm.G) ≠ (K, N) || reinit
		gmm.G = EM.init_G(K, N, a)
		gmm.norm = zeros(T, N)
		has_reinit = true
	end
	@assert size(gmm.G) == (K, N)

	if has_reinit
		EM.step_M!(gmm.G, gmm.norm, gmm.p, gmm.mu, gmm.var, data, reg)
	else
		# Keep old parameters
		EM.step_E!(gmm.G, gmm.norm, gmm.p, gmm.mu, gmm.var, data, reg)
	end

	gmm.ELBO_hist = zeros(T, maxiter)
	for i ∈ 1:maxiter
		EM.step_E!(gmm.G, gmm.norm, gmm.p, gmm.mu, gmm.var, data, reg)
		EM.step_M!(gmm.G, gmm.norm, gmm.p, gmm.mu, gmm.var, data, reg)
		gmm.ELBO_hist[i] = EM.ELBO(gmm.G, gmm.p, gmm.mu, gmm.var, data, reg)
	end

	(; p=gmm.p, mu=gmm.mu, var=gmm.var)
end

function fit_rolling!(
    gmm::GaussianMixture{T}, data::AV{<:Real}, win_size::Integer;
    kwargs...
) where T<:Real
    the_range = win_size:length(data)
    P = Matrix{T}(undef, gmm.K, length(the_range))
    M, V = similar(P), similar(P)
    @showprogress for (i, off) ∈ enumerate(the_range)
		win = @view data[off-the_range[1]+1:off]
		p, mu, var = fit!(gmm, win; kwargs...)
		P[:, i] .= p
		M[:, i] .= mu
		V[:, i] .= var
	end
	P, M, V
end

# Actually fit shit
function run()
@info "Generating data..."
data = sample_GMM([.6, .4], [0, 0], [.3, .2]; N=500)

K, WIN = 2, 300
@info "Parameters" K WIN

@info "Fitting... (no regularization)"
mix = GaussianMixture(K)
fit_rolling!(mix, data, WIN)

@info "Fitting... (with regularization)"
mix = GaussianMixture(K)
reg = EM.InvGamma(.1, .1)
fit_rolling!(mix, data, WIN; reg=reg)

nothing
end

Oh no, I forgot to delete # Actually fit shit from my code, sorry about that!


I have no personal feelings for that data.


This is sort of speculative, but at one point I replaced @tturbo with @threads (which required changing the loops), and I ended up killing the execution because it was taking about 3 minutes instead of 14 seconds. As much as I am a fan of LoopVectorization, that seems excessive. Are you sure the results are correct? Maybe inspecting the code running serially, without the vectorization, would be a good first step.
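For example, the E-step kernel could be compared against a plain serial loop like this (hand de-vectorized from the code above, untested):

# Serial reference for step_E! + finalize_posteriors!, without @tturbo:
function step_E_serial!(G, norm, p, mu, var, data)
    K, N = size(G)
    norm .= 0
    @inbounds for n in 1:N, k in 1:K
        G[k, n] = p[k] * exp(-(data[n] - mu[k])^2 / (2var[k])) / sqrt(2π * var[k])
        norm[n] += G[k, n]
    end
    @inbounds for n in 1:N, k in 1:K
        G[k, n] /= norm[n]
    end
    nothing
end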