I'm struggling to fit my data with Gaussian mixtures in Julia. I found the package GaussianMixtures.jl (https://github.com/davidavdav/GaussianMixtures.jl), which seems to work much like the scikit-learn tooling I used before, but the actual fit gets stuck and every mixture component ends up with the same parameters. I thought this would be an easy starting exercise, but apparently I'm missing something.
Here is what I have:
using GaussianMixtures
using PGFPlotsX
using StatsBase
gmm = GMM(2, 1) # two components, one dimension
# dummy data
d₁ = randn(20000, 1) .- 5
d₂ = randn(100000, 1) .+ 23
d₊ = vcat(d₁, d₂) # 120000×1 Array{Float64,2}
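As a quick sanity check (plain Statistics from the standard library; this is just my own verification, not part of the fit), the two clusters sit where I put them:
using Statistics
mean(d₁), mean(d₂)  # ≈ (-5.0, 23.0) by construction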
This is what the distribution looks like:
h = fit(Histogram, d₊[:,1], nbins=40)
fig = @pgf Axis(Plot({"ybar interval"}, Coordinates(h.edges[1][2:end], h.weights)))
Running the fit should be easy:
result = em!(gmm, d₊)
However, this spits out the following log, showing that the optimisation gets stuck after the second iteration:
┌ Info: Running 10 iterations EM on diag cov GMM with 2 Gaussians in 1 dimensions
└ @ GaussianMixtures /home/tgal/.julia/packages/GaussianMixtures/RGtTJ/src/train.jl:242
┌ Info: iteration 1, average log likelihood -223.922195
└ @ GaussianMixtures /home/tgal/.julia/packages/GaussianMixtures/RGtTJ/src/gmms.jl:71
┌ Info: iteration 2, average log likelihood -3.768908
└ @ GaussianMixtures /home/tgal/.julia/packages/GaussianMixtures/RGtTJ/src/gmms.jl:71
┌ Info: iteration 3, average log likelihood -3.768908
└ @ GaussianMixtures /home/tgal/.julia/packages/GaussianMixtures/RGtTJ/src/gmms.jl:71
┌ Info: iteration 4, average log likelihood -3.768908
└ @ GaussianMixtures /home/tgal/.julia/packages/GaussianMixtures/RGtTJ/src/gmms.jl:71
┌ Info: iteration 5, average log likelihood -3.768908
└ @ GaussianMixtures /home/tgal/.julia/packages/GaussianMixtures/RGtTJ/src/gmms.jl:71
┌ Info: iteration 6, average log likelihood -3.768908
└ @ GaussianMixtures /home/tgal/.julia/packages/GaussianMixtures/RGtTJ/src/gmms.jl:71
┌ Info: iteration 7, average log likelihood -3.768908
└ @ GaussianMixtures /home/tgal/.julia/packages/GaussianMixtures/RGtTJ/src/gmms.jl:71
┌ Info: iteration 8, average log likelihood -3.768908
└ @ GaussianMixtures /home/tgal/.julia/packages/GaussianMixtures/RGtTJ/src/gmms.jl:71
┌ Info: iteration 9, average log likelihood -3.768908
└ @ GaussianMixtures /home/tgal/.julia/packages/GaussianMixtures/RGtTJ/src/gmms.jl:71
┌ Info: iteration 10, average log likelihood -3.768908
└ @ GaussianMixtures /home/tgal/.julia/packages/GaussianMixtures/RGtTJ/src/gmms.jl:71
┌ Info: EM with 120000 data points 10 iterations avll -3.768908
│ 24000.0 data points per parameter
└ @ GaussianMixtures /home/tgal/.julia/packages/GaussianMixtures/RGtTJ/src/gmms.jl:71
Looking at the μ parameter, both Gaussian components have the same value:
julia> gmm.μ
2×1 Array{Float64,2}:
18.332104281524813
18.332104281524813
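That value is suspiciously close to the overall mean of the combined sample, i.e. the weighted average (20000 · (-5) + 100000 · 23) / 120000 ≈ 18.33, so both components seem to have collapsed onto the global mean:
using Statistics
mean(d₊)  # ≈ 18.33, essentially reproducing both fitted μ values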
The same happens if I try more components…
So what is going wrong here?
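My current suspicion is that GMM(2, 1) initialises both components with identical parameters (zero means, unit variances, equal weights), so the E-step assigns identical responsibilities to both components and EM can never break the symmetry. If that's the case, initialising from the data should sidestep it; the README documents a constructor that runs k-means initialisation followed by EM (the keyword arguments below are my reading of the README, so treat this as a sketch rather than verified code):
gmm = GMM(2, d₊; method=:kmeans, kind=:diag, nIter=10)
gmm.μ  # I would expect two distinct means near -5 and 23
But even if that works around the problem, I'd still like to understand why the plain em! call collapses.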