Gradient-based methods are not moving some parameters

Hi all, I have a maximum likelihood estimator in which I need to integrate using a mixture of two normal distributions with zero mean. Without getting into much details, this entails estimating 3 parameters of the distribution, the standard deviations and the mixture probability.

The weird thing is that whenever I use a gradient-based method on Optim, the parameter corresponding to the mixture probability is always equal to the starting value at the end of the estimation (whether it converged or not), while when I use Nelder-Mead or SA, all parameters ‘move’.

Additionally, both methods lead to the correct parameters of the rest of the model, although when using gradient-based methods the standard deviation estimates are nonsense but are correct on Nelder-Mead.

Maybe this is not specifically about Julia and Optim and there is some problem when estimating mixture probabilities with gradient methods? Or is there something in my code that is causing this problem? I put below a simple (meaningless) example that shows how I’m constructing the draws.

PS: This also happens if I use quadrature methods.

using Distributions, Optim, NLSolversBase

function testll(param, MC)
    baseprob = MC[:,2]
    mix = 1 / (1 + exp(-param[1]))
    draws = Array{eltype(param)}(undef, size(MC,1))
    for i = 1:size(draws,1)
        if baseprob[i] < mix
            draws[i] = MC[i,1] * param[2]
            draws[i] = MC[i,1] * param[3]
    return log(mean(pdf.(Normal(0,10), draws)))
function wrap(param)
    return testll(param, MC)
MC = [rand(Normal(0,1), 1_000) rand(Uniform(0,1), 1_000)]

func = TwiceDifferentiable(wrap, ones(3); autodiff=:forward);
opt = optimize(func, ones(3) , Optim.Options(show_trace=true))
opt1 = optimize(wrap, ones(3), Optim.Options(show_trace=true))
Optim.minimizer(opt)  ## Optim.minimizer(opt)[1] is equal to starting value, no matter what it is, try ones(3) .+ 10
Optim.minimizer(opt1) ## all parameters change

The issue is that your function isn’t differentiable in the mix parameter. If you move mix by epsilon, nothing will happen almost everywhere because there’s a finite number of draws. Nelder-Mead works fine because it doesn’t take derivatives.

I bet there’s a way to make it continuous.


I agree with @aaowens about the non-differentiability being a problem here.

FWIW, I don’t recognize this code as a likelihood for a mixture distribution. Generally it looks something like

\theta f_1(x) + (1-\theta) f_2(x)

which is continuous in the mixture parameter \theta.

1 Like

Makes a lot of sense, thanks! I’ll look into it!

To be more precise, I’m doing a monte carlo integration where the distribution is a mixture of normals. So I need draws from the mixture, which I believe is correctly done in the code above?