Gradient-based methods are not moving some parameters

Hi all, I have a maximum likelihood estimator in which I need to integrate over a mixture of two zero-mean normal distributions. Without going into too much detail, this entails estimating 3 parameters of the mixture: the two standard deviations and the mixture probability.

The weird thing is that whenever I use a gradient-based method in Optim, the parameter corresponding to the mixture probability ends up equal to its starting value (whether or not the run converged), while with Nelder-Mead or simulated annealing all parameters ‘move’.

Additionally, both approaches recover the correct parameters for the rest of the model, although the standard deviation estimates are nonsense with gradient-based methods but correct with Nelder-Mead.

Maybe this is not specific to Julia and Optim, and there is some general problem with estimating mixture probabilities via gradient methods? Or is there something in my code causing this? Below is a simple (meaningless) example that shows how I’m constructing the draws.

PS: This also happens if I use quadrature methods.

using Distributions, Optim, NLSolversBase

function testll(param, MC)
    baseprob = MC[:,2]
    mix = 1 / (1 + exp(-param[1]))  # logistic transform keeps the mixture probability in (0, 1)
    draws = Array{eltype(param)}(undef, size(MC,1))
    for i = 1:size(draws,1)
        if baseprob[i] < mix  # hard selection of a component for each draw
            draws[i] = MC[i,1] * param[2]
        else
            draws[i] = MC[i,1] * param[3]
        end
    end
    return log(mean(pdf.(Normal(0,10), draws)))
end
function wrap(param)
    return testll(param, MC)
end
MC = [rand(Normal(0,1), 1_000) rand(Uniform(0,1), 1_000)]

func = TwiceDifferentiable(wrap, ones(3); autodiff=:forward);
opt = optimize(func, ones(3), Optim.Options(show_trace=true))
opt1 = optimize(wrap, ones(3), Optim.Options(show_trace=true))
Optim.minimizer(opt)  ## Optim.minimizer(opt)[1] is equal to starting value, no matter what it is, try ones(3) .+ 10
Optim.minimizer(opt1) ## all parameters change

The issue is that your function isn’t differentiable in the mix parameter. If you move mix by epsilon, nothing happens almost everywhere, because there is only a finite number of draws and none of them switches branches; the derivative with respect to param[1] is therefore exactly zero, which is why a gradient-based method never updates it. Nelder-Mead works fine because it doesn’t take derivatives.

I bet there’s a way to make it continuous.
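For example, something along these lines (a rough sketch, not your actual model: it keeps your Monte Carlo draws but averages the two branches analytically with weights mix and 1 - mix instead of hard-selecting one per draw; testll_smooth is just an illustrative name):

using Distributions

function testll_smooth(param, MC)
    # logistic transform keeps the mixture probability in (0, 1)
    mix = 1 / (1 + exp(-param[1]))
    # integrand evaluated under each component, reusing the same normal draws
    dens1 = pdf.(Normal(0, 10), MC[:, 1] .* param[2])
    dens2 = pdf.(Normal(0, 10), MC[:, 1] .* param[3])
    # weighting by mix integrates the component indicator out analytically,
    # so the objective is smooth in param[1]
    return log(mean(mix .* dens1 .+ (1 - mix) .* dens2))
end

With mix entering as a weight rather than as a threshold, autodiff=:forward should give a nonzero derivative in param[1], so gradient-based optimizers can actually move it.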


I agree with @aaowens about the non-differentiability being a problem here.

FWIW, I don’t recognize this code as a likelihood for a mixture distribution. Generally it looks something like

\theta f_1(x) + (1-\theta) f_2(x)

which is continuous in the mixture parameter \theta.
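In Julia that density can be evaluated directly with Distributions’ MixtureModel, for example (a small sketch; θ, σ1, σ2 and the evaluation point are placeholder values, not the ones from your model):

using Distributions

θ, σ1, σ2 = 0.3, 1.0, 2.0
d = MixtureModel([Normal(0, σ1), Normal(0, σ2)], [θ, 1 - θ])
pdf(d, 0.5)  # equals θ * pdf(Normal(0, σ1), 0.5) + (1 - θ) * pdf(Normal(0, σ2), 0.5)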


Makes a lot of sense, thanks! I’ll look into it!

To be more precise, I’m doing Monte Carlo integration where the distribution is a mixture of normals. So I need draws from the mixture, which I believe the code above constructs correctly?
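For what it’s worth, a quick sanity check of that construction is to compare it against Distributions’ own mixture sampler (a sketch with placeholder values for p, σ1, σ2):

using Distributions

p, σ1, σ2 = 0.4, 1.0, 3.0
d = MixtureModel([Normal(0, σ1), Normal(0, σ2)], [p, 1 - p])
z = rand(Normal(0, 1), 10^6)
u = rand(10^6)
hand = ifelse.(u .< p, σ1 .* z, σ2 .* z)  # thresholding construction as in the code above
std(hand), std(d)                         # both ≈ sqrt(p * σ1^2 + (1 - p) * σ2^2)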