Custom likelihood function optimization

Hi,
I’m having trouble with this likelihood maximization problem. My likelihood function “works” when I call it with some random parameters (i.e. it outputs a likelihood value), but when I pass it to the Optim.jl package (following the Optim.jl documentation), the optimizer stops immediately (zero iterations) and my Hessian is all zeros. Am I missing something about the proper way to set up my likelihood function?

function berry(markets, firms, entry, mu, sigma, theta, draws)
    
    alpha = theta[1];
    beta = theta[2];
    delta = theta[3];

    (market_cnt, firm_cnt) = size(entry);   # entry is market_cnt x firm_cnt

    # number of simulation draws
    sim_cnt = size(draws, 1);

    # init vector of simulated frequencies of the observed entry pattern
    freq = zeros(market_cnt);

    # simulate each market individually
    for m = 1:market_cnt

        Xm = markets[m, 1];                  # market characteristic
        Zf = transpose(firms[m, :]);         # firm-market characteristics

        # first_col and last_col give us the block of simulation columns for this
        # market, e.g. 1-100, 101-200, 201-300 when firm_cnt = 100
        first_col = (m - 1) * firm_cnt + 1;   # first simulation column for this market
        last_col = m * firm_cnt;              # last simulation column for this market
        match_cnt = 0;   # number of draws that reproduce the observed entry pattern
        for t = 1:sim_cnt  # loop over the sim_cnt simulation draws

            # simulated fixed cost phi for each firm (the firm-specific component of
            # profits): phi = alpha * Zf + mu + sigma * v, where v ~ Normal(0, 1)
            draw_vec = reshape(draws[t, first_col:last_col], 1, :)

            simphi = (alpha * Zf .+ mu) + sigma * draw_vec;
            # column version, used for sorting and the profit calculation below
            simphi_t = transpose(simphi)

            # profits for each firm when n firms enter the market
            profit = n -> beta * Xm - delta * log(n) .- simphi_t;

            # equilibrium number of entrants: the largest n such that at least
            # n of the F potential entrants are profitable with n firms in
            function entrants_cnt(F)
                supported = [sum(profit(n) .>= 0) >= n for n = 1:F]
                return sum(supported)
            end

            entrants = entrants_cnt(firm_cnt)

            # the most profitable firms enter first, i.e. those with the lowest
            # fixed costs: sort firms by cost and let the cheapest `entrants` enter
            idx = sortperm(vec(simphi_t))

            # entry-choice mask: 1 for firms predicted to enter, 0 otherwise
            choices = zeros(Int, firm_cnt)
            choices[idx[1:entrants]] .= 1;

            # when firms are ordered by profits, predicted entry equals 1 for
            # firms with rank less than or equal to the expected # of entrants
            match_cnt += (choices == entry[m, :]);

        end  # end simulation loop
        freq[m] = match_cnt / sim_cnt;
        
    end  # end market loop

    # calculate the log likelihood (floor the frequencies to avoid log(0))
    log_max_array = [log(max(1e-20, freq[i])) for i = 1:market_cnt]

    # Optim minimizes, so negate the log likelihood in order to maximize it
    like = -sum(log_max_array);
    return like
end

using Optim, Distributions

# like_test = berry(markets, firms, entry, mu, sigma, theta, draws)
mu = 1
sigma = 1
display(berry(markets, firms, entry, mu, sigma, theta, draws))
# testing the likelihood function produces: 351.360

initial = [rand(Uniform(-1, 4)), rand(Uniform(0, 3))]
twicediff_func = TwiceDifferentiable(x -> berry(markets, firms, entry, x[1], x[2], theta, draws), initial; autodiff=:forward)
opt = optimize(twicediff_func, initial)

Here are the printed results from opt:

 * Status: success

 * Candidate solution
    Final objective value:     1.280294e+03

 * Found with
    Algorithm:     Newton's Method

 * Convergence measures
    |x - x'|               = 0.00e+00 ≤ 0.0e+00
    |x - x'|/|x'|          = 0.00e+00 ≤ 0.0e+00
    |f(x) - f(x')|         = NaN ≰ 0.0e+00
    |f(x) - f(x')|/|f(x')| = NaN ≰ 0.0e+00
    |g(x)|                 = 0.00e+00 ≤ 1.0e-08

 * Work counters
    Seconds run:   0  (vs limit Inf)
    Iterations:    0
    f(x) calls:    1
    ∇f(x) calls:   1
    ∇²f(x) calls:  1

Your likelihood is effectively a step function: the simulated entry decisions are discrete, so the objective is flat over regions of the parameter space and the gradient is zero almost everywhere (note the |g(x)| = 0 in your output). A gradient-based optimizer will not be able to find improvements there. I suggest trying an optimizer that doesn’t require gradients. My personal favorite is simulated annealing, which is available in Optim.
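For example, a minimal sketch (assuming the berry function, data, and initial value defined above; the iteration count is just an illustrative choice):

objective = x -> berry(markets, firms, entry, x[1], x[2], theta, draws)

# derivative-free methods need no TwiceDifferentiable wrapper,
# since no gradients or Hessians are ever evaluated
opt_sa = optimize(objective, initial, SimulatedAnnealing(),
                  Optim.Options(iterations = 100_000))

# Nelder-Mead is another gradient-free option worth comparing
opt_nm = optimize(objective, initial, NelderMead())

Simulated annealing only compares objective values between random proposals, so the flat regions don’t stall it the way they stall Newton’s method.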

Thanks, that makes a lot of sense!