I need to maximize a likelihood function which has 18 parameters using a large sample with 1 million observations. The objective function is defined as

```
function loglike(θ, x, y, R, c, d, p0, B)
prob = [choice_prob_naive(θ, r[i], x[i,:], y[i,:], R[i], c[i], d[i], p0[i])[1] for i = 1:size(B,1)]
negloglike = - (B'*log.(prob) + (1 .- B)'*log.(1 .- prob))
return negloglike
end
```

where each element of `prob`

is a choice probability for an observation given parameter values in \theta and is computed based on numerical integration. The optimization tool I use is

```
optimize(θ -> loglike(θ, x, y, R, c, d, p0, B), theta_initial, BFGS(), Optim.Options(g_tol = 1e-3, iterations=100_000, show_trace=true, show_every=5))
```

When I use 1% of my sample to run this program, it took more than 1 hour and I got the following:

```
Iter Function value Gradient norm
0 3.056383e+03 3.681269e+05
* time: 0.0
5 2.445321e+03 3.724024e+05
* time: 1194.0
10 2.232518e+03 2.865856e+05
* time: 1794.3980000019073
15 2.060120e+03 1.151448e+05
* time: 2050.8810000419617
20 1.968748e+03 3.514386e+04
* time: 2198.8180000782013
25 1.962631e+03 5.452261e+03
* time: 2389.8360002040863
30 1.961778e+03 2.888854e+03
* time: 2610.739000082016
35 1.961533e+03 3.544448e+03
* time: 2861.231000185013
40 1.960887e+03 2.669545e+01
* time: 3069.6550002098083
45 1.960862e+03 2.172777e+02
* time: 3338.9970002174377
50 1.960149e+03 1.045716e+03
* time: 3533.6230001449585
55 1.959932e+03 7.975524e+02
* time: 3687.270000219345
60 1.959879e+03 1.252292e+02
* time: 3870.1520001888275
65 1.959872e+03 1.338831e+01
* time: 4081.643000125885
70 1.959872e+03 2.457941e-02
* time: 4305.37700009346
4323.804038 seconds (153.88 G allocations: 2.822 TiB, 9.57% gc time)
* Status: failure (objective increased between iterations) (line search failed)
* Candidate solution
Minimizer: [-1.15e-03, -1.24e-01, -1.06e+00, ...]
Minimum: 1.959872e+03
* Found with
Algorithm: BFGS
Initial Point: [1.00e-03, 2.00e-02, 1.00e-01, ...]
* Convergence measures
|x - x'| = 9.18e-04 ≰ 0.0e+00
|x - x'|/|x'| = 1.11e-04 ≰ 0.0e+00
|f(x) - f(x')| = 8.07e-10 ≰ 0.0e+00
|f(x) - f(x')|/|f(x')| = 4.12e-13 ≰ 0.0e+00
|g(x)| = 2.46e-02 ≰ 1.0e-05
* Work counters
Seconds run: 4305 (vs limit Inf)
Iterations: 70
f(x) calls: 266
∇f(x) calls: 266
```

This result looks intimidating and I don’t even want to try the full sample with the same code. Could you share your suggestions for how to speed up the optimization to make it feasible? (e.g., the choice of optimization algorithm, parallel computing, use of GPU, buy a better CPU, use the university’s supercomputer, etc.)