Here are some results with the different optimizers for the 10-parameter case:
julia> # Using Optim + L-BFGS
julia> loss = (par) -> cost(par,p_cand)
julia> @time res10_lbfgs = optimize(loss,fill(0.3,10),fill(2.0,10),fill(1.0,10),Fminbox());
9.488374 seconds (120.21 M allocations: 4.825 GiB, 13.05% gc time)
julia> Optim.minimizer(res10_lbfgs)
10-element Array{Float64,1}:
1.0892319595180837
0.9224262335052348
0.5883927466637747
0.978128572405003
0.9315007544778905
1.1011157712133484
0.9069204188069746
1.0536679659377284
1.0970171891856608
1.103414236344359
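For reference, the calls above close a two-argument cost function over the candidate data. A minimal self-contained sketch of this setup, with a hypothetical quadratic cost standing in for the actual model-fit cost (the real `cost(par, p_cand)` is model-based and not shown here), might look like:

```julia
using Optim

# Hypothetical stand-in for the model-fit cost: squared distance between
# the parameter vector and the candidate values.
cost(par, p_cand) = sum(abs2, par .- p_cand)
p_cand = ones(10)
loss = par -> cost(par, p_cand)

# Box-constrained optimization: lower bounds 0.3, upper bounds 2.0,
# starting point 1.0 in every dimension. Fminbox() defaults to L-BFGS
# as the inner optimizer.
res = optimize(loss, fill(0.3, 10), fill(2.0, 10), fill(1.0, 10), Fminbox())
Optim.minimizer(res)
```

Swapping the inner optimizer is just a matter of passing it to `Fminbox`, e.g. `Fminbox(GradientDescent())` or `Fminbox(NelderMead())`, as in the runs below.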
Next:
julia> # Using Optim + GradientDescent
julia> @time res10_gd = optimize(loss,fill(0.3,10),fill(2.0,10),fill(1.0,10),Fminbox(GradientDescent()));
7.616987 seconds (103.95 M allocations: 4.346 GiB, 13.84% gc time)
julia> Optim.minimizer(res10_gd)
10-element Array{Float64,1}:
1.0297607064111505
0.9717089026600397
0.9701309719301778
0.9876652956389227
0.9747171511101008
0.9720729744237891
1.0240835871987013
1.0256016667497516
0.9734105609089325
0.971583538296498
Next:
julia> # Using Optim + NelderMead
julia> @time res10_nm = optimize(loss,fill(0.3,10),fill(2.0,10),fill(1.0,10),Fminbox(NelderMead()));
1476.697756 seconds (26.13 G allocations: 1.021 TiB, 15.27% gc time)
julia> Optim.minimizer(res10_nm)
10-element Array{Float64,1}:
1.4433107238439118
1.4595059863802011
0.5872514501712995
0.9845185140096595
1.0806856155599496
1.2896505935597573
1.1001430661997713
1.2599199344489431
1.2835038180590241
1.5390862317949026
Next:
julia> # Using BlackBoxOptim
julia> @time res10_bb = bboptimize(loss; SearchRange = (0.3,2.0), NumDimensions = 10, TraceMode = :silent);
26.376395 seconds (499.54 M allocations: 20.009 GiB, 16.04% gc time)
julia> best_candidate(res10_bb)
10-element Array{Float64,1}:
1.7806061305131693
1.850768850577829
0.5869654613812466
1.1078252447426908
1.8256324607370495
0.7083605185266063
1.4639428229868445
1.4931612890482668
0.9388103412516573
1.5240243421627442
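BlackBoxOptim's default method is an adaptive differential-evolution variant, which is population-based rather than gradient-based. A sketch of the call, again with a hypothetical quadratic loss in place of the real cost, and with a wall-clock budget instead of the default stopping rule:

```julia
using BlackBoxOptim

# Hypothetical stand-in for the model-fit cost.
loss = par -> sum(abs2, par .- 1.0)

res = bboptimize(loss;
    SearchRange  = (0.3, 2.0),  # same box for every dimension
    NumDimensions = 10,
    MaxTime      = 5.0,         # cap wall-clock time in seconds
    TraceMode    = :silent)

best_candidate(res), best_fitness(res)
```

Per-dimension bounds can also be given by passing `SearchRange` as a vector of `(lo, hi)` tuples, in which case `NumDimensions` is inferred.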
Finally, comparing the minimum loss (model-fit cost) achieved by each method:
julia> Optim.minimum(res10_lbfgs), Optim.minimum(res10_gd), Optim.minimum(res10_nm), best_fitness(res10_bb)
(0.035504796897362405, 0.07465169329618884, 0.030234241700558218, 0.02926037639603645)
To me, the most remarkable observation here is the horrendous time consumption of the Nelder-Mead algorithm: almost 25 minutes, versus seconds for the other methods.
One strange thing is that BlackBoxOptim with 10 variables is in fact considerably faster than in the univariate case. Why?
Finally, the resulting parameter values are quite different across methods. Judging by the final cost values, both gradient-based solvers end up with poorer fits than the derivative-free methods, with plain gradient descent clearly the worst.
An interesting question here is the degree of “identifiability” of the parameters. This could probably be studied using profile-likelihood methods or related techniques, or with more theoretical approaches – but the theoretical approaches are tricky…
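The profile-likelihood idea can be sketched numerically: fix one parameter on a grid of values and re-optimize the remaining ones, recording the minimum cost at each grid point. A flat profile suggests a poorly identifiable parameter. A minimal sketch, again using a hypothetical quadratic loss in place of the real cost:

```julia
using Optim

# Hypothetical stand-in for the model-fit cost.
loss(par) = sum(abs2, par .- 1.0)

# Profile the cost in parameter i: fix par[i] = v on a grid and
# re-optimize the other n-1 parameters for each v.
function profile(loss, i, grid; n = 10, lo = 0.3, hi = 2.0)
    map(grid) do v
        reduced = θ -> loss(vcat(θ[1:i-1], v, θ[i:end]))
        res = optimize(reduced, fill(lo, n - 1), fill(hi, n - 1),
                       fill(1.0, n - 1), Fminbox())
        Optim.minimum(res)
    end
end

prof = profile(loss, 1, range(0.5, 1.5, length = 11))
```

For this toy loss the profile is a clean parabola in the fixed parameter; for the real model-fit cost, flat or multi-valley profiles would point to identifiability problems.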