I want to hack LineSearches.jl to be able to stick a @threads macro on the linesearch part. I have an objective function for an optimization problem which has many parameters and is also very expensive to compute.

This will be a maximum likelihood estimation problem that will roughly have around 200 parameters to estimate. I usually use the Optimization.jl interface for Optim.jl and LineSearches.jl. Before giving up and going with Gradient Descent, I want to try to make BFGS or LBFGS work. I usually like using BFGS with the Backtracking line search algorithm, but I suspect 200 parameters might be too many for BFGS, hence LBFGS.

If what I understood about the Backtracking line search algorithm is correct, we will have multiple evaluations of the target function for different step sizes, from big steps to small steps. I don't remember exactly whether these evaluations are performed by coordinate-independent steps (first we evaluate (p1 + 1000, p2, …), then (p1 + 100, p2, …)) or whether they are performed along the gradient's direction (first for p_vec - 1000 * grad and then for p_vec - 100 * grad). Anyway, my goal is to parallelize these target function evaluations, either by direction or by stepsize, whichever ends up being possible and makes the most sense.

I looked into the `perform_linesearch!` function of Optim.jl (https://github.com/JuliaNLSolvers/Optim.jl/blob/master/src/utilities/perform_linesearch.jl#L40) and also the Backtracking code of LineSearches.jl (https://github.com/JuliaNLSolvers/LineSearches.jl/blob/master/src/backtracking.jl).

However, I haven't yet been able to decipher which method to overload. I would appreciate any insights or tips.

I think "line search" in L-BFGS and other Newton-style solvers implies a 1-D search along a fixed direction: the product of the inverse (pseudo-)Hessian and the gradient. Probably no parallelization benefit there.


I'd suggest giving `Optimization.LBFGS` a try before you go down this rabbit hole; that method wraps the LBFGS-B library and has an excellent linesearch implementation. 200 parameters shouldn't be too bad if you are able to use reverse-mode AD on your problem.
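For concreteness, here is a minimal sketch of what that could look like (assuming a recent Optimization.jl; the problem function and bounds here are placeholders, and a bonus is that LBFGS-B takes box constraints directly, so no Fminbox wrapper is needed):

```julia
using Optimization

# Placeholder objective: Rosenbrock with parameters p (swap in your likelihood).
rosen(x, p) = (p[1] - x[1])^2 + p[2] * (x[2] - x[1]^2)^2

# Declare how gradients are obtained; AutoForwardDiff() can be swapped for a
# reverse-mode backend (e.g. AutoZygote()) if your function supports it.
optf = OptimizationFunction(rosen, Optimization.AutoForwardDiff())

# Box constraints go straight on the problem; Optimization.LBFGS wraps LBFGS-B.
prob = OptimizationProblem(optf, zeros(2), [1.0, 100.0];
                           lb = [-2.0, -2.0], ub = [2.0, 2.0])

sol = solve(prob, Optimization.LBFGS())
```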

@tcovert, even if it's in a fixed direction, it will evaluate the target function for different stepsizes. If I could evaluate two stepsizes simultaneously, it would already be a big performance boost. EDIT: After @kellertuer's comment below, I got your point.

@Vaibhavdixit02, I'm going to give it a look. The multiple barrier penalty iterations of Optim's `Fminbox` algorithm would also make my problem more expensive to compute, so I imagine the bounded version of `LBFGS` might help in that respect. I am not sure I can reverse autodiff my function, but I will have to try it.


The usual backtracking performed is the Armijo linesearch (cf. Backtracking line search - Wikipedia), which iteratively reduces the step size until a sufficient-decrease condition holds. Each trial step depends on whether the previous one was rejected, so there is no good way to parallelise it.

One alternative you could look into is reducing the maximal number of checks (reduction steps) performed.
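To make the sequential dependence concrete, here is a minimal sketch of Armijo backtracking (hypothetical names; this is not the actual LineSearches.jl implementation). Whether the next, smaller step is ever tried depends on the outcome of the test at the current step, so the evaluations form a chain, not a batch:

```julia
using LinearAlgebra: dot

# Minimal Armijo backtracking sketch. f: objective, x: current point,
# d: search direction, g: gradient at x.
function armijo_backtracking(f, x, d, g; α0 = 1.0, ρ = 0.5, c = 1e-4, maxreduce = 20)
    fx = f(x)
    α = α0
    for _ in 1:maxreduce
        # Sufficient-decrease (Armijo) condition along the fixed direction d.
        if f(x .+ α .* d) ≤ fx + c * α * dot(g, d)
            return α          # accept this step size
        end
        α *= ρ                # reject: shrink the step and test again
    end
    return α                  # give up after maxreduce reductions
end
```

Speculatively evaluating several candidate stepsizes (α0, ρ·α0, ρ²·α0, …) in parallel and taking the largest one that passes the test is possible in principle, but it wastes evaluations whenever the first candidate succeeds, which is the common case.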


All right, got it. Let me look into seeing if I can then reduce the maximal number of iterations as you suggested.

Oh, they seem to have a maximum number of iterations, but if that is hit, they throw an error (sorry, I am not using those often; I basically programmed my own).

The optimization algorithms have keyword arguments to control these things, but I'm going to look to see whether they have one specifically for the linesearch part.
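For example, LineSearches.jl's `BackTracking` constructor takes an `iterations` keyword, and Optim's algorithms accept a `linesearch` argument, so something like this might work (a sketch, not verified against current package versions):

```julia
using Optim, LineSearches

# Cap the number of backtracking reductions per linesearch; note that
# LineSearches.jl throws a LineSearchException if the cap is exceeded.
ls = BackTracking(iterations = 5)

rosenbrock(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
result = optimize(rosenbrock, zeros(2), LBFGS(linesearch = ls))
```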

Can you link those? They might indeed be maximal iteration numbers for the solvers themselves, not for the inner linesearch that is performed at every step.

for the solvers themselves

That's what I meant; I didn't state it properly. Here's the link from Optim's documentation:

https://julianlsolvers.github.io/Optim.jl/stable/user/config/

Yes, I was not sure which of the plethora of packages you were now referring to.

That's the iteration count of the solver. I also did not expect that you can set linesearch options in there.