I have a convex objective function that I want to minimize (unconstrained). It depends ~ 200 variables. I’m running Optim on it, but it takes forever (has been computing for a full day now).
I want to improve the performance of this, if it is possible. First I would like to know the time that Optim typically would require on problems of this scale (hundreds of variables), so that I have a reference point.
Next, I would like to know what factors could be the main bottlenecks when using Optim, so that I can look in my code and see if I can fix them.
Other than that, are there any alternative good libraries for convex minimization (unconstrained) with a large number of variables?
Most of the algorithms in Optim are pretty standard ones, and have been used widely. As @tbreloff said, knowing more about your problem would help: a minimal working example, which algorithm you are using, and smoothness of your problem: in particular, if it is convex and continuously differentiable, Quasi-Newton optimizers should be very, very fast even on larger problems.
I think this is where you’re confusing people. Although there’s a tradition in optimization of talking about large problems in terms of the number of variables, almost all of the slowness in real code I’ve written using Optim.jl comes from the objective function and its gradient.
In a logistic regression implementation, for example, solving for 200 coefficients is trivial if you have 2,000 observations, but it’s going to be much slower if you 2,000,000,000 observations.
In terms of numbers of variables, I’ve solved problems with >10,000 variables before. Depending on the problem, it’s reasonable to think that could be done in seconds or years depending on other properties of your problem.
Why would use Optim.jl’s methods for general (nonlinear optimization) when you have a convex function? You should specialize the method to the problem. Some of the methods you can get from JuMP or Convex.jl are probably much better suited for the problem.
Yes, you can use JuMP for unconstrained problems. As John mentions, 200 variables is not unreasonably large, but performance will depend on how expensive computing the objective and its derivatives are.
Is this generally true? I’d be surprised if those solvers are substantially faster than L-BFGS for solving something as simple as logistic regression. Certainly doesn’t seem impossible, but I’d want to know why before claiming so.
It’s probably worth noting that the Optim.jl methods tend to work well for problems that are strongly convex. Where they would definitely fail are problems like LAD regression where the problem is convex, but not strongly convex.
I’m not sure. LAD regression is a classic example where rewriting your problem makes it clear how to solve the problem (using linear programming). I think both JuMP and Convex may do that for you, but I’m not qualified to say so.
If you can express your objective in a form that JuMP can handle, then I would expect function evaluations to be a small part of the total time for general nonlinear optimization. The linear algebra cost of calculating the newton step in general solvers like Ipopt or convex solvers like Mosek should dominate when things are working correctly.
It’s a bit overkill to use a general purpose or constrained convex solver for an unconstrained problem, but these solvers are (for the most part) well established and difficult to match in performance unless you’re able to take advantage of problem-specific structure.