Excessively long TTFX with Optimization and OptimizationOptimJL

Suppose we would like to solve the following problem:

min ||Ax - b||_2^2   s.t.   x >= 0 (elementwise)

using OptimizationOptimJL.

One way to write it would be as follows:

using Optimization, OptimizationOptimJL

function obj_f(x, p)
    A = p[1]
    b = p[2]
    # squared 2-norm of the residual, ||A*x - b||^2
    return sum((A * x - b) .^ 2)
end

function solve_problem(A, b)
    # objective with ForwardDiff-generated derivatives
    optf = Optimization.OptimizationFunction(obj_f, Optimization.AutoForwardDiff())
    # box constraints: lower bound 0, no upper bound (nonnegativity)
    prob = Optimization.OptimizationProblem(optf, ones(size(A, 2)), (A, b), lb=zeros(size(A, 2)), ub=Inf * ones(size(A, 2)))
    x = OptimizationOptimJL.solve(prob, OptimizationOptimJL.LBFGS(), maxiters=5000, maxtime=100)

    return x
end

The code above works, but it has a big drawback: the first time you run it, it takes forever.

E.g., on my machine, running the command:
@time solve_problem(rand(10,100),rand(10))

gives, for the first run:
95.392393 seconds (7.30 M allocations: 638.607 MiB, 0.29% gc time, 99.61% compilation time)

and for the second run:
1.916750 seconds (334.38 k allocations: 819.706 MiB, 1.86% gc time).

Is there anything that can be done to mitigate this delay?
Any further performance tips on the code above would be more than welcome.

I’m aware that JuMP or another package would probably be more suitable for the job, but this is to be implemented in a package where OptimizationOptimJL is already a dependency for other reasons, so I’d like to avoid adding further dependencies if possible.

I am not sure this is anything other than the standard time-to-first-X (TTFX) problem in Julia. On my computer (MacBook Air M2) the first run takes 19 s, the second one takes 0.05 s.

By the way, and do not get me wrong, I certainly mean this to help: you can perhaps increase your chances of getting relevant advice if you choose a title more descriptive of the content of the post. As a matter of fact, it is not much related to the linearity of the problem, is it? All the more so since the nonnegative least squares problem would perhaps not even qualify as a linear one. I would put something like “Excessively long TTFX with Optimization and OptimizationOptimJL”.


Very good point, thanks for that!

I don’t think we’ve set up this repo with PrecompileTools.jl yet. If you’d like to help, the PR would be rather simple; we just haven’t gotten around to it.

Oh that makes sense then!
I’d be happy to help as much as I can, if you could give me some steps to follow.

You just add a PrecompileTools.jl compile workload. For example, see:

You can just stick one right at the bottom of this file:

Basically the function you have up there is good. Have it loop over the most common optimizers. Maybe do a version that covers constraints too. Then it should actually precompile stuff and we can see what we get.
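For instance, a minimal workload might look something like the sketch below (just a sketch: the solver list, problem shape, and iteration caps are purely illustrative, and it assumes Optimization and OptimizationOptimJL are in scope at the bottom of the package file):

using PrecompileTools

PrecompileTools.@setup_workload begin
    # a tiny, cheap problem whose only purpose is to exercise the solve path
    A = rand(4, 4)
    b = rand(4)
    obj = (x, p) -> sum((p[1] * x - p[2]) .^ 2)

    PrecompileTools.@compile_workload begin
        optf = Optimization.OptimizationFunction(obj, Optimization.AutoForwardDiff())
        for solver in (OptimizationOptimJL.LBFGS(), OptimizationOptimJL.BFGS())
            # box-constrained version (hits the Fminbox path)
            prob = Optimization.OptimizationProblem(optf, ones(size(A, 2)), (A, b),
                lb=zeros(size(A, 2)), ub=Inf * ones(size(A, 2)))
            OptimizationOptimJL.solve(prob, solver, maxiters=10)
            # unconstrained version
            uprob = Optimization.OptimizationProblem(optf, ones(size(A, 2)), (A, b))
            OptimizationOptimJL.solve(uprob, solver, maxiters=10)
        end
    end
end

Which solvers end up in the list is of course a maintainer call.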


Thanks for the guidance!

I ran into some behaviour I wouldn’t have expected during the tests below:


PrecompileTools.@compile_workload begin
    A = rand(4, 4)
    b = rand(4)

    function obj_f(x, p)
        A = p[1]
        b = p[2]
        return sum((A * x - b) .^ 2)
    end

    function solve_nonnegative_least_squares(A, b, solver)

        optf = Optimization.OptimizationFunction(obj_f, Optimization.AutoForwardDiff())
        prob = Optimization.OptimizationProblem(optf, ones(size(A, 2)), (A, b), lb=zeros(size(A, 2)), ub=Inf * ones(size(A, 2)))
        x = OptimizationOptimJL.solve(prob, solver, maxiters=5000, maxtime=100)

        return x
    end

    @time x = solve_nonnegative_least_squares(rand(4, 4), rand(4), OptimizationOptimJL.LBFGS())
    @time x = solve_nonnegative_least_squares(rand(35, 35), rand(35), OptimizationOptimJL.LBFGS())
    @time x = solve_nonnegative_least_squares(rand(35, 10), rand(35), OptimizationOptimJL.LBFGS())
end

If we run the contents of the compile workload line by line, it seems that precompiling for a 4x4 matrix does not help if you then try to solve a 35x35 problem: it has to be compiled again.
Even after that, having precompiled for 4x4 and 35x35 doesn’t help for a 35x10 problem!
After these three, however, different sizes (e.g. 100x60 or 60x10) don’t trigger any further compilation.
Obviously these are just random sizes, and I didn’t test too many possible combinations.

Is that expected behaviour here?

Would that imply that for every solver we have to precompile for multiple matrix sizes?

As a typical dev response: this will be much better in future releases. A big part of this TTFX is that we used to create all the oracles for all solvers, so this one is compiling the Hessian too, not just the gradient, for you. The PR to avoid this in OptimizationBase has already been merged, and I hope to get it out in a couple of weeks.


Thanks for clarifying.
Just to be clear, does that imply that the solution to the issue is simply waiting for the next version of Optimization.jl?
Would the PR recommended by Chris still be useful?


Yes, I expect the times to be better in the next release. But the PR would still be very useful, as it addresses separate concerns.


Yes, we need the PrecompileTools step, but we also need to set up FunctionWrappers.jl a la what’s described in this blog post:

https://sciml.ai/news/2022/09/21/compile_time

Since we need both, we might as well have someone start doing the PrecompileTools step. The function-wrapping part is a bit more intense and will likely need me or @Vaibhavdixit02 to get involved.
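To give a rough idea of the function-wrapping part (a toy illustration of the concept only, not the actual Optimization.jl internals): the user’s objective gets hidden behind a FunctionWrapper with a fixed call signature, so the solver code only ever sees one concrete callable type and does not have to be recompiled for every new objective function.

using FunctionWrappers: FunctionWrapper

# two different objectives wrapped behind the same concrete type:
# a callable taking a Vector{Float64} and returning a Float64
f1 = FunctionWrapper{Float64, Tuple{Vector{Float64}}}(x -> sum(abs2, x))
f2 = FunctionWrapper{Float64, Tuple{Vector{Float64}}}(x -> sum(abs, x))

# code written against the wrapper type compiles once and works for both
evaluate(f::FunctionWrapper{Float64, Tuple{Vector{Float64}}}, x::Vector{Float64}) = f(x)

evaluate(f1, [1.0, 2.0])  # 5.0
evaluate(f2, [1.0, 2.0])  # 3.0

The catch is that the argument and return types have to be fixed up front, which is why this part needs more care on the package side.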


Got it, thanks for the info.
Before submitting the PrecompileTools PR, could you let me know what’s going on with the matrix-size behaviour above?


I’d need to see a ProfileView flame graph of where the compilation time is going. My guess is that Optim may be taking a different code path for different-sized vectors, but I’m not certain about it.


Would this code do the job?

using Optimization, OptimizationOptimJL

function obj_f(x, p)
    A = p[1]
    b = p[2]
    return sum((A * x - b) .^ 2)
end

function solve_nonnegative_least_squares(A, b, solver)

    optf = Optimization.OptimizationFunction(obj_f, Optimization.AutoForwardDiff())
    prob = Optimization.OptimizationProblem(optf, ones(size(A, 2)), (A, b), lb=zeros(size(A, 2)), ub=Inf * ones(size(A, 2)))
    x = OptimizationOptimJL.solve(prob, solver, maxiters=5000, maxtime=100)

    return x
end

@profview x = solve_nonnegative_least_squares(rand(4, 4), rand(4), OptimizationOptimJL.LBFGS())
@profview x = solve_nonnegative_least_squares(rand(35, 35), rand(35), OptimizationOptimJL.LBFGS())
@profview x = solve_nonnegative_least_squares(rand(35, 10), rand(35), OptimizationOptimJL.LBFGS())

I’m afraid I’ve got no idea how to share the flame graph directly here without losing too much of the detail.


Can you show a picture and summarize what the large chunks are?


Not sure how to visualize such a big chart with FlameGraphs.jl, so here’s an mp4 of scrolling through the VS Code visualization (for the 4x4 run).

And afterwards, here’s a screenshot for 35x35:

And for 35x10:

(these come from the code in the post above)


Are you profiling time or profiling the compilation snoop?

It should be:

using ProfileView
ProfileView.view(flamegraph(tinf))

Sorry if it’s a silly question, but where do you get that “tinf” object from?

I tried to follow what’s written in the blog post, like this:

using Optimization, OptimizationOptimJL, SnoopCompile, ProfileView

function obj_f(x, p)
    A = p[1]
    b = p[2]
    return sum((A * x - b) .^ 2)
end

function solve_nonnegative_least_squares(A, b, solver)

    optf = Optimization.OptimizationFunction(obj_f, Optimization.AutoForwardDiff())
    prob = Optimization.OptimizationProblem(optf, ones(size(A, 2)), (A, b), lb=zeros(size(A, 2)), ub=Inf * ones(size(A, 2)))
    x = OptimizationOptimJL.solve(prob, solver, maxiters=5000, maxtime=100)

    return x
end

tinf = @snoopi_deep solve_nonnegative_least_squares(rand(4, 4), rand(4), OptimizationOptimJL.LBFGS())
ProfileView.view(flamegraph(tinf))

but I get an error that @snoopi_deep is not defined.


Oh, it was renamed to @snoop_inference.

Makes sense then. I’ll have a look at it later today when I get to my laptop.
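Presumably the call then becomes something along these lines (untested on my side until then; if I read the SnoopCompile docs right, in recent versions the collection macro lives in SnoopCompileCore and the analysis tools in SnoopCompile):

using SnoopCompileCore
tinf = @snoop_inference solve_nonnegative_least_squares(rand(4, 4), rand(4), OptimizationOptimJL.LBFGS())

using SnoopCompile, ProfileView  # loaded afterwards for flamegraph and the viewer
ProfileView.view(flamegraph(tinf))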