Excessively long TTFX with Optimization and OptimizationOptimJL

Suppose we would like to solve the following problem:

min ||Ax - b||_2^2   s.t.   x >= 0 (elementwise)

using OptimizationOptimJL.

One way to write it would be as follows:

using Optimization, OptimizationOptimJL

function obj_f(x, p)
    A = p[1]
    b = p[2]
    # squared 2-norm of the residual, ||A*x - b||^2
    return sum((A * x - b) .^ 2)
end

function solve_problem(A, b)
    # objective with ForwardDiff-generated derivatives
    optf = Optimization.OptimizationFunction(obj_f, Optimization.AutoForwardDiff())
    # box constraints: lower bound 0, no upper bound (nonnegativity)
    prob = Optimization.OptimizationProblem(optf, ones(size(A, 2)), (A, b), lb=zeros(size(A, 2)), ub=Inf * ones(size(A, 2)))
    x = OptimizationOptimJL.solve(prob, OptimizationOptimJL.LBFGS(), maxiters=5000, maxtime=100)

    return x
end

The code above works, but it has a big drawback: the first time you run it, it takes forever.

E.g., on my machine, running the command:
@time solve_problem(rand(10,100),rand(10))

gives, for the first run:
95.392393 seconds (7.30 M allocations: 638.607 MiB, 0.29% gc time, 99.61% compilation time)

and for the second run:
1.916750 seconds (334.38 k allocations: 819.706 MiB, 1.86% gc time).

Is there anything that can be done to mitigate this delay?
Any further performance tips on the code above would be more than welcome.

I’m aware that JuMP or another package would probably be more suitable for the job, but this is to be implemented in a package where OptimizationOptimJL is already a dependency for other reasons, so I’d like to avoid adding further dependencies if possible.

I am not sure this is anything other than the standard time-to-first-X (TTFX) problem in Julia. On my computer (MacBook Air M2) the first run takes 19 s, the second one takes 0.05 s.

By the way, and do not get me wrong, I certainly mean this to help: you can perhaps increase your chances of getting relevant advice if you choose a title more descriptive of the content of the post. As a matter of fact, it is not much related to the linearity of the problem, is it? All the more so since the nonnegative least squares problem would perhaps not even qualify as a linear one. I would put something like “Excessively long TTFX with Optimization and OptimizationOptimJL”.


Very good point, thanks for that!

I don’t think we’ve set up this repo with PrecompileTools.jl yet. If you’d like to help, the PR would be rather simple; we just haven’t gotten around to it.

Oh that makes sense then!
I’d be happy to help as much as I can, if you could give me some steps to follow.

You just add a PrecompileTools.jl compile workload. For example, see:

You can just stick one right at the bottom of this file:

Basically the function you have up there is good. Have it loop over the most common optimizers. Maybe do a version that covers constraints too. Then it should actually precompile stuff and we can see what we get.
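For instance, a minimal workload might look something like the sketch below (just a sketch: the solver list, problem shape, and iteration caps are purely illustrative, and it assumes Optimization and OptimizationOptimJL are in scope at the bottom of the package file):

using PrecompileTools

PrecompileTools.@setup_workload begin
    # a tiny, cheap problem whose only purpose is to exercise the solve path
    A = rand(4, 4)
    b = rand(4)
    obj = (x, p) -> sum((p[1] * x - p[2]) .^ 2)

    PrecompileTools.@compile_workload begin
        optf = Optimization.OptimizationFunction(obj, Optimization.AutoForwardDiff())
        for solver in (OptimizationOptimJL.LBFGS(), OptimizationOptimJL.BFGS())
            # box-constrained version (hits the Fminbox path)
            prob = Optimization.OptimizationProblem(optf, ones(size(A, 2)), (A, b),
                lb=zeros(size(A, 2)), ub=Inf * ones(size(A, 2)))
            OptimizationOptimJL.solve(prob, solver, maxiters=10)
            # unconstrained version
            uprob = Optimization.OptimizationProblem(optf, ones(size(A, 2)), (A, b))
            OptimizationOptimJL.solve(uprob, solver, maxiters=10)
        end
    end
end

Which solvers end up in the list is of course a maintainer call.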


Thanks for the guidance!

I ran into some behaviour I wouldn’t have expected during the tests below:


PrecompileTools.@compile_workload begin
    A = rand(4, 4)
    b = rand(4)

    function obj_f(x, p)
        A = p[1]
        b = p[2]
        return sum((A * x - b) .^ 2)
    end

    function solve_nonnegative_least_squares(A, b, solver)

        optf = Optimization.OptimizationFunction(obj_f, Optimization.AutoForwardDiff())
        prob = Optimization.OptimizationProblem(optf, ones(size(A, 2)), (A, b), lb=zeros(size(A, 2)), ub=Inf * ones(size(A, 2)))
        x = OptimizationOptimJL.solve(prob, solver, maxiters=5000, maxtime=100)

        return x
    end

    @time x = solve_nonnegative_least_squares(rand(4, 4), rand(4), OptimizationOptimJL.LBFGS())
    @time x = solve_nonnegative_least_squares(rand(35, 35), rand(35), OptimizationOptimJL.LBFGS())
    @time x = solve_nonnegative_least_squares(rand(35, 10), rand(35), OptimizationOptimJL.LBFGS())
end

If we run the contents of the compile workload line by line, it seems that precompiling for a 4x4 matrix does not help if you then try to solve a 35x35 problem: it has to be compiled again.
Even after that, having precompiled for 4x4 and 35x35 doesn’t help for a 35x10 problem!
After these three, however, different sizes (e.g. 100x60 or 60x10) don’t trigger any further compilation.
Obviously these are just random sizes, and I didn’t test too many possible combinations.

Is that expected behaviour here?

Would that imply that for every solver we have to precompile for multiple matrix sizes?

As a typical dev response: this will be much better in future releases. A big part of this TTFX is that we used to create all the oracles for all solvers, so this one is compiling the Hessian too, not just the gradient, for you. The PR to avoid this in OptimizationBase has already been merged, and I hope to get it out in a couple of weeks.


Thanks for clarifying.
Just to be clear, does that imply that the solution to the issue is simply waiting for the next version of Optimization.jl?
Would the PR recommended by Chris still be useful?


Yes, I expect the times to be better in the next release. But the PR would still be very useful, as it addresses separate concerns.


Yes, we need the PrecompileTools step, but we also need to set up FunctionWrappers.jl a la what’s described in this blog post:

https://sciml.ai/news/2022/09/21/compile_time

Since we need both, we might as well have someone start doing the PrecompileTools step. The function-wrapping part is a bit more intense and will likely need me or @Vaibhavdixit02 to get involved.
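To give a rough idea of the function-wrapping part (a toy illustration of the concept only, not the actual Optimization.jl internals): the user’s objective gets hidden behind a FunctionWrapper with a fixed call signature, so the solver code only ever sees one concrete callable type and does not have to be recompiled for every new objective function.

using FunctionWrappers: FunctionWrapper

# two different objectives wrapped behind the same concrete type:
# a callable taking a Vector{Float64} and returning a Float64
f1 = FunctionWrapper{Float64, Tuple{Vector{Float64}}}(x -> sum(abs2, x))
f2 = FunctionWrapper{Float64, Tuple{Vector{Float64}}}(x -> sum(abs, x))

# code written against the wrapper type compiles once and works for both
evaluate(f::FunctionWrapper{Float64, Tuple{Vector{Float64}}}, x::Vector{Float64}) = f(x)

evaluate(f1, [1.0, 2.0])  # 5.0
evaluate(f2, [1.0, 2.0])  # 3.0

The catch is that the argument and return types have to be fixed up front, which is why this part needs more care on the package side.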


Got it, thanks for the info.
Before submitting the PrecompileTools PR, could you let me know what’s going on with the matrix-size behaviour above?


I’d need to see a ProfileView flame graph of where the compilation time is going. My guess is that Optim may be taking a different code path for different-sized vectors, but I’m not certain about it.


Would this code do the job?

using Optimization, OptimizationOptimJL

function obj_f(x, p)
    A = p[1]
    b = p[2]
    return sum((A * x - b) .^ 2)
end

function solve_nonnegative_least_squares(A, b, solver)

    optf = Optimization.OptimizationFunction(obj_f, Optimization.AutoForwardDiff())
    prob = Optimization.OptimizationProblem(optf, ones(size(A, 2)), (A, b), lb=zeros(size(A, 2)), ub=Inf * ones(size(A, 2)))
    x = OptimizationOptimJL.solve(prob, solver, maxiters=5000, maxtime=100)

    return x
end

@profview x = solve_nonnegative_least_squares(rand(4, 4), rand(4), OptimizationOptimJL.LBFGS())
@profview x = solve_nonnegative_least_squares(rand(35, 35), rand(35), OptimizationOptimJL.LBFGS())
@profview x = solve_nonnegative_least_squares(rand(35, 10), rand(35), OptimizationOptimJL.LBFGS())

I’m afraid I’ve got no idea how to share the flame graph directly here without losing too much of the detail.


Can you show a picture and summarize what the large chunks are?


Not sure how to visualize such a big chart with FlameGraphs.jl, so here’s an mp4 of scrolling through the VS Code visualization (for the 4x4 run).

And afterwards, here’s a screenshot for 35x35:

And for 35x10:

(these come from the code in the post above)


Are you profiling time or profiling the compilation snoop?

It should be:

using ProfileView
ProfileView.view(flamegraph(tinf))

Sorry if it’s a silly question, but where do you get that “tinf” object from?

I tried to follow what’s written in the blog post, like this:

using Optimization, OptimizationOptimJL, SnoopCompile, ProfileView

function obj_f(x, p)
    A = p[1]
    b = p[2]
    return sum((A * x - b) .^ 2)
end

function solve_nonnegative_least_squares(A, b, solver)

    optf = Optimization.OptimizationFunction(obj_f, Optimization.AutoForwardDiff())
    prob = Optimization.OptimizationProblem(optf, ones(size(A, 2)), (A, b), lb=zeros(size(A, 2)), ub=Inf * ones(size(A, 2)))
    x = OptimizationOptimJL.solve(prob, solver, maxiters=5000, maxtime=100)

    return x
end

tinf = @snoopi_deep solve_nonnegative_least_squares(rand(4, 4), rand(4), OptimizationOptimJL.LBFGS())
ProfileView.view(flamegraph(tinf))

but I get an error that @snoopi_deep is not defined.


Oh, it was renamed to @snoop_inference.

Makes sense then. I’ll have a look at it later today when I get to my laptop.
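Presumably the call then becomes something along these lines (untested on my side until then; if I read the SnoopCompile docs right, in recent versions the collection macro lives in SnoopCompileCore and the analysis tools in SnoopCompile):

using SnoopCompileCore
tinf = @snoop_inference solve_nonnegative_least_squares(rand(4, 4), rand(4), OptimizationOptimJL.LBFGS())

using SnoopCompile, ProfileView  # loaded afterwards for flamegraph and the viewer
ProfileView.view(flamegraph(tinf))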