How to speed up a slow optimization routine (slower than Python currently)

I’ve been porting some code from Python to Julia and it runs much slower than in Python. I’ve copied the function which is slowest below. This function takes ~3-4 seconds which itself is not horrible but I need to call this function thousands of times so I would like it to be quicker. In Python the code is ~10-20x faster.

I used the @time functionally and noticed that the memory allocation was MUCH higher than I would have expected:
3.723173 seconds (29.16 M allocations: 1.746 GiB, 6.48% gc time)
The two arrays passed to this function times and q_actual are 50x1 vectors of floats. I’m not really certain what the culprit is. I suspect defining the loss function inside of the main function is causing some issues but I do not know how to define it in a way where it still has access to the local variable “prob”.

On the other hand doing 500 iterations of an optimization solver might just take 4 seconds. Or using global variables like INTEGRATOR might be slow. I do not know. Any help would be appreciated. Thanks!

function get_params(x0, v0, mass, param_guess, times, q_actual)

#Define ODE for Equations of Motion
prob = ODEProblem(newtons_eqns!, [x0;v0], (0.0,times[end]), param_guess)

function loss(params,_)
    sol = solve(remake(prob, p=params), INTEGRATOR, saveat = times)
    if size(sol)[2] == size(q_actual)[1] #aparently this allows loss to still be AD compatible
        loss = sum([abs2(sol.u[i][1] - q_actual[i]) for i in 1:length(q_actual)]) #only take positions
        return loss
        return Inf

adtype = Optimization.AutoZygote()
optf = Optimization.OptimizationFunction(loss, adtype)
optprob = Optimization.OptimizationProblem(optf, param_guess)

result_ode = Optimization.solve(optprob, OPTIMIZER, maxiters = 500, allow_f_increases = true)

#Add local minimizer to polish off minimum
optprob2 = Optimization.OptimizationProblem(optf, result_ode.u)
result_ode2 = Optimization.solve(optprob2, BFGS(), maxiters = 30, allow_f_increases=true)

return result_ode2.u


1 Like

Step 1 is that you can save a lot of allocations by replacing sum([abs2(sol.u[i][1] - q_actual[i]) for i in 1:length(q_actual)]) with sum(abs2(sol.u[i][1] - q_actual[i]) for i in 1:length(q_actual))

That did not really do much, after the compilation run the times are all roughly:
4.911684 seconds (31.57 M allocations: 1.977 GiB, 6.82% gc time)

If that line is the culprit, is there some way to write that line without looping? I tried for a really long time and could not get it to work. Like in Python I could do something like sol.u[:,1] - q_actual but I couldn’t figure that out in Julia.

Why this choice? The problem that you describe suggests this is not a good choice. What was the reason for changing to this from Optimization.AutoForwardDiff()? Reverse mode would just be extra overhead for small problems like this and is likely the cause of the extra allocations.


Honestly, cause that’s what the documentation I read had. Not a great reason but I’m still figuring this whole Julia thing out. Did not realize that Zygote was using reverse diff. Replacing with forward diff did fix the issue:
0.017020 seconds (218.98 k allocations: 17.959 MiB)

Which page?

SciML Docs v7.2.2

cool I’ll update those to more appropriate choices, or actually no choice at all is better.

Awesome, it’s already fixed in the latest docs, but we just need to fix deployment.