Suggestions needed: speed up optimization

Agreed. After reading the documentation and discussions I realized this was an issue, but I haven’t figured out how to get it to work :frowning:

This will allocate an array on the heap each time it is called, which is bad for performance.

Could you just update negloglike inside the loop? That way you don’t need to worry about the prob vector at all.

Have you considered non-adaptive numerical integration? In this case I see two possible problems with adaptive quadrature. 1) There are a sum and log transforms after it, so the effect of rtol on the final result is not clear to me. 2) Since you are using somewhat low precision (rtol=0.001), choice_prob_naive might have spurious jumps if the adaptation changes the quadrature points in a discontinuous manner. This could be bad for optimization.

What do you mean by “update negloglike inside the loop”? Compute it element by element in a loop? Doesn’t that require pre-allocating and give me the trouble that Tamas_Papp pointed out?

I replaced rtol with f_abstol. Would this help with the issue?

I mean something like this

function loglike(θ, r, x, y, R, c, d, p0, B)
  negloglike = 0.0
  for i = 1:size(B,1)
    prob = choice_prob_naive(θ, r[i], x[i,:], y[i,:], R[i], c[i], d[i], p0[i])
    negloglike -= B[i]*log(prob) + (1 - B[i])*log(1 - prob)
  end
  return negloglike
end

I think the same issues apply to any kind of adaptive quadrature if it is necessary to use a largish tolerance.

I recommend checking
https://github.com/JuliaMath/QuadGK.jl#gaussian-quadrature-and-arbitrary-weight-functions
for non-adaptive Gaussian quadrature.
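
For example, here is a minimal sketch of a fixed-order rule using QuadGK’s gauss (the integrand f and the interval [a, b] below are just placeholders): gauss(N) returns N nodes and weights on [-1, 1], which only need to be computed once and can be rescaled to any finite interval.

using QuadGK

# N-point Gauss-Legendre nodes and weights on [-1, 1]; computed once.
nodes, weights = gauss(20)

# Rescale [-1, 1] to a generic interval [a, b] and apply the rule to f.
f(t) = exp(-t^2)
a, b = 0.0, 2.0
integral = (b - a)/2 * sum(weights .* f.((b - a)/2 .* nodes .+ (a + b)/2))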

Thanks, this is helpful! I didn’t realize that negloglike = 0.0 would not trigger the same issue for autodiff as the one triggered by pre-allocating using prob = zeros(size(B,1)).

My understanding of the tolerance problem from reading is that the adaptive integration subdivides the domain to do approximate integration until the result is considered “good enough”, and the number of subdivisions depends on my tolerance level. If I switch to Gaussian quadrature, I impose a fixed number of quadrature points, which can be chosen based on a tiny rtol, and then apply the same rule to each integration in the loop. Is this right?

One more general point is that you can pass arrays of parametric types if you want to, i.e. you can define a function like

function f(x::Array{T}) where {T}
    return Array{T}(repeat([0], outer = [10, 1]))
end

julia> f(fill(0.0, 2))
10×1 Array{Float64,2}:
 0.0
 0.0
 0.0
 ⋮
 0.0
 0.0
 0.0

julia> f(fill(0, 2))
10×1 Array{Int64,2}:
 0
 0
 0
 ⋮
 0
 0
 0

julia> f(Array{Real}(undef, 0, 0))
10×1 Array{Real,2}:
 0
 0
 0
 ⋮
 0
 0
 0

So, in your code, something like

function loglike(θ::Array{T}, r, x, y, R, c, d, p0, B) where {T}
    #prob = zeros(size(B,1))
    prob = Array{T}(repeat([0], outer = [size(B)[1], 1]))
    for i = 1:size(B,1)    
        prob[i] = choice_prob_naive(θ, r[i], x[i,:], y[i,:], R[i], c[i], d[i], p0[i])
    end
    negloglike = - (B'*log.(prob) + (1.0 .- B)'*log.(1.0 .- prob))  
    return negloglike
end

should be fine (but also check how you have defined choice_prob_naive()).

Computing Dual + Float64 results in Dual, which means that the type of negloglike changes. In general, changing the type of a variable is bad for performance, but it might not be a problem here, since the heavy computation happens inside choice_prob_naive. It would be better to initialize with zero(T), where T is the eltype of theta.
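
A sketch of what that looks like with the loop version above (same hypothetical call to choice_prob_naive):

function loglike(θ, r, x, y, R, c, d, p0, B)
  # zero(eltype(θ)) is a Float64 zero for a plain parameter vector and a Dual zero
  # when ForwardDiff passes in Duals, so the accumulator never changes type.
  negloglike = zero(eltype(θ))
  for i = 1:size(B,1)
    prob = choice_prob_naive(θ, r[i], x[i,:], y[i,:], R[i], c[i], d[i], p0[i])
    negloglike -= B[i]*log(prob) + (1 - B[i])*log(1 - prob)
  end
  return negloglike
end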

Yes; more specifically, the adaptation picks the intervals where the estimated approximation error is largest and subdivides only those. I think you can also pre-compute the integration points and weights and reuse them for each integration.
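
As a hypothetical sketch of that pre-computation idea (the helper fixed_quad and the integrand passed to it are placeholders for whatever choice_prob_naive actually integrates):

using QuadGK

# Compute the rule once, outside the likelihood loop.
const NODES, WEIGHTS = gauss(30)

# Apply the same fixed rule on any interval [a, b] by rescaling from [-1, 1];
# choice_prob_naive could call this instead of an adaptive quadgk.
fixed_quad(integrand, a, b) =
    (b - a)/2 * sum(WEIGHTS .* integrand.((b - a)/2 .* NODES .+ (a + b)/2))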

In the end you probably should experiment with adaptive and non-adaptive quadrature and autodiff and derivative free optimization, to see which combination works best.

I am still confused about the type issue related to autodiff. With the type annotations removed, the optimization runs fine, but when I try to get the Hessian from the results, Julia complains about the type issue…

td = TwiceDifferentiable(θ -> loglike(θ, r, x, y, R, c, d, p0, B), theta; autodiff = :forward);
@time theta_naive = optimize(td, theta, BFGS(), Optim.Options(f_abstol=1e-3, iterations=100_000, show_trace=true, show_every=1))

julia> H  = Optim.hessian!(td, theta_naive)
ERROR: MethodError: no method matching iterate(::Optim.MultivariateOptimizationResults{BFGS{LineSearches.InitialStatic{Float64},LineSearches.HagerZhang{Float64,Base.RefValue{Bool}},Nothing,Nothing,Flat},Float64,Array{Float64,1},Float64,Float64,Array{OptimizationState{Float64,BFGS{LineSearches.InitialStatic{Float64},LineSearches.HagerZhang{Float64,Base.RefValue{Bool}},Nothing,Nothing,Flat}},1},Bool})

The error implies that hessian! doesn’t expect an optimization results object. You are missing a call to Optim.minimizer.
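
That is, something like this (using td and theta_naive from the code above):

# Pass the minimizing point, not the results object, to hessian!.
H = Optim.hessian!(td, Optim.minimizer(theta_naive))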

Hmmm I see. How do you get this information from reading the error message? I often find these messages hard to understand.
Should we expect Optim.hessian! to run for a long time, say, 10% of the optimization?

ERROR: MethodError: no method matching iterate(::Optim.MultivariateOptimizationResults{...})

It says that there is “no method matching”, i.e. “no known way of doing” iterate(blah), where blah is the type of the thing it got as input. But you didn’t call iterate, so that may seem strange. The stacktrace will show that iterate is being called from something (that is being called from something that is being called from… etc.) that is eventually called from hessian!. Generally, with stacktraces, you look for the last frame in the stack that you “own” to figure out what you did wrong.
