Using ProgressMeter with Optim.optimize?

Hello,

I sometimes provide parallel functions and parallel gradients to Optim and was wondering whether it's possible to show progress while the computation is underway. I am not talking about show_trace = true, because that only prints an update once the function and gradient evaluations are complete and an iteration step has been taken. My functions and gradients can sometimes take an hour to compute, so I'd like to know 1) whether Optim.optimize is evaluating the function or the gradient at the present moment, and, if so, 2) how far along that evaluation is. My current implementation produces the output below, but it sometimes breaks with multiple processes, so I am wondering what the right way is to improve my naive approach. I am including a skeleton of the code and a screenshot of what I am getting below.

@everywhere function loglike_para(beta::Array{T}, data::R1, Applications::R2, sim_draws_tau::R4,
                       sim_draws_q::R5, sim_draws_eta::R6, num_draws_tau::S, num_draws_q::S, num_draws_eta::S, lambda::F, J::S,
                       day_new_sampled::Array{S}, Pr_D_sampled::Array{S}, K::S, numsample::S, num_offer_samples::S, num_app_samples::S, distribution::D) where {T<:Real} where {S<:Int} where {R1<:IndexedTable, R2<:DataFrame, R4<:IndexedTable, R5<:IndexedTable, R6<:IndexedTable} where {F<:Float64} where {D<:Bernoulli{Float64}}

  z = @showprogress 1 "Computing function..." @distributed (+) for i in 1:N
  sleep(0.0000001)
  #calculate the function in parallel
 end
 return z
end

@everywhere function diff_distributed(beta::Array{T}, data::V, Applications::R2, sim_draws_tau::R4, sim_draws_q::R5,
                           sim_draws_eta::R6, num_draws_tau::S, num_draws_q::S, num_draws_eta::S, lambda::F, J::S, day_new_sampled::Array{S},
                           Pr_D_sampled::Array{S}, K::S, numsample::S, num_offer_samples::S, num_app_samples::S, distribution::D, cfg::W) where {T<:Real} where {S<:Int} where {V<:Array{IndexedTable}} where {R2<:DataFrame, R4<:IndexedTable, R5<:IndexedTable, R6<:IndexedTable} where {F<:Float64} where {W<:ForwardDiff.GradientConfig} where {D<:Bernoulli{Float64}}

   z = @showprogress 1 "Computing gradient..." @distributed (+) for i in 1:N
      sleep(0.0000001)
      #calculate the gradient in parallel 
   end
   return z
end

# Implement LBFGS() with Optim.jl

res = Optim.optimize(x -> loglike_para(x, table(data_test), Applications, sim_draws_tau, sim_draws_q, sim_draws_eta, num_draws_tau, num_draws_q, num_draws_eta, lambda, J, day_new_sampled, Pr_D_sampled, K, numsample, num_offer_samples, num_app_samples, distribution), 
                     x -> diff_distributed(x, array_datatables(data_test), Applications, sim_draws_tau, sim_draws_q, sim_draws_eta, num_draws_tau, num_draws_q, num_draws_eta, lambda, J, day_new_sampled, Pr_D_sampled, K, numsample, num_offer_samples, num_app_samples, distribution, cfg), 
                     beta, LBFGS(), Optim.Options(iterations = 1, show_trace = true), inplace = false)

The screenshot below gives me what I'd like, but it sometimes breaks with multiple processes, so I'm wondering: what is the "safest" way to include progress bars when parallel functions and gradients are provided to Optim.optimize? (Note: I have set iterations = 1 just for the sake of creating an example.)
(Also pinging @tim.holy :slight_smile:) I am also unsure whether the "Tips for parallel programming" section in https://github.com/timholy/ProgressMeter.jl helps in my particular situation, since the docs also give a suggestion similar to what I've implemented.

It would also be extremely helpful to know what the @sync and @async macros are doing in the https://github.com/timholy/ProgressMeter.jl examples, whether they are always necessary, and whether they can be replaced by putting @showprogress in front of @distributed. The overall goal, however, is to link Optim.optimize with @showprogress, perhaps along the lines of what I have tried above. For concreteness, the pattern I am asking about looks roughly like the sketch below.
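This is my paraphrase of the RemoteChannel example from the README (reconstructed from memory, so details may differ; sleep() and n stand in for the real per-item work and the number of work items):

using Distributed
using ProgressMeter

n = 20
p = Progress(n)                                    # progress bar lives on the master
channel = RemoteChannel(() -> Channel{Bool}(), 1)  # workers report back through this

@sync begin
    # task 1: prints the progress bar, one tick per `true` received
    @async while take!(channel)
        next!(p)
    end

    # task 2: does the distributed computation and reports each step
    @async begin
        @distributed (+) for i in 1:n
            sleep(0.1)            # stand-in for the real per-item work
            put!(channel, true)   # one unit of progress
            i^2                   # stand-in for the per-item contribution
        end
        put!(channel, false)      # tells the printing task to finish
    end
end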

I was able to achieve this using a RemoteChannel as described in "Tips for parallel programming" in https://github.com/timholy/ProgressMeter.jl. This works very nicely and has not broken in any of the runs I've done so far, unlike @showprogress, which would work fine sometimes but crash on other occasions (I'd say the success rate was ~50%). Note the use of x -> fetch(loglike(x, y)) inside Optim.optimize in this second solution, compared to a plain x -> loglike(x, y).

I find this slightly more useful than a plain show_trace = true, because it tells me which step the LBFGS algorithm is currently at: is it computing the function or the gradient? If so, how far along is it? Is it computing the gradient multiple times within an iteration step, and if so, does that mean we are stuck near a bad optimum and should try different starting values? Using RemoteChannel() with Optim also helped me understand what happens internally in Optim.optimize; for example, I learned that LineSearches.HagerZhang() evaluates the gradient as part of its line search, whereas LineSearches.BackTracking() does not. Without this, the trace just shows a blinking cursor between iterations 0 and 1, and we might have to wait a while to get any idea of progress (in this test case, 30 minutes).
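In case it's useful to anyone else, here is a stripped-down sketch of the approach, not my actual code: N, loglike_simple, grad_simple, and the loop body are placeholders for my real log-likelihood and gradient.

using Distributed, Optim, ProgressMeter
# addprocs(4)   # workers are added beforehand, e.g. julia -p 4

const N = 100   # stand-in for the number of work items per evaluation

# Stand-in for loglike_para: runs the distributed sum inside a task and
# reports per-item progress through a RemoteChannel so the master can
# draw the bar while the workers are busy.
function loglike_simple(beta)
    p = Progress(N, 1, "Computing function...")
    channel = RemoteChannel(() -> Channel{Bool}(N), 1)

    # printing task on the master: advance the bar each time a worker reports
    @async while take!(channel)
        next!(p)
    end

    # compute task: returned to the caller, who fetches the result
    @async begin
        z = @distributed (+) for i in 1:N
            sleep(0.01)              # stand-in for the real per-item work
            put!(channel, true)      # one unit of progress
            sum(abs2, beta)          # dummy contribution to the sum
        end
        put!(channel, false)         # stop the printing task for this evaluation
        z
    end
end

# The gradient wrapper is structured the same way with its own bar;
# here it is just the exact gradient of the dummy objective, wrapped in a task.
grad_simple(beta) = @async 2N .* beta

beta0 = zeros(3)

# Note the fetch: loglike_simple/grad_simple return the compute task, and
# fetch waits for it and hands the value/gradient to Optim.
res = Optim.optimize(x -> fetch(loglike_simple(x)),
                     x -> fetch(grad_simple(x)),
                     beta0, LBFGS(),
                     Optim.Options(iterations = 1, show_trace = true),
                     inplace = false)

The line search can be swapped too (if I remember the keyword correctly, LBFGS(linesearch = LineSearches.BackTracking())), which is how the HagerZhang/BackTracking difference mentioned above shows up in the progress bars.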

Attaching a screenshot too :slight_smile:
