Optim with autodiff return error when optimize paralleled objective function

zxjroger · February 2, 2021, 11:30pm

I have a very complicated objective function which involves some distributed call. I would like to optimize the function using Optim with Forward Difference. However, it reports an error. This is the MWE:

using Distributed
addprocs(Sys.CPU_THREADS-1)

@everywhere function fun_single(x)
    return x^2
end

# Not sure why need to make this function everywhere, but otherwise it reports a different error.
@everywhere function fun_parallel(data)
    result = ones(2)
    np = nprocs()
    i = 1
    nextidx() = (idx=i; i+=1; idx)
    @sync begin
        for p = 1:np
            if p != myid() || np == 1
                @async begin
                    while true
                        idx = nextidx()
                        if idx > 2
                            break
                        end
                        result[idx] = remotecall_fetch(fun_single, p, data[idx])
                    end
                end
            end
        end
    end
    return sum(result)
end

using Optim
res = Optim.optimize(fun_parallel, ones(2), Newton(), Optim.Options(show_trace = true); autodiff = :forward)

The error I got is as follows:

TaskFailedException:
MethodError: no method matching Float64(::ForwardDiff.Dual{ForwardDiff.Tag{typeof(fun_parallel),Float64},Float64,2})
Closest candidates are:
  Float64(::Real, !Matched::RoundingMode) where T<:AbstractFloat at rounding.jl:200
  Float64(::T) where T<:Number at boot.jl:716
  Float64(!Matched::Irrational{:γ}) at irrationals.jl:189
  ...
Stacktrace:
 [1] convert(::Type{Float64}, ::ForwardDiff.Dual{ForwardDiff.Tag{typeof(fun_parallel),Float64},Float64,2}) at ./number.jl:7
 [2] setindex!(::Array{Float64,1}, ::ForwardDiff.Dual{ForwardDiff.Tag{typeof(fun_parallel),Float64},Float64,2}, ::Int64) at ./array.jl:847
 [3] macro expansion at /Users/Desktop/Notes.jl:2695 [inlined]
 [4] (::var"#7#9"{Array{ForwardDiff.Dual{ForwardDiff.Tag{typeof(fun_parallel),Float64},Float64,2},1},Array{Float64,1},var"#nextidx#8",Int64})() at ./task.jl:356

...and 1 more exception(s).

sync_end(::Channel{Any}) at task.jl:314
macro expansion at task.jl:333 [inlined]
fun_parallel(::Array{ForwardDiff.Dual{ForwardDiff.Tag{typeof(fun_parallel),Float64},Float64,2},1}) at Notes.jl:2686
vector_mode_dual_eval at apiutils.jl:37 [inlined]
vector_mode_gradient!(::DiffResults.MutableDiffResult{1,Float64,Tuple{Array{Float64,1}}}, ::typeof(fun_parallel), ::Array{Float64,1}, ::ForwardDiff.GradientConfig{ForwardDiff.Tag{typeof(fun_parallel),Float64},Float64,2,Array{ForwardDiff.Dual{ForwardDiff.Tag{typeof(fun_parallel),Float64},Float64,2},1}}) at gradient.jl:113
gradient! at gradient.jl:37 [inlined]
gradient! at gradient.jl:35 [inlined]
(::NLSolversBase.var"#42#48"{typeof(fun_parallel),ForwardDiff.GradientConfig{ForwardDiff.Tag{typeof(fun_parallel),Float64},Float64,2,Array{ForwardDiff.Dual{ForwardDiff.Tag{typeof(fun_parallel),Float64},Float64,2},1}}})(::Array{Float64,1}, ::Array{Float64,1}) at twicedifferentiable.jl:130
value_gradient!!(::TwiceDifferentiable{Float64,Array{Float64,1},Array{Float64,2},Array{Float64,1}}, ::Array{Float64,1}) at interface.jl:82
initial_state(::Newton{LineSearches.InitialStatic{Float64},LineSearches.HagerZhang{Float64,Base.RefValue{Bool}}}, ::Optim.Options{Float64,Nothing}, ::TwiceDifferentiable{Float64,Array{Float64,1},Array{Float64,2},Array{Float64,1}}, ::Array{Float64,1}) at newton.jl:45
optimize at optimize.jl:33 [inlined]
#optimize#87 at interface.jl:142 [inlined]
(::Optim.var"#optimize##kw")(::NamedTuple{(:autodiff,),Tuple{Symbol}}, ::typeof(optimize), ::Function, ::Array{Float64,1}, ::Newton{LineSearches.InitialStatic{Float64},LineSearches.HagerZhang{Float64,Base.RefValue{Bool}}}, ::Optim.Options{Float64,Nothing}) at interface.jl:141
top-level scope at Notes.jl:2705

I wonder how I should revise my code. Thank you very much.

AndiMD · February 3, 2021, 2:40am

ForwardDiff requires your function to handle a vector of the ForwardDiff.Dual as argument type, and ones(2) creates a Float64 vector. Specify the type based on the argument type passed:

@everywhere function fun_parallel(data::Vector{T}) where T
    result = ones(T,2)
    np = nprocs()
    i = 1
    nextidx() = (idx=i; i+=1; idx)
    @sync begin
        for p = 1:np
            if p != myid() || np == 1
                @async begin
                    while true
                        idx = nextidx()
                        if idx > 2
                            break
                        end
                        result[idx] = remotecall_fetch(fun_single, p, data[idx])
                    end
                end
            end
        end
    end
    return sum(result)
end

zxjroger · February 3, 2021, 4:39am

Thank you for your answer. I understand where my code is wrong but the real code is much more complicated and I don’t know how to fix it. Could you help me take a look? Thank you very much.

# This is my objective function, input is a vector of parameters and output is a real number.
# I believe there is something wrong with "fun_H_sim_parallel".

@everywhere function fun_objective(parameter)
    tau_R_est = parameter[1]
    tau_D_est = parameter[2]
    sd_eta_R_est = parameter[3]
    sd_eta_D_est = parameter[4]
    gamma_R_est = vcat([1.0], parameter[5:7])
    gamma_D_est = vcat([1.0], parameter[8:10])
    alpha_R1_est = parameter[11]
    alpha_R2_est = parameter[12]
    alpha_R3_est = parameter[13]
    alpha_D1_est = parameter[14]
    alpha_D2_est = parameter[15]
    alpha_D3_est = parameter[16]
    sd_epsilon_1_est = parameter[17]
    sd_epsilon_2_est = parameter[18]
    sd_epsilon_3_est = parameter[19]
    beta_1_est = parameter[20:21]
    beta_2_est = parameter[22:23]
    beta_3_est = parameter[24:25]
    sim_w_R_matrix, sim_w_D_matrix = fun_sim_w(tau_R_est, tau_D_est, sd_eta_R_est, sd_eta_D_est, gamma_R_est, gamma_D_est)
    H_R_sim = fun_H_sim_parallel(sim_w_R_matrix)
    H_D_sim = fun_H_sim_parallel(sim_w_D_matrix)
    sim_y_1, sim_y_2, sim_y_3 = fun_sim_y(alpha_R1_est, alpha_R2_est, alpha_R3_est, alpha_D1_est, alpha_D2_est, alpha_D3_est, sd_epsilon_1_est, sd_epsilon_2_est, sd_epsilon_3_est, beta_1_est, beta_2_est, beta_3_est, H_R_sim, H_D_sim)
    moments = vec(mean(fun_moments_sim(H_R_sim, H_D_sim, sim_y_1, sim_y_2, sim_y_3) - moments_data, dims=1))
    return transpose(moments) * weighting_matrix * moments
end

# This is the "fun_H_sim_parallel"
# The input is a matrix of matrices, the output is a matrix of vectors of Cartesian Indices.
# This function computes a linear programming problem in parallel.
# I guess either the type of "H_sim_temp" is not correct or the type of the input "sim_w_matrix" is not correct.

@everywhere function fun_H_sim_parallel(sim_w_matrix)
    H_sim_temp = Vector{Vector}(undef, T * num_simulation)
    np = nprocs()
    sim_w_matrix_temp = vec(sim_w_matrix)
    i = 1
    nextidx() = (idx=i; i+=1; idx)
    @sync begin
        for p = 1:np
            if p != myid() || np == 1
                @async begin
                    while true
                        idx = nextidx()
                        if idx > T * num_simulation
                            break
                        end
                        H_sim_temp[idx] = remotecall_fetch(fun_sim_optimal_assignment, p, sim_w_matrix_temp[idx], N)
                    end
                end
            end
        end
    end
    return reshape(H_sim_temp, T, num_simulation)
end

# The linear programming problem is as follows.
# The input is a matrix and an integer, the output is a vector of Cartesian indices.

@everywhere function fun_sim_optimal_assignment(w_sim, N)
    model_sim = Model(optimizer_with_attributes(()->Gurobi.Optimizer(GRB_ENV)))
    @variable(model_sim, H_sim_temp[1:N, 1:N] >= 0)
    @constraint(model_sim, [i = 1:N], sum(H_sim_temp[i, j] for j = 1:N) == 1)
    @constraint(model_sim, [j = 1:N], sum(H_sim_temp[i, j] for i = 1:N) == 1)
    @objective(model_sim, Max, sum(w_sim[i, j] * H_sim_temp[i, j] for i = 1:N, j = 1:N))
    JuMP.optimize!(model_sim)
    if termination_status(model_sim) == MOI.OPTIMAL
        H_market_sim = value.(H_sim_temp)
        return findall(!iszero, sparse(H_market_sim))
    else
        error("The model was not solved correctly.")
    end
end

I guess that I need to somehow change the type of Cartesian index, but I couldn’t figure out how to do it. I really appreciate if you could give me some hints.

AndiMD · February 3, 2021, 2:08pm

I am afraid I do not see any obvious errors. It will be easier to narrow down the issue if you could provide a MWE that reproduces the error.

zxjroger · February 3, 2021, 4:01pm

No problem. I have revised my MWE as follows.

using Distributed, Random, Distributions
addprocs(Sys.CPU_THREADS-1)

@everywhere using JuMP, Gurobi, SparseArrays
@everywhere const GRB_ENV = Gurobi.Env()
@everywhere setparams!(GRB_ENV, Presolve=0, Method=1, OutputFlag=0)

# Linear Programming Problem
@everywhere function fun_lpp(w)
    N = size(w, 1)
    model = Model(optimizer_with_attributes(()->Gurobi.Optimizer(GRB_ENV)))
    @variable(model, H_temp[1:N, 1:N] >= 0)
    @constraint(model, [i = 1:N], sum(H_temp[i, j] for j = 1:N) == 1)
    @constraint(model, [j = 1:N], sum(H_temp[i, j] for i = 1:N) == 1)
    @objective(model, Max, sum(w[i, j] * H_temp[i, j] for i = 1:N, j = 1:N))
    JuMP.optimize!(model)
    H = value.(H_temp)
    return findall(!iszero, sparse(H))
end

# Parallel
@everywhere function fun_parallel(data)
    H_all = Vector{Vector}(undef, 10)
    np = nprocs()
    i = 1
    nextidx() = (idx=i; i+=1; idx)
    @sync begin
        for p = 1:np
            if p != myid() || np == 1
                @async begin
                    while true
                        idx = nextidx()
                        if idx > 10
                            break
                        end
                        H_all[idx] = remotecall_fetch(fun_lpp, p, data[idx])
                    end
                end
            end
        end
    end
    return H_all
end

# Objective Function
@everywhere function fun_objective(p)
    Random.seed!(1024)
    data = [randn(3, 3) for i = 1:10]
    H_all = fun_parallel((p[1] + p[2]) .* data)
    temp = vcat([data[i][H_all[i]] for i = 1:10]...)
    return abs(mean(temp) - mean(mean(data)))
end

using Optim
initial_p = [0.5, 0.5]
res = Optim.optimize(fun_objective, initial_p, Newton(), Optim.Options(time_limit = 10.0, show_trace = true); autodiff = :forward)

I got the following error.

TaskFailedException:
On worker 2:
MethodError: no method matching Float64(::ForwardDiff.Dual{ForwardDiff.Tag{typeof(fun_objective),Float64},Float64,2})
Closest candidates are:
  Float64(::Real, !Matched::RoundingMode) where T<:AbstractFloat at rounding.jl:200
  Float64(::T) where T<:Number at boot.jl:716
  Float64(!Matched::Float32) at float.jl:255
  ...
convert at ./number.jl:7
_float at /Users/xunjie/.julia/packages/JuMP/qhoVb/src/operators.jl:12
* at /Users/xunjie/.julia/packages/JuMP/qhoVb/src/operators.jl:32 [inlined]
operate at /Users/xunjie/.julia/packages/MutableArithmetics/bPWR4/src/interface.jl:110 [inlined]
operate at /Users/xunjie/.julia/packages/MutableArithmetics/bPWR4/src/rewrite.jl:35 [inlined]
operate_fallback! at /Users/xunjie/.julia/packages/MutableArithmetics/bPWR4/src/interface.jl:327 [inlined]
operate! at /Users/xunjie/.julia/packages/MutableArithmetics/bPWR4/src/rewrite.jl:80 [inlined]
macro expansion at /Users/xunjie/.julia/packages/MutableArithmetics/bPWR4/src/rewrite.jl:276 [inlined]
macro expansion at /Users/xunjie/.julia/packages/JuMP/qhoVb/src/macros.jl:830 [inlined]
fun_lpp at /Users/xunjie/Desktop/Notes.jl:2727
#106 at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Distributed/src/process_messages.jl:294
run_work_thunk at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Distributed/src/process_messages.jl:79
macro expansion at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Distributed/src/process_messages.jl:294 [inlined]
#105 at ./task.jl:356
Stacktrace:
 [1] #remotecall_fetch#143 at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Distributed/src/remotecall.jl:394 [inlined]
 [2] remotecall_fetch(::Function, ::Distributed.Worker, ::Array{ForwardDiff.Dual{ForwardDiff.Tag{typeof(fun_objective),Float64},Float64,2},2}) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Distributed/src/remotecall.jl:386
 [3] remotecall_fetch(::Function, ::Int64, ::Array{ForwardDiff.Dual{ForwardDiff.Tag{typeof(fun_objective),Float64},Float64,2},2}; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Distributed/src/remotecall.jl:421
 [4] remotecall_fetch(::Function, ::Int64, ::Array{ForwardDiff.Dual{ForwardDiff.Tag{typeof(fun_objective),Float64},Float64,2},2}) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Distributed/src/remotecall.jl:421
 [5] macro expansion at /Users/Desktop/Notes.jl:2748 [inlined]
 [6] (::var"#15#17"{Array{Array{ForwardDiff.Dual{ForwardDiff.Tag{typeof(fun_objective),Float64},Float64,2},2},1},Array{Array{T,1} where T,1},var"#nextidx#16",Int64})() at ./task.jl:356

...and 2 more exception(s).

sync_end(::Channel{Any}) at task.jl:314
macro expansion at task.jl:333 [inlined]
fun_parallel(::Array{Array{ForwardDiff.Dual{ForwardDiff.Tag{typeof(fun_objective),Float64},Float64,2},2},1}) at Notes.jl:2739
fun_objective(::Array{ForwardDiff.Dual{ForwardDiff.Tag{typeof(fun_objective),Float64},Float64,2},1}) at Notes.jl:2761
vector_mode_dual_eval at apiutils.jl:37 [inlined]
vector_mode_gradient!(::DiffResults.MutableDiffResult{1,Float64,Tuple{Array{Float64,1}}}, ::typeof(fun_objective), ::Array{Float64,1}, ::ForwardDiff.GradientConfig{ForwardDiff.Tag{typeof(fun_objective),Float64},Float64,2,Array{ForwardDiff.Dual{ForwardDiff.Tag{typeof(fun_objective),Float64},Float64,2},1}}) at gradient.jl:113
gradient! at gradient.jl:37 [inlined]
gradient! at gradient.jl:35 [inlined]
(::NLSolversBase.var"#42#48"{typeof(fun_objective),ForwardDiff.GradientConfig{ForwardDiff.Tag{typeof(fun_objective),Float64},Float64,2,Array{ForwardDiff.Dual{ForwardDiff.Tag{typeof(fun_objective),Float64},Float64,2},1}}})(::Array{Float64,1}, ::Array{Float64,1}) at twicedifferentiable.jl:130
value_gradient!!(::TwiceDifferentiable{Float64,Array{Float64,1},Array{Float64,2},Array{Float64,1}}, ::Array{Float64,1}) at interface.jl:82
initial_state(::Newton{LineSearches.InitialStatic{Float64},LineSearches.HagerZhang{Float64,Base.RefValue{Bool}}}, ::Optim.Options{Float64,Nothing}, ::TwiceDifferentiable{Float64,Array{Float64,1},Array{Float64,2},Array{Float64,1}}, ::Array{Float64,1}) at newton.jl:45
optimize at optimize.jl:33 [inlined]
#optimize#87 at interface.jl:142 [inlined]
(::Optim.var"#optimize##kw")(::NamedTuple{(:autodiff,),Tuple{Symbol}}, ::typeof(Optim.optimize), ::Function, ::Array{Float64,1}, ::Newton{LineSearches.InitialStatic{Float64},LineSearches.HagerZhang{Float64,Base.RefValue{Bool}}}, ::Optim.Options{Float64,Nothing}) at interface.jl:141
top-level scope at Notes.jl:2768

odow · February 3, 2021, 5:06pm

You can’t automatically differentiate a JuMP model so this approach won’t work. (fun_lpp is being passed a ForwardDiff.Dual.)

You can probably write this as a larger JuMP model.

zxjroger · February 3, 2021, 6:18pm

Thank you for your comments, Oscar. Do you mind elaborating on the suggestion of “a larger JuMP model”? I don’t really understand what you mean. Thanks.

pyrex41 · February 3, 2021, 6:57pm

Not a full solution, but in response to your question about a larger JuMP model, the JuMP interface is designed to work with a lot of different backends, including other non-Julia libraries. For automatic differentiation in Optim though (via ForwardDiff.jl), all of your code must be Julia only.

One way to see this is try to take the gradient of your obj. function. This is, if I understand it right, what Optim is doing under the hood with the autodiff = :forward tag.

function g!(G, coefs)
    grad = ForwardDiff.gradient(objective_func, coefs)
    for (i, g_) in enumerate(grad)
        G[i] = g_
    end
    Nothing
end

You use two different optimization interfaces in your code snipped above: one in fun_lpp that uses JuMP, and another at the bottom that uses Optim. It would seem to me that this is the core of the issue. I’ve never used JuMP beyond a toy model, so others may be able to speak to its extensibility, and to which of the two would be better for your problem.

odow · February 3, 2021, 7:18pm

I meant find some way of formulating a single optimization problem that includes H_temp and p simultaneously. But looking a bit deeper, that’s hard because you have a min-max formulation.

You could try a derivative free optimizer for fun_objective. There are various options listed in the Optimal documentation: Optim.jl

zxjroger · February 4, 2021, 12:22am

Thanks for the explanation. I guess then I have to give up. I have tried many derivative free solvers for my problem. Unfortunately, those solvers could not find the correct minimum. That’s why I am trying to test gradient based solvers using auto diff.

Topic		Replies	Views
The function to optimize in JuMP Optimization (Mathematical)	24	3813	August 29, 2017
Optim, Forward Differentiation and OnceDifferentiable with MLE Optimization (Mathematical) optim	18	2723	November 28, 2020
Forward differentiation and differential equations Modelling & Simulations diffeq , differentiation , optim , optimization	13	4435	November 7, 2023
Unknown error from ForwardDiff Optimization (Mathematical)	1	472	December 10, 2019
Error optimizing using autodifferentation General Usage package , differentiation , optimization	7	1081	April 23, 2020

Optim with autodiff return error when optimize paralleled objective function

Related topics