Optim with autodiff returns an error when optimizing a parallelized objective function

I have a very complicated objective function that involves some distributed calls. I would like to optimize it using Optim with forward-mode automatic differentiation (autodiff = :forward), but it reports an error. This is the MWE:

using Distributed
addprocs(Sys.CPU_THREADS-1)

@everywhere function fun_single(x)
    return x^2
end

# I am not sure why this function needs to be defined @everywhere, but without it a different error is reported.
@everywhere function fun_parallel(data)
    result = ones(2)
    np = nprocs()
    i = 1
    nextidx() = (idx=i; i+=1; idx)
    @sync begin
        for p = 1:np
            if p != myid() || np == 1
                @async begin
                    while true
                        idx = nextidx()
                        if idx > 2
                            break
                        end
                        result[idx] = remotecall_fetch(fun_single, p, data[idx])
                    end
                end
            end
        end
    end
    return sum(result)
end

using Optim
res = Optim.optimize(fun_parallel, ones(2), Newton(), Optim.Options(show_trace = true); autodiff = :forward)

The error I got is as follows:

TaskFailedException:
MethodError: no method matching Float64(::ForwardDiff.Dual{ForwardDiff.Tag{typeof(fun_parallel),Float64},Float64,2})
Closest candidates are:
  Float64(::Real, !Matched::RoundingMode) where T<:AbstractFloat at rounding.jl:200
  Float64(::T) where T<:Number at boot.jl:716
  Float64(!Matched::Irrational{:γ}) at irrationals.jl:189
  ...
Stacktrace:
 [1] convert(::Type{Float64}, ::ForwardDiff.Dual{ForwardDiff.Tag{typeof(fun_parallel),Float64},Float64,2}) at ./number.jl:7
 [2] setindex!(::Array{Float64,1}, ::ForwardDiff.Dual{ForwardDiff.Tag{typeof(fun_parallel),Float64},Float64,2}, ::Int64) at ./array.jl:847
 [3] macro expansion at /Users/Desktop/Notes.jl:2695 [inlined]
 [4] (::var"#7#9"{Array{ForwardDiff.Dual{ForwardDiff.Tag{typeof(fun_parallel),Float64},Float64,2},1},Array{Float64,1},var"#nextidx#8",Int64})() at ./task.jl:356

...and 1 more exception(s).

sync_end(::Channel{Any}) at task.jl:314
macro expansion at task.jl:333 [inlined]
fun_parallel(::Array{ForwardDiff.Dual{ForwardDiff.Tag{typeof(fun_parallel),Float64},Float64,2},1}) at Notes.jl:2686
vector_mode_dual_eval at apiutils.jl:37 [inlined]
vector_mode_gradient!(::DiffResults.MutableDiffResult{1,Float64,Tuple{Array{Float64,1}}}, ::typeof(fun_parallel), ::Array{Float64,1}, ::ForwardDiff.GradientConfig{ForwardDiff.Tag{typeof(fun_parallel),Float64},Float64,2,Array{ForwardDiff.Dual{ForwardDiff.Tag{typeof(fun_parallel),Float64},Float64,2},1}}) at gradient.jl:113
gradient! at gradient.jl:37 [inlined]
gradient! at gradient.jl:35 [inlined]
(::NLSolversBase.var"#42#48"{typeof(fun_parallel),ForwardDiff.GradientConfig{ForwardDiff.Tag{typeof(fun_parallel),Float64},Float64,2,Array{ForwardDiff.Dual{ForwardDiff.Tag{typeof(fun_parallel),Float64},Float64,2},1}}})(::Array{Float64,1}, ::Array{Float64,1}) at twicedifferentiable.jl:130
value_gradient!!(::TwiceDifferentiable{Float64,Array{Float64,1},Array{Float64,2},Array{Float64,1}}, ::Array{Float64,1}) at interface.jl:82
initial_state(::Newton{LineSearches.InitialStatic{Float64},LineSearches.HagerZhang{Float64,Base.RefValue{Bool}}}, ::Optim.Options{Float64,Nothing}, ::TwiceDifferentiable{Float64,Array{Float64,1},Array{Float64,2},Array{Float64,1}}, ::Array{Float64,1}) at newton.jl:45
optimize at optimize.jl:33 [inlined]
#optimize#87 at interface.jl:142 [inlined]
(::Optim.var"#optimize##kw")(::NamedTuple{(:autodiff,),Tuple{Symbol}}, ::typeof(optimize), ::Function, ::Array{Float64,1}, ::Newton{LineSearches.InitialStatic{Float64},LineSearches.HagerZhang{Float64,Base.RefValue{Bool}}}, ::Optim.Options{Float64,Nothing}) at interface.jl:141
top-level scope at Notes.jl:2705

I wonder how I should revise my code. Thank you very much.

ForwardDiff requires your function to accept a vector of ForwardDiff.Dual numbers as its argument, but ones(2) inside fun_parallel creates a Float64 vector, so writing a Dual into result fails. Make the element type of result follow the element type of the argument:

@everywhere function fun_parallel(data::Vector{T}) where T
    result = ones(T,2)
    np = nprocs()
    i = 1
    nextidx() = (idx=i; i+=1; idx)
    @sync begin
        for p = 1:np
            if p != myid() || np == 1
                @async begin
                    while true
                        idx = nextidx()
                        if idx > 2
                            break
                        end
                        result[idx] = remotecall_fetch(fun_single, p, data[idx])
                    end
                end
            end
        end
    end
    return sum(result)
end

Thank you for your answer. I understand where my code is wrong, but my real code is much more complicated and I don’t know how to fix it. Could you help me take a look? Thank you very much.

# This is my objective function; the input is a vector of parameters and the output is a real number.
# I believe there is something wrong with "fun_H_sim_parallel".

@everywhere function fun_objective(parameter)
    tau_R_est = parameter[1]
    tau_D_est = parameter[2]
    sd_eta_R_est = parameter[3]
    sd_eta_D_est = parameter[4]
    gamma_R_est = vcat([1.0], parameter[5:7])
    gamma_D_est = vcat([1.0], parameter[8:10])
    alpha_R1_est = parameter[11]
    alpha_R2_est = parameter[12]
    alpha_R3_est = parameter[13]
    alpha_D1_est = parameter[14]
    alpha_D2_est = parameter[15]
    alpha_D3_est = parameter[16]
    sd_epsilon_1_est = parameter[17]
    sd_epsilon_2_est = parameter[18]
    sd_epsilon_3_est = parameter[19]
    beta_1_est = parameter[20:21]
    beta_2_est = parameter[22:23]
    beta_3_est = parameter[24:25]
    sim_w_R_matrix, sim_w_D_matrix = fun_sim_w(tau_R_est, tau_D_est, sd_eta_R_est, sd_eta_D_est, gamma_R_est, gamma_D_est)
    H_R_sim = fun_H_sim_parallel(sim_w_R_matrix)
    H_D_sim = fun_H_sim_parallel(sim_w_D_matrix)
    sim_y_1, sim_y_2, sim_y_3 = fun_sim_y(alpha_R1_est, alpha_R2_est, alpha_R3_est, alpha_D1_est, alpha_D2_est, alpha_D3_est, sd_epsilon_1_est, sd_epsilon_2_est, sd_epsilon_3_est, beta_1_est, beta_2_est, beta_3_est, H_R_sim, H_D_sim)
    moments = vec(mean(fun_moments_sim(H_R_sim, H_D_sim, sim_y_1, sim_y_2, sim_y_3) - moments_data, dims=1))
    return transpose(moments) * weighting_matrix * moments
end

# This is the "fun_H_sim_parallel"
# The input is a matrix of matrices, the output is a matrix of vectors of Cartesian Indices.
# This function computes a linear programming problem in parallel.
# I guess either the type of "H_sim_temp" is not correct or the type of the input "sim_w_matrix" is not correct.

@everywhere function fun_H_sim_parallel(sim_w_matrix)
    H_sim_temp = Vector{Vector}(undef, T * num_simulation)
    np = nprocs()
    sim_w_matrix_temp = vec(sim_w_matrix)
    i = 1
    nextidx() = (idx=i; i+=1; idx)
    @sync begin
        for p = 1:np
            if p != myid() || np == 1
                @async begin
                    while true
                        idx = nextidx()
                        if idx > T * num_simulation
                            break
                        end
                        H_sim_temp[idx] = remotecall_fetch(fun_sim_optimal_assignment, p, sim_w_matrix_temp[idx], N)
                    end
                end
            end
        end
    end
    return reshape(H_sim_temp, T, num_simulation)
end

# The linear programming problem is as follows.
# The input is a matrix and an integer; the output is a vector of Cartesian indices.

@everywhere function fun_sim_optimal_assignment(w_sim, N)
    model_sim = Model(optimizer_with_attributes(()->Gurobi.Optimizer(GRB_ENV)))
    @variable(model_sim, H_sim_temp[1:N, 1:N] >= 0)
    @constraint(model_sim, [i = 1:N], sum(H_sim_temp[i, j] for j = 1:N) == 1)
    @constraint(model_sim, [j = 1:N], sum(H_sim_temp[i, j] for i = 1:N) == 1)
    @objective(model_sim, Max, sum(w_sim[i, j] * H_sim_temp[i, j] for i = 1:N, j = 1:N))
    JuMP.optimize!(model_sim)
    if termination_status(model_sim) == MOI.OPTIMAL
        H_market_sim = value.(H_sim_temp)
        return findall(!iszero, sparse(H_market_sim))
    else
        error("The model was not solved correctly.")
    end
end

I guess that I need to somehow change the type of the Cartesian indices, but I couldn’t figure out how to do it. I would really appreciate it if you could give me some hints.

I am afraid I do not see any obvious errors. It will be easier to narrow down the issue if you could provide a MWE that reproduces the error.

No problem. I have revised my MWE as follows.

using Distributed, Random, Distributions
addprocs(Sys.CPU_THREADS-1)

@everywhere using JuMP, Gurobi, SparseArrays
@everywhere const GRB_ENV = Gurobi.Env()
@everywhere setparams!(GRB_ENV, Presolve=0, Method=1, OutputFlag=0)

# Linear Programming Problem
@everywhere function fun_lpp(w)
    N = size(w, 1)
    model = Model(optimizer_with_attributes(()->Gurobi.Optimizer(GRB_ENV)))
    @variable(model, H_temp[1:N, 1:N] >= 0)
    @constraint(model, [i = 1:N], sum(H_temp[i, j] for j = 1:N) == 1)
    @constraint(model, [j = 1:N], sum(H_temp[i, j] for i = 1:N) == 1)
    @objective(model, Max, sum(w[i, j] * H_temp[i, j] for i = 1:N, j = 1:N))
    JuMP.optimize!(model)
    H = value.(H_temp)
    return findall(!iszero, sparse(H))
end

# Parallel
@everywhere function fun_parallel(data)
    H_all = Vector{Vector}(undef, 10)
    np = nprocs()
    i = 1
    nextidx() = (idx=i; i+=1; idx)
    @sync begin
        for p = 1:np
            if p != myid() || np == 1
                @async begin
                    while true
                        idx = nextidx()
                        if idx > 10
                            break
                        end
                        H_all[idx] = remotecall_fetch(fun_lpp, p, data[idx])
                    end
                end
            end
        end
    end
    return H_all
end

# Objective Function
@everywhere function fun_objective(p)
    Random.seed!(1024)
    data = [randn(3, 3) for i = 1:10]
    H_all = fun_parallel((p[1] + p[2]) .* data)
    temp = vcat([data[i][H_all[i]] for i = 1:10]...)
    return abs(mean(temp) - mean(mean(data)))
end

using Optim
initial_p = [0.5, 0.5]
res = Optim.optimize(fun_objective, initial_p, Newton(), Optim.Options(time_limit = 10.0, show_trace = true); autodiff = :forward)

I got the following error.

TaskFailedException:
On worker 2:
MethodError: no method matching Float64(::ForwardDiff.Dual{ForwardDiff.Tag{typeof(fun_objective),Float64},Float64,2})
Closest candidates are:
  Float64(::Real, !Matched::RoundingMode) where T<:AbstractFloat at rounding.jl:200
  Float64(::T) where T<:Number at boot.jl:716
  Float64(!Matched::Float32) at float.jl:255
  ...
convert at ./number.jl:7
_float at /Users/xunjie/.julia/packages/JuMP/qhoVb/src/operators.jl:12
* at /Users/xunjie/.julia/packages/JuMP/qhoVb/src/operators.jl:32 [inlined]
operate at /Users/xunjie/.julia/packages/MutableArithmetics/bPWR4/src/interface.jl:110 [inlined]
operate at /Users/xunjie/.julia/packages/MutableArithmetics/bPWR4/src/rewrite.jl:35 [inlined]
operate_fallback! at /Users/xunjie/.julia/packages/MutableArithmetics/bPWR4/src/interface.jl:327 [inlined]
operate! at /Users/xunjie/.julia/packages/MutableArithmetics/bPWR4/src/rewrite.jl:80 [inlined]
macro expansion at /Users/xunjie/.julia/packages/MutableArithmetics/bPWR4/src/rewrite.jl:276 [inlined]
macro expansion at /Users/xunjie/.julia/packages/JuMP/qhoVb/src/macros.jl:830 [inlined]
fun_lpp at /Users/xunjie/Desktop/Notes.jl:2727
#106 at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Distributed/src/process_messages.jl:294
run_work_thunk at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Distributed/src/process_messages.jl:79
macro expansion at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Distributed/src/process_messages.jl:294 [inlined]
#105 at ./task.jl:356
Stacktrace:
 [1] #remotecall_fetch#143 at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Distributed/src/remotecall.jl:394 [inlined]
 [2] remotecall_fetch(::Function, ::Distributed.Worker, ::Array{ForwardDiff.Dual{ForwardDiff.Tag{typeof(fun_objective),Float64},Float64,2},2}) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Distributed/src/remotecall.jl:386
 [3] remotecall_fetch(::Function, ::Int64, ::Array{ForwardDiff.Dual{ForwardDiff.Tag{typeof(fun_objective),Float64},Float64,2},2}; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Distributed/src/remotecall.jl:421
 [4] remotecall_fetch(::Function, ::Int64, ::Array{ForwardDiff.Dual{ForwardDiff.Tag{typeof(fun_objective),Float64},Float64,2},2}) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Distributed/src/remotecall.jl:421
 [5] macro expansion at /Users/Desktop/Notes.jl:2748 [inlined]
 [6] (::var"#15#17"{Array{Array{ForwardDiff.Dual{ForwardDiff.Tag{typeof(fun_objective),Float64},Float64,2},2},1},Array{Array{T,1} where T,1},var"#nextidx#16",Int64})() at ./task.jl:356

...and 2 more exception(s).

sync_end(::Channel{Any}) at task.jl:314
macro expansion at task.jl:333 [inlined]
fun_parallel(::Array{Array{ForwardDiff.Dual{ForwardDiff.Tag{typeof(fun_objective),Float64},Float64,2},2},1}) at Notes.jl:2739
fun_objective(::Array{ForwardDiff.Dual{ForwardDiff.Tag{typeof(fun_objective),Float64},Float64,2},1}) at Notes.jl:2761
vector_mode_dual_eval at apiutils.jl:37 [inlined]
vector_mode_gradient!(::DiffResults.MutableDiffResult{1,Float64,Tuple{Array{Float64,1}}}, ::typeof(fun_objective), ::Array{Float64,1}, ::ForwardDiff.GradientConfig{ForwardDiff.Tag{typeof(fun_objective),Float64},Float64,2,Array{ForwardDiff.Dual{ForwardDiff.Tag{typeof(fun_objective),Float64},Float64,2},1}}) at gradient.jl:113
gradient! at gradient.jl:37 [inlined]
gradient! at gradient.jl:35 [inlined]
(::NLSolversBase.var"#42#48"{typeof(fun_objective),ForwardDiff.GradientConfig{ForwardDiff.Tag{typeof(fun_objective),Float64},Float64,2,Array{ForwardDiff.Dual{ForwardDiff.Tag{typeof(fun_objective),Float64},Float64,2},1}}})(::Array{Float64,1}, ::Array{Float64,1}) at twicedifferentiable.jl:130
value_gradient!!(::TwiceDifferentiable{Float64,Array{Float64,1},Array{Float64,2},Array{Float64,1}}, ::Array{Float64,1}) at interface.jl:82
initial_state(::Newton{LineSearches.InitialStatic{Float64},LineSearches.HagerZhang{Float64,Base.RefValue{Bool}}}, ::Optim.Options{Float64,Nothing}, ::TwiceDifferentiable{Float64,Array{Float64,1},Array{Float64,2},Array{Float64,1}}, ::Array{Float64,1}) at newton.jl:45
optimize at optimize.jl:33 [inlined]
#optimize#87 at interface.jl:142 [inlined]
(::Optim.var"#optimize##kw")(::NamedTuple{(:autodiff,),Tuple{Symbol}}, ::typeof(Optim.optimize), ::Function, ::Array{Float64,1}, ::Newton{LineSearches.InitialStatic{Float64},LineSearches.HagerZhang{Float64,Base.RefValue{Bool}}}, ::Optim.Options{Float64,Nothing}) at interface.jl:141
top-level scope at Notes.jl:2768

You can’t automatically differentiate through a JuMP model, so this approach won’t work. (fun_lpp is being passed a ForwardDiff.Dual.)
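A minimal way to see the failure, assuming the JuMP version from your stack trace (where affine expressions store Float64 coefficients), is to multiply a Dual by a JuMP variable directly. This is a standalone illustrative snippet, not part of your model:

using JuMP, ForwardDiff

m = Model()                      # no solver needed just to build an expression
@variable(m, x >= 0)

d = ForwardDiff.Dual(1.0, 1.0)   # the kind of number ForwardDiff feeds into fun_lpp
d * x                            # MethodError: no method matching Float64(::ForwardDiff.Dual{...})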

You can probably write this as a larger JuMP model.

Thank you for your comments, Oscar. Do you mind elaborating on the suggestion of “a larger JuMP model”? I don’t really understand what you mean. Thanks.

Not a full solution, but in response to your question about a larger JuMP model: the JuMP interface is designed to work with many different backends, including non-Julia libraries. For automatic differentiation in Optim (via ForwardDiff.jl), however, all of your code must be pure Julia and generic enough to accept ForwardDiff’s number types.

One way to see this is to try taking the gradient of your objective function directly. This is, if I understand it correctly, what Optim does under the hood with the autodiff = :forward option.

using ForwardDiff

# Fill G in place with the ForwardDiff gradient of the objective at coefs.
function g!(G, coefs)
    grad = ForwardDiff.gradient(objective_func, coefs)
    for (i, g_) in enumerate(grad)
        G[i] = g_
    end
    return nothing
end
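If that gradient call eventually goes through (i.e. once the whole objective is differentiable), you could also hand the gradient to Optim yourself. A hedged sketch, assuming a gradient-only method such as BFGS, where objective_func and coefs0 are placeholders for your objective and starting point:

using Optim

# Sketch only: objective_func and coefs0 stand in for your objective and initial parameters.
res = Optim.optimize(objective_func, g!, coefs0, BFGS(), Optim.Options(show_trace = true))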

You use two different optimization interfaces in the code snippet above: one in fun_lpp that uses JuMP, and another at the bottom that uses Optim. It seems to me that this is the core of the issue. I’ve never used JuMP beyond a toy model, so others may be able to speak to its extensibility and to which of the two would be better for your problem.

I meant finding some way of formulating a single optimization problem that includes H_temp and p simultaneously. But looking a bit deeper, that’s hard because you have a min-max formulation.

You could try a derivative-free optimizer for fun_objective. There are various options listed in the Optim documentation: Optim.jl
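For example, Nelder-Mead needs no gradient, so no Dual numbers ever reach JuMP. A minimal sketch reusing the names from your MWE (fun_objective, initial_p):

# Derivative-free alternative: same objective, no autodiff.
res = Optim.optimize(fun_objective, initial_p, NelderMead(), Optim.Options(time_limit = 10.0, show_trace = true))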

Thanks for the explanation. I guess I then have to give up. I have tried many derivative-free solvers for my problem, but unfortunately they could not find the correct minimum. That’s why I am trying to test gradient-based solvers with automatic differentiation.