Error optimizing using autodifferentation

I am a new Julia user working with the Optim package and trying to estimate a model (minimizing the sum of squared residuals), but receive an error (I read the PSA on posting–let me know if I could do any better in regards to norms of posting). I am trying to use automatic differentiation as I think my objective function should actually be well behaved and a gradient based search method should therefore (I believe) have much better performance. From looking around the forums, it seems that others have had a similar problem, although I was not able to follow/understand the answers, and wonder if it might be related to variable types.

Here is the error message:

MethodError: no method matching Float64(::ForwardDiff.Dual{ForwardDiff.Tag{typeof(valfsotath),Float64},Float64,11})
Closest candidates are:
  Float64(::Real, !Matched::RoundingMode) where T<:AbstractFloat at rounding.jl:200
  Float64(::T) where T<:Number at boot.jl:718
  Float64(!Matched::Int8) at float.jl:60
  ...
convert(::Type{Float64}, ::ForwardDiff.Dual{ForwardDiff.Tag{typeof(valfsotath),Float64},Float64,11}) at number.jl:7
setindex!(::Array{Float64,2}, ::ForwardDiff.Dual{ForwardDiff.Tag{typeof(valfsotath),Float64},Float64,11}, ::Int64, ::Int64) at array.jl:784
valfsotath(::Array{ForwardDiff.Dual{ForwardDiff.Tag{typeof(valfsotath),Float64},Float64,11},1}) at untitled-14a5e6d4bde70204cc3bff2d8469b7fc:23
chunk_mode_gradient!(::DiffResults.MutableDiffResult{1,Float64,Tuple{Array{Float64,1}}}, ::typeof(valfsotath), ::Array{Float64,1}, ::ForwardDiff.GradientConfig{ForwardDiff.Tag{typeof(valfsotath),Float64},Float64,11,Array{ForwardDiff.Dual{ForwardDiff.Tag{typeof(valfsotath),Float64},Float64,11},1}}) at gradient.jl:140
gradient! at gradient.jl:37 [inlined]
gradient! at gradient.jl:33 [inlined]
(::NLSolversBase.var"#14#18"{Float64,typeof(valfsotath),ForwardDiff.GradientConfig{ForwardDiff.Tag{typeof(valfsotath),Float64},Float64,11,Array{ForwardDiff.Dual{ForwardDiff.Tag{typeof(valfsotath),Float64},Float64,11},1}}})(::Array{Float64,1}, ::Array{Float64,1}) at oncedifferentiable.jl:70
value_gradient!!(::NLSolversBase.TwiceDifferentiable{Float64,Array{Float64,1},Array{Float64,2},Array{Float64,1}}, ::Array{Float64,1}) at interface.jl:82
initial_state(::Optim.Newton{LineSearches.InitialStatic{Float64},LineSearches.HagerZhang{Float64,Base.RefValue{Bool}}}, ::Optim.Options{Float64,Nothing}, ::NLSolversBase.TwiceDifferentiable{Float64,Array{Float64,1},Array{Float64,2},Array{Float64,1}}, ::Array{Float64,1}) at newton.jl:45
optimize at optimize.jl:33 [inlined]
#optimize#96 at interface.jl:136 [inlined]
optimize(::NLSolversBase.OnceDifferentiable{Float64,Array{Float64,1},Array{Float64,1}}, ::Array{Float64,1}, ::Optim.Newton{LineSearches.InitialStatic{Float64},LineSearches.HagerZhang{Float64,Base.RefValue{Bool}}}, ::Optim.Options{Float64,Nothing}) at interface.jl:135
optimize(::NLSolversBase.OnceDifferentiable{Float64,Array{Float64,1},Array{Float64,1}}, ::Array{Float64,1}, ::Optim.Newton{LineSearches.InitialStatic{Float64},LineSearches.HagerZhang{Float64,Base.RefValue{Bool}}}) at interface.jl:135
top-level scope at untitled-14a5e6d4bde70204cc3bff2d8469b7fc:307

My code is (my attempt at a MWE, after removing some loops):

Pkg.add("Optim");
import Optim;

##

function valfsotath(theta)

theta_dum = theta[size(theta)[1],1];
pre_theta_else = theta[1:(size(theta)[1]-1),1];

theta_else = zeros(size(grid_interact)[2],1)

#demand
theta_else[1,1] = pre_theta_else[1,1];
#price
theta_else[2:nummills+1,1] = theta_else[2:nummills+1,1] .+ pre_theta_else[2,1];
#past quant
theta_else[nummills+2:2*nummills+1,1] = theta_else[nummills+2:2*nummills+1,1] .+ pre_theta_else[3,1];
#impquantgaur
theta_else[2*nummills+2:3*nummills+1,1] = theta_else[2*nummills+2:3*nummills+1,1] .+ pre_theta_else[4,1];
#lrun mup
theta_else[3*nummills+2:4*nummills+1,1] = theta_else[3*nummills+2:4*nummills+1,1] .+ pre_theta_else[5,1];


#demand squared
theta_else[4*nummills+2,1] = theta_else[4*nummills+2,1] .+ pre_theta_else[6,1];
#demand X price
theta_else[4*nummills+3:5*nummills+2,1] = theta_else[4*nummills+3:5*nummills+2,1] .+ pre_theta_else[7,1];
#demand X past quant
theta_else[5*nummills+3:6*nummills+2,1] = theta_else[5*nummills+3:6*nummills+2,1] .+ pre_theta_else[8,1];
#demand X impquantgaur
theta_else[6*nummills+3:7*nummills+2,1] =  theta_else[6*nummills+3:7*nummills+2,1] .+ pre_theta_else[9,1];
#demand X lrun mup
theta_else[7*nummills+3:8*nummills+2,1] =  theta_else[7*nummills+3:8*nummills+2,1] .+ pre_theta_else[10,1];

next_start = 8*nummills+3;
num_add = (nummills ^ 2);

#price squared and price interacts
theta_else[next_start:next_start+nummills+binomial(nummills,2) - 1,1] =  theta_else[next_start:next_start+nummills+binomial(nummills,2) - 1,1] .+ pre_theta_else[11,1];
#price X past quant
theta_else[next_start+nummills+binomial(nummills,2):next_start+nummills+binomial(nummills,2)+num_add - 1,1] = theta_else[next_start+nummills+binomial(nummills,2):next_start+nummills+binomial(nummills,2)+num_add - 1,1] .+ pre_theta_else[12,1];
#price X impquantgaur
theta_else[next_start+nummills+binomial(nummills,2)+num_add:next_start+nummills+binomial(nummills,2)+2*num_add - 1,1] = theta_else[next_start+nummills+binomial(nummills,2)+num_add:next_start+nummills+binomial(nummills,2)+2*num_add - 1,1] .+ pre_theta_else[13,1];
#price X lrun mup
theta_else[next_start+nummills+binomial(nummills,2)+2*num_add:next_start+nummills+binomial(nummills,2)+3*num_add - 1,1] = theta_else[next_start+nummills+binomial(nummills,2)+2*num_add:next_start+nummills+binomial(nummills,2)+3*num_add - 1,1] .+ pre_theta_else[14,1];


next_start = next_start+nummills+binomial(nummills,2)+3*num_add;
#past quant squared and past quant interacts
theta_else[next_start:next_start+nummills+binomial(nummills,2) - 1,1] = theta_else[next_start:next_start+nummills+binomial(nummills,2) - 1,1] .+ pre_theta_else[15,1];
#past quant X impquantgaur
theta_else[next_start+nummills+binomial(nummills,2):next_start+nummills+binomial(nummills,2)+num_add - 1,1] = theta_else[next_start+nummills+binomial(nummills,2):next_start+nummills+binomial(nummills,2)+num_add - 1,1] .+ pre_theta_else[16,1];
#past quant X  lrun mup
theta_else[next_start+nummills+binomial(nummills,2)+num_add:next_start+nummills+binomial(nummills,2)+2*num_add - 1,1] = theta_else[next_start+nummills+binomial(nummills,2)+num_add:next_start+nummills+binomial(nummills,2)+2*num_add - 1,1] .+ pre_theta_else[17,1];


next_start = next_start+nummills+binomial(nummills,2)+2*num_add;
#impquantgaur squared and impquantgaur interacts
theta_else[next_start:next_start+nummills+binomial(nummills,2) - 1,1] = theta_else[next_start:next_start+nummills+binomial(nummills,2) - 1,1] .+ pre_theta_else[18,1];
#impquantgaur X lrun mup
theta_else[next_start+nummills+binomial(nummills,2):next_start+nummills+binomial(nummills,2)+num_add - 1,1] = theta_else[next_start+nummills+binomial(nummills,2):next_start+nummills+binomial(nummills,2)+num_add - 1,1] .+ pre_theta_else[19,1];


next_start = next_start+nummills+binomial(nummills,2)+num_add ;
#lrun mup squared
theta_else[next_start:next_start+nummills+binomial(nummills,2) - 1,1] = theta_else[next_start:next_start+nummills+binomial(nummills,2) - 1,1] .+ pre_theta_else[20,1];

update_grid_interact = grid_interact * 0;
valf = zeros(size(grid_interact)[1],1);
#resid = zeros(size(grid_interact)[1],1);

dprime = 1

val_by_winner = zeros(size(grid)[1],nummills)

for winner = 1:nummills
     quant_past = grid[:,2+nummills:1+2*nummills];
     quant_past[:,winner] = quant_past[:,winner] + grid[:,1];
     quant_diff = broadcast(abs,quant_past .- grid[:,2+2*nummills:1+3*nummills]);
     flow_utility = -1 .* grid[:,1] .* grid[:,1+winner] .-
                    eta .* sum(quant_diff, dims = 2) .-
                    chi .* sum(quant_diff .^ 2, dims = 2);

    #Update
    mean_price = sum(grid[:,2:1+nummills], dims = 2) ./
                        nummills ;
    price_update = gamma1 .* (grid[:,2:1+nummills] .- mean_price .+ 1) .+
                  (1-gamma1) .* grid[:,2+3*nummills:1+4*nummills];
    quant_past_update = delta1 .* grid[:,2+nummills:1+2*nummills];
    quant_past_update[:,winner] = quant_past_update[:,winner] .+
                                  (1-delta1) .* grid_interact[:,1];
    impqgaur_update = grid[:,2+2*nummills:1+3*nummills] ./
                ( (  alpha1 .* grid[:,2+3*nummills:1+4*nummills]) .^ alpha2);
    lrunmup_update = gamma2 .* (grid[:,2:1+nummills] .- mean_price) .+
                 (1-gamma2) .* grid[:,2+3*nummills:1+4*nummills];
    demand_update = demand[dprime,1] .* ones(size(grid)[1],1) ;
    update_grid = hcat(demand_update,
                        price_update,
                        quant_past_update,
                        impqgaur_update,
                        lrunmup_update);
    colcountersota = size(update_grid)[2];
    update_grid_interact[:,1:colcountersota] = update_grid;
    for i = 1: size(grid)[2]
        for j = i:size(grid)[2]
            colcountersota = size(grid)[2] + (j - i + 1) +
                            countersize(i,size(grid)[2])
            update_grid_interact[:,colcountersota] = update_grid[:,i] .* update_grid[:,j];
        end
    end

    futval = beta .*
                ( update_grid_interact * theta_else .+
                ones(size(grid)[1],1) .* theta_dum );
    val_by_winner[:,winner] = flow_utility .+  futval;

    end

    max_val_by_winner =findmax(val_by_winner, dims = 2)[1];
    less_max = val_by_winner .-
                max_val_by_winner .* ones(size(val_by_winner)[1],size(val_by_winner)[2])
    exp_sum = sum(broadcast(exp,less_max),dims = 2);
    log_sum = max_val_by_winner[:,1] .+
                     broadcast(log,exp_sum);

    #update transmat given dprime
    transmat_rep = repeat(trans_mat[:,dprime],
                     Int(round(size(grid)[1] / nummills ) ));
    valf[:,:] = valf[:,:] + log_sum .* transmat_rep;

currval = grid_interact  * theta_else .+
          ones(size(grid)[1],1) * theta_dum;
resid = (1/100000) .* (currval .- valf);
pre_ssr = resid' * resid;
ssr = convert(Float64,pre_ssr[1,1])

return ssr
end


##

function cartprod(x,y)

rows = size(x)[1] * size(y)[1]
columns = size(x)[2] + size(y)[2]
output = zeros(rows,columns);

for i in 1:size(x)[1]
    for j in 1:size(y)[1]
        iter = (i-1) * size(y)[1] + j;
        output[iter,:] = hcat(x[i,:]',y[j,:]')
    end
end

return output

end


function cartprodrev(x,y)

rows = size(x)[1] * size(y)[1]
columns = size(x)[2] + size(y)[2]
output = zeros(rows,columns)


for i in 1:size(y)[1]
    for j in 1:size(x)[1]
        iter = (i-1) * size(x)[1] + j;
        output[iter,:] = hcat(x[j,:]',y[i,:]')
    end
end

return output

end


function countersize(k,diml)
    #k is i and l is j
counter = 0

for a = 1:k
    counter = counter + (diml - (a - 2)) * (a > 1)
end

return counter

end

##
#initialize state vars
nummills = 3;
demand = [0 30 900]';
numdemand = size(demand)[1];

quant = [0 30 600]';
numquant = size(quant)[1];

trans_mat = [.2 .6 .2; .2 .5 .3 ; .1 .3 .6 ];

mup_min = .95;
mup_max = 1.15;
mup_inc = .1;

price_states = Int(round( (mup_max - mup_min) / mup_inc + 1));

beta = .98;

##
#initialize parameters
eta = 5
chi = 1.5

delta1 = .6
delta2 = .3;

gamma1 = .6
gamma2 = .3

alpha1 = .9
alpha2 = 4

##

quant_grid = cartprod(quant,quant)
for i in 3 : nummills
    new = cartprod(quant_grid,quant)
    global quant_grid = new
end


prices = zeros(convert(UInt8,price_states),1)

for i = 1:price_states
    j = i - 1
    prices[i,1] = j * mup_inc + mup_min
end


 price_grid = cartprod(prices,prices)

for i in 3 : nummills
      new = cartprod(price_grid,prices)
      global price_grid = new
end

##
lrunmup_grid = price_grid
pre_pastdemand_grid = quant_grid
impquantgaur_grid = quant_grid

pastdemand_grid = pre_pastdemand_grid

for i in 1 : (nummills-1)
    j = i + 1
    #print(i)
    subset_pastdemand_grid = pastdemand_grid[:,i] .>= pastdemand_grid[:,j]
    new = pastdemand_grid[subset_pastdemand_grid,:]
    global pastdemand_grid = new
end


new = 0;
subset_pastdemand_grid = 0;
pre_pastdemand_grid = 0;

pastdemand_impqgaur_grid = cartprod(pastdemand_grid,impquantgaur_grid)
grid = cartprodrev(demand,cartprodrev(price_grid,cartprodrev(pastdemand_impqgaur_grid,lrunmup_grid)))

##

total_cols = 2 * (1 + 4 * nummills) + binomial(1 + 4 * nummills,2);

grid_interact = zeros(size(grid)[1],total_cols);
colcounter = size(grid)[2]

grid_interact[:,1:colcounter] = grid;

for i = 1:size(grid)[2]
    for j = i:size(grid)[2]
        colcounter = size(grid)[2] + (j - i + 1) +
                        countersize(i,size(grid)[2])
        grid_interact[:,colcounter] = grid[:,i] .* grid[:,j]
    end
end

##

sota_cols = 21;
thetaval_0 = zeros(sota_cols) .+ .2;

##

od = Optim.OnceDifferentiable(valfsotath, thetaval_0; autodiff = :forward);
res_autodiff = Optim.minimizer(Optim.optimize(od, thetaval_0, Optim.Newton()))

The problem is that

allocate a matrix of Float64s, but it should be ForwardDiff.Dual. A quick workaround would be using something like

zeros(eltype(theta), size(grid_interact)[2],1)

and similar. See http://www.juliadiff.org/ForwardDiff.jl/latest/dev/how_it_works.html

1 Like

Thanks for directing me to the information on types-I will look into it more later!

As a quick note, I tried your quick fix and was not successful (also tried with the same adjustment for theta_dum).

You can use the function signatures

function valfsotath(theta::AbstractArray{TT}) where TT
function cartprod(x::AbstractArray{TT},y::AbstractArray{UU}) where {TT,UU}
function cartprodrev(x::AbstractArray{TT},y::AbstractArray{UU}) where {TT,UU}

and use these generic types everywhere:

zero(TT, n,m)
convert(TT,pre_ssr[1,1])
zero(promote_type(UU,TT), n,m)

This makes at least ForwardDiff.gradient(valfsotath, thetaval_0) work for me.
Note that using global variables (amongst a few other things) makes things quite slow.

This is potentially a bad answer because I didn’t look through your code in detail.
However I see that at the end of the definition for valfsotath(theta), you did the following:

sr = convert(Float64,pre_ssr[1,1])

Coding non-Real types into AD can often lead to trouble. As @Tamas_Papp pointed out, the package page explains how it works. I’d also suggest taking a look at the page on Limitations of ForwardDiff.

Below is just a simple illustration of how convert can produce the kind of error you got.
This works:

using ForwardDiff
function test_sq(x)
  y = x ^ 2
  return y
end
ForwardDiff.derivative(test_sq, 3)

Whereas the following leads to a Method Error:

function test_sq_convert(x)
  y = x ^ 2
  z = convert(Float64, y)
  return z
end
ForwardDiff.derivative(test_sq_convert, 3)

Thank you for providing your thoughts!

I am trying to go “back to basics” to understand this better. Surprisingly, even a very small change to the function (which I believe still keeps the function unary), throws an error:

using ForwardDiff
function test_sq_mat(x)
  y = x * x'
  return y
end
test = [3 3]

ForwardDiff.derivative(test_sq_mat, test)

Specifically, the error is:

MethodError: no method matching derivative(::typeof(test_sq_mat), ::Array{Int64,2})
Closest candidates are:
  derivative(::Any, ::AbstractArray, !Matched::Real) at /Users/X/.julia/packages/ForwardDiff/vt5F1/src/derivative.jl:27
  derivative(::Any, ::AbstractArray, !Matched::Real, !Matched::ForwardDiff.DerivativeConfig{T,D} where D) where T at /Users/X/.julia/packages/ForwardDiff/vt5F1/src/derivative.jl:27
  derivative(::Any, ::AbstractArray, !Matched::Real, !Matched::ForwardDiff.DerivativeConfig{T,D} where D, !Matched::Val{CHK}) where {T, CHK} at /Users/X/.julia/packages/ForwardDiff/vt5F1/src/derivative.jl:27
  ...
top-level scope at untitled-0fcac219d0c72ecce848492c42baa3cb:31

You call your function with a 1x2 Matrix and it returns a 1x1 Matrix. You probably want a 2 vector mapping to a scalar? Create vectors with commas. You can use the LinearAlgebra dot, or use return first(y)

import LinearAlgebra: ⋅ # type with \cdot<tab>
using ForwardDiff
function test_sq_mat(x)
  y = x⋅x
  return y
end
test = [1.0,2.0] # use commas to create vector
ForwardDiff.gradient(test_sq_mat, test)
1 Like

I wanted to share a few things that helped me resolve this issue!

The two key principles I found were that:
1/ I needed to make many parts of the code into arrays instead of vectors (so size of the object would be (X,) as opposed to (X,1))
2/ Sometimes updating vectors using vec[i] caused problems; instead, adding had no issues. I wonder if this reflects that direct updates are not differentiable in any sense that I understand.

Unfortunately, I don’t have a 100% clear understanding of why these changes matter and want to note that these adjustments did not need to be made everywhere in the code. I do not have a good understanding of which scenarios require these changes and which do not