# Getting ForwardDiff.Dual to propagate through function

Hello! I would like to automatically differentiate a log-likelihood to obtain gradients and Hessians, but I am having trouble propagating `ForwardDiff.Dual` types through my function.

I have the same question as here and tried to adapt the solution. However, despite declaring my parameter vector as `AbstractVector{T}`, I am still getting the error `MethodError: no method matching Float64(::ForwardDiff.Dual{ForwardDiff.Tag{var"#173#174", Float64}, Float64, 2})`
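
For context, my understanding of the usual cause of this error is that a `Dual` gets written into a `Float64`-typed container somewhere, as in this minimal sketch (the function names here are made up for illustration):

``````
using ForwardDiff

function bad_loglik(param)
    acc = zeros(length(param))    # eltype is Float64, so a Dual cannot be stored
    acc[1] = param[1]^2           # MethodError: no method matching Float64(::Dual)
    return sum(acc)
end

function good_loglik(param::AbstractVector{T}) where T
    acc = zeros(T, length(param)) # eltype follows the input, so Duals fit
    acc[1] = param[1]^2
    return sum(acc)
end

ForwardDiff.gradient(good_loglik, [0.05, 0.05])   # returns [0.1, 0.0]
``````

This is what I thought I had fixed by parameterising on `T`, but the error persists.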

Can someone please help me rewrite the function so that it can be auto-differentiated? The function call that produces the error is:

``````
using Optim, ForwardDiff, NLSolversBase
s_v = [0.05; 0.05]
g = ForwardDiff.gradient(x -> FixedVarianceLogLik(x,hcat(ones(15),ones(15))), s_v)
``````

And this is the definition of the offending function:

``````
function FixedVarianceLogLik(param::AbstractVector{T}, data, provideprocess = false, forecast_periods = 0) where T
    sample_size = size(data)[1]
    p = 2

    steps = sample_size + forecast_periods

    # Allocate all filter storage with element type T, so that
    # ForwardDiff.Dual values can be stored during differentiation
    prior_state          = zeros(T, steps, p)
    prior_covariance     = zeros(T, steps, p, p)
    forecast_error       = zeros(T, steps, 1)
    MSE_covariance       = zeros(T, steps, 1, 1)
    neg_ll_one_obs       = zeros(T, steps, 1)
    kalman_gain          = zeros(T, steps, p)
    posterior_state      = zeros(T, steps, p)
    posterior_covariance = zeros(T, steps, p, p)

    # Process noise variances, restricted to be time-independent; the
    # measurement variance is time-dependent and is read from data[:, 2] below
    Q1 = param[1] * param[1]
    Q2 = param[2] * param[2]
    neg_ll::T = 0.0

    # Initialise the filter at t = 1
    prior_state[1, :]             = T[data[1]; 0]
    prior_covariance[1, :, :]     = I(p)
    forecast_error[1]             = 0.0
    MSE_covariance[1, 1, 1]       = 1
    kalman_gain[1, :]             = T[0; 0]
    neg_ll_one_obs[1]             = 0.0
    posterior_state[1, :]         = T[data[1]; 0]
    posterior_covariance[1, :, :] = I(p)

    for t in 2:steps
        # Prediction step starts from the posterior estimate of t-1
        Z = T[1 0]
        transition_matrix = T[1 1;
                              0 1]

        prior_state[t, :] = transition_matrix * posterior_state[t-1, :]
        # Process noise covariance: block diagonal, proportional to the identity
        process_noise_covariance_timeindep = T[Q1*I(1)     zeros(1, 1);
                                               zeros(1, 1) Q2*I(1)]
        prior_covariance[t, :, :] = transition_matrix * posterior_covariance[t-1, :, :] * transition_matrix' + process_noise_covariance_timeindep

        # TODO: try two things: Psi^star as slope AND FD as slope

        if t <= sample_size
            # Measurement step: collect the measurement and the forecast (which
            # here equals the prior state), accumulate the log-likelihood, and
            # update the filter with the measurement
            forecast_error[t] = data[t, 1] - (Z * prior_state[t, :])[1]
            MSE_covariance[t, :, :] = Z * prior_covariance[t, :, :] * Z' + data[t, 2] * I(1)  # scalar
            kalman_gain[t, :] = (prior_covariance[t, :, :] * Z') / MSE_covariance[t, :, :]
            # Gaussian one-step-ahead negative log-density (constant term dropped)
            neg_ll_one_obs[t] = 0.5 * log(MSE_covariance[t, 1, 1]) + 0.5 * forecast_error[t]^2 / MSE_covariance[t, 1, 1]
            neg_ll += neg_ll_one_obs[t]

            posterior_state[t, :] = prior_state[t, :] + kalman_gain[t, :] * forecast_error[t]
            # Joseph-form covariance update, then symmetrise
            posterior_covariance[t, :, :] = (I(p) - kalman_gain[t, :] * Z) * prior_covariance[t, :, :] * (I(p) - kalman_gain[t, :] * Z)' + kalman_gain[t, :] * data[t, 2] * I(1) * kalman_gain[t, :]'
            posterior_covariance[t, :, :] = (posterior_covariance[t, :, :] + posterior_covariance[t, :, :]') / 2
            # t then rolls forward, so these posteriors become the next priors
        else
            # Forecast periods: keep the filter rolling forward without updating
            forecast_error[t] = 0
            MSE_covariance[t, :, :] = Z * prior_covariance[t, :, :] * Z' + data[sample_size, 2] * I(1)  # scalar
            kalman_gain[t, :] = T[0; 0]
            neg_ll_one_obs[t] = 0
            neg_ll += neg_ll_one_obs[t]
            posterior_state[t, :] = prior_state[t, :]
            posterior_covariance[t, :, :] = prior_covariance[t, :, :]
            posterior_covariance[t, :, :] = (posterior_covariance[t, :, :] + posterior_covariance[t, :, :]') / 2
        end
    end

    if provideprocess
        # Columns: 1-2 prior state, 3-4 prior variances, 5 forecast error, 6 MSE,
        # 7 neg. log-likelihood per observation, 8 Kalman gain (first component),
        # 9-10 posterior state, 11-12 posterior variances
        return hcat(prior_state[:, 1], prior_state[:, 2],
                    prior_covariance[:, 1, 1], prior_covariance[:, 2, 2],
                    forecast_error, MSE_covariance[:, 1], neg_ll_one_obs,
                    kalman_gain[:, 1],
                    posterior_state[:, 1], posterior_state[:, 2],
                    posterior_covariance[:, 1, 1], posterior_covariance[:, 2, 2])
    else
        return neg_ll
    end
end
``````
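
For completeness: once the gradient works, I intend to get the Hessian the same way (with the same toy data as above):

``````
H = ForwardDiff.hessian(x -> FixedVarianceLogLik(x, hcat(ones(15), ones(15))), s_v)
``````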

Does the error stack trace point to a line number inside the function `FixedVarianceLogLik`?

Hi!
Yes, strangely enough, line 2 is mentioned in the trace below. Unfortunately, I don’t understand why getting the sample size / dimensions of the data array would matter here.

``````
MethodError: no method matching Float64(::ForwardDiff.Dual{ForwardDiff.Tag{var"#231#232", Float64}, Float64, 2})

Closest candidates are:
  (::Type{T})(::Real, !Matched::RoundingMode) where T<:AbstractFloat
   @ Base rounding.jl:207
  (::Type{T})(::T) where T<:Number
   @ Core boot.jl:792
  (::Type{T})(!Matched::VectorizationBase.Double{T}) where T<:Union{Float16, Float32, Float64, VectorizationBase.Vec{<:Any, <:Union{Float16, Float32, Float64}}, VectorizationBase.VecUnroll{var"#s45", var"#s44", var"#s43", V} where {var"#s45", var"#s44", var"#s43"<:Union{Float16, Float32, Float64}, V<:Union{Bool, Float16, Float32, Float64, Int16, Int32, Int64, Int8, UInt16, UInt32, UInt64, UInt8, SIMDTypes.Bit, VectorizationBase.AbstractSIMD{var"#s44", var"#s43"}}}}
   @ VectorizationBase ~/.julia/packages/VectorizationBase/xE5Tx/src/special/double.jl:111
  ...
``````

I can’t reproduce the problem (after adding a `using LinearAlgebra`, which the function needs for `I(p)`). That is, a gradient is computed:

``````
julia> g = ForwardDiff.gradient(x -> FixedVarianceLogLik(x,hcat(ones(15),ones(15))), s_v)
2-element Vector{Float64}:
 0.6742743519438372
 5.142802177577451
``````

Could it be that you have old versions of some packages? I’m using Julia 1.10.0-rc2 (not a production release) and these packages:

``````
  [f6369f11] ForwardDiff v0.10.36
  [d41bc354] NLSolversBase v7.8.3
  [429524aa] Optim v1.7.8
``````
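
For comparison, you can print your own versions of just these packages with something like:

``````
import Pkg
Pkg.status(["ForwardDiff", "NLSolversBase", "Optim"])
``````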

I restarted Julia and it worked with identical output. Could it be that changing and recompiling a function several times wears out some sort of cache?

If you have redefined the function many times, it’s more likely that you have a method defined which is not what you think. Say you start out with

``````
f(x::Float64) = x + 2
``````

Then you decide to make it more general, and also fix a bug, and redefine it as

``````
f(x::Real) = x + 1
``````

Now, unless you use the package `Revise`, you have one function, `f`, with two methods: one used for `Float64`, which adds `2`, and one for every other `Real`, which adds `1`.
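
You can check for this state with `methods`; a quick REPL illustration (output abridged):

``````
julia> f(x::Float64) = x + 2;

julia> f(x::Real) = x + 1;   # different signature, so this ADDS a second method

julia> f(1.0), f(1)          # Float64 still dispatches to the old method
(3.0, 2)

julia> methods(f)
# 2 methods for generic function "f" from Main:
 [1] f(x::Float64)
 [2] f(x::Real)
``````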

If you haven’t restarted Julia in a long time, it could very well be that you had some such method chaos. If you do not need dispatch on argument types, it’s better to avoid specifying them altogether.
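
In a case like this a single untyped definition is all you need, and it also accepts `ForwardDiff.Dual` inputs:

``````
f(x) = x + 1   # one method covers Float64, Int, ForwardDiff.Dual, ...
``````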