Help using AD with Optimization.jl

Hi all,

I need your advice on using automatic differentiation with Optimization.jl.

The problem - I am trying to find the optimal variances of the process noise covariance matrix of a standard Kalman filter. Currently, I use gradient-free methods and they work well. However, I would like to use gradient-based methods for comparison.

MWE - Below is a representative example of the code I have implemented. (Sorry, it is quite long, but I couldn't find a way to reduce it further.)

Code implementing the filter

struct StateSpace{T <: Real}
    A::Matrix{T}
    C::Matrix{T}
end

struct InitialConditions{T <: Real}
    x0::Vector{T}
    P0::Matrix{T}
end

struct FilterProblem{T <: Real}
    y::Vector{T}
    ss::StateSpace{T}
    ic::InitialConditions{T}
    R::T
end

struct FilterSolution{T <: Real}
    x::Matrix{T}
    P::Vector{Matrix{T}}
end

function solve(prob::FilterProblem)
    # Unpack the problem
    (; y, ss, ic, R) = prob
    (; A, C) = ss
    (; x0, P0) = ic

    # Preallocate the state and covariance matrices
    nt = length(y)
    ny, nx = size(C)
    x = similar(x0, nx, nt)
    P = [similar(P0) for _ in 1:nt]

    xpred = similar(x0)
    Ppred = similar(P0)
    Kk = similar(x0, nx, ny)
    T = similar(P0)

    # Initial conditions
    x[:, 1] .= x0
    P[1] .= P0

    optf = OptimizationFunction(optimQ!, Optimization.AutoForwardDiff())
    probQ = OptimizationProblem(optf, [-2., -2.], prob, lb = [-20., -20.], ub = [2., 2.])
    θ = Optimization.solve(probQ, Optimization.LBFGS()).u
    Q = Diagonal(10 .^ θ)   # θ is optimized on a log10 scale (see optimQ!)

    # Kalman filter algorithm
    for k in 2:nt
        # Prediction step
        xpred .= A*x[:, k - 1]
        Ppred .= A*P[k-1]*A' .+ Q

        # Update step
        ik = y[k] .- C*xpred
        Sk = C*Ppred*C' .+ R

        Kk = Ppred*(C'/Sk)
        T .= I - Kk*C
        x[:, k] .= xpred .+ Kk*ik
        P[k] .= T*Ppred*T' .+ Kk*R*Kk'
    end

    return FilterSolution(x, P)
end

function optimQ!(θ, p)
    # Unpack the problem
    (; y, ss, ic, R) = p
    (; A, C) = ss
    (; x0, P0) = ic

    # Preallocate the state and covariance matrices
    nt = length(y)
    ny, nx = size(C)
    x = similar(x0)
    P = similar(P0)

    xpred = similar(x0)
    Ppred = similar(P0)
    Kk = similar(x0, nx, ny)
    T = similar(P0)

    # Initial conditions
    x .= x0
    P .= P0
    Q = Diagonal(10 .^ θ)   # θ holds the log10 of the process noise variances

    # Kalman filter algorithm
    J = zero(eltype(x0))
    for k in 2:nt
        # Prediction step
        xpred .= A*x
        Ppred .= A*P*A' .+ Q

        # Update step
        ik = y[k] .- C*xpred
        Sk = C*Ppred*C' .+ R

        Kk = Ppred*(C'/Sk)
        T .= I - Kk*C
        x .= xpred .+ Kk*ik
        P .= T*Ppred*T' .+ Kk*R*Kk'

        J += logdet(Sk) + ik'*(Sk\ik)
    end

    return J
end

Script - Estimation of the state vector of a spring-mass-damper

using LinearAlgebra, Optimization
using ForwardDiff

includet("./KFilter.jl")   # includet comes from Revise.jl; a plain include works too

# Model parameters
m = 1.       # mass
ω0 = 10π     # natural angular frequency
ξ = 0.01     # damping ratio

# Initial conditions
x0 = 1.
v0 = -2.

# Time vector
t = 0:0.01:10.

# Noiseless free response
Ω0 = ω0*√(1 - ξ^2)
Ai = x0
Bi = (v0 + ξ*ω0*x0)/Ω0
y = (Ai*cos.(Ω0*t) .+ Bi*sin.(Ω0*t)).*exp.(-ξ*ω0*t)
v = Ω0*(Bi*cos.(Ω0*t) .- Ai*sin.(Ω0*t)).*exp.(-ξ*ω0*t) .- ξ*ω0*y

# Add noise to the measurements
σ = 0.1
yn = y .+ σ*randn(eltype(y), length(t))

# State-space model
Ac = [0. 1.; -ω0^2 -2ξ*ω0]
Cc = [1. 0.]

Ad = exp(Ac*step(t))
Cd = Cc

ss = StateSpace(Ad, Cd)
ic = InitialConditions([x0, v0], 1e-10*[1. 0.; 0. 1.])
R = σ^2

# Kalman filter
prob = FilterProblem(yn, ss, ic, R)
x = solve(prob).x

Some results

  • With ForwardDiff.jl (Optimization.AutoForwardDiff()), I get the usual error

    ERROR: MethodError: no method matching Float64(::ForwardDiff.Dual{ForwardDiff.Tag{typeof(optimQ!), Float64}, Float64, 2})
    The type `Float64` exists, but no method is defined for this combination of argument types when trying to construct it.
    

    As you can see, I tried to make the code generic (and it is type stable), but it still doesn't work and I don't know why :cry:

  • With Zygote.jl (Optimization.AutoZygote()), I get this error:

     Mutating arrays is not supported -- called copyto!(Matrix{Float64}, ...)
     This error occurs when you ask Zygote to differentiate operations that change
     the elements of arrays in place (e.g. setting values with x .= ...)
    

    I understand this error, since I use preallocation and in-place broadcasting, but how can I avoid it?

  • With Enzyme.jl (Optimization.AutoEnzyme()), I get this warning:

    TODO forward zero-set of memory copy used memset rather than runtime type
    

    and finally this error:

    ERROR: Enzyme compilation failed due to illegal type analysis.
    This usually indicates the use of a Union type, which is not fully supported with Enzyme.API.strictAliasing set to true [the default].
    Ideally, remove the union (which will also make your code faster), or try setting Enzyme.API.strictAliasing!(false) before any autodiff call.
    To toggle more information for debugging (needed for bug reports), set Enzyme.Compiler.VERBOSE_ERRORS[] = true (default false)
    

    The only Union I can think of is the one introduced by the for loop's iteration protocol.

Finally, for the sake of completeness, here is some extra info:

Julia Version 1.11.5
Commit 760b2e5b73 (2025-04-14 06:53 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 8 × Intel(R) Core(TM) i7-10610U CPU @ 1.80GHz
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, skylake)
Threads: 8 default, 0 interactive, 4 GC (on 8 virtual cores)
Environment:
  JULIA_EDITOR = code
[7da242da] Enzyme v0.13.41
[f6369f11] ForwardDiff v0.10.38
[7f7a1694] Optimization v4.2.0
[e88e6eb3] Zygote v0.7.7

If you have any advice for helping me to make it work, it would be great!

Thank you (and sorry for the long topic)

That is because you preallocate your storage with Float64 element type, whereas during the optimization ForwardDiff needs to fill it with Dual numbers. The solution for that is to handle your storage with PreallocationTools.jl, until Optimization.jl can make better use of the Cache objects provided by DifferentiationInterface.jl.
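
Here is a rough, untested sketch of what that could look like, reusing the objects from your script (the names x_cache and P_cache are just illustrative):

using LinearAlgebra, PreallocationTools

# Build the caches once, outside the objective, and pass them along with the problem
p = (prob = prob,
     x_cache = DiffCache(similar(ic.x0)),
     P_cache = DiffCache(similar(ic.P0)))

function optimQ!(θ, p)
    (; y, ss, ic, R) = p.prob
    (; A, C) = ss

    # get_tmp returns a buffer whose element type follows θ:
    # Float64 in a plain call, ForwardDiff.Dual while differentiating
    x = get_tmp(p.x_cache, θ)
    P = get_tmp(p.P_cache, θ)
    x .= ic.x0
    P .= ic.P0
    Q = Diagonal(10 .^ θ)

    J = zero(eltype(θ))
    for k in 2:length(y)
        xpred = A*x
        Ppred = A*P*A' .+ Q
        ik = y[k] .- C*xpred
        Sk = C*Ppred*C' .+ R
        Kk = Ppred*(C'/Sk)
        T = I - Kk*C
        x .= xpred .+ Kk*ik
        P .= T*Ppred*T' .+ Kk*R*Kk'
        J += logdet(Sk) + ik'*(Sk\ik)
    end
    return J
end

You would then pass this p as the parameter of the OptimizationProblem instead of prob.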

To avoid the Zygote error, you would need to define a custom rule with ChainRulesCore.jl for your high-level function as a whole. That is what I did for HiddenMarkovModels.jl, where it turns out that the forward-backward algorithm gives you all the necessary information for the derivatives. I assume similar results exist for the Kalman filter.
Alternatively, Zygote.Buffer might be able to handle the simplest in-place operations, along the lines of the sketch below.
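
A rough, untested sketch of the Buffer pattern, showing only the mutation of the state sequence (the helper name filter_states is made up):

using Zygote

function filter_states(p)
    (; y, ss, ic) = p
    nt = length(y)
    nx = size(ss.A, 1)
    x = Zygote.Buffer(ic.x0, nx, nt)   # writable even while Zygote differentiates
    x[:, 1] = ic.x0
    for k in 2:nt
        x[:, k] = ss.A * x[:, k - 1]   # prediction only; the update step would go here as well
    end
    return copy(x)                      # copy back to a plain array before returning
end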

Have you tried activating the recommended Enzyme.jl options? What is the stack trace?


Also, have you tried Mooncake.jl with AutoMooncake(; config=nothing)?
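
Something along these lines should be enough to try it (assuming Mooncake.jl is installed; AutoMooncake comes from ADTypes):

using Mooncake
using ADTypes: AutoMooncake

optf = OptimizationFunction(optimQ!, AutoMooncake(; config = nothing))
probQ = OptimizationProblem(optf, [-2., -2.], prob, lb = [-20., -20.], ub = [2., 2.])
θ = Optimization.solve(probQ, Optimization.LBFGS()).u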


Thank you @gdalle for your detailed answer.

I'll keep your suggestions for ForwardDiff.jl and Zygote.jl in mind, but they involve some extra coding that I can't do at the moment.

Regarding Enzyme.jl, you're right, I missed the recommended options. When using Enzyme.API.strictAliasing!(false), it runs without errors. Finally, Mooncake.jl also runs without errors and provides the same result as Enzyme.jl.
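
For reference, with Enzyme the call sequence I use is roughly the following (the flag has to be set before the first differentiation):

using Enzyme

Enzyme.API.strictAliasing!(false)   # as suggested by the error message, before any autodiff call

optf = OptimizationFunction(optimQ!, Optimization.AutoEnzyme())
probQ = OptimizationProblem(optf, [-2., -2.], prob, lb = [-20., -20.], ub = [2., 2.])
θ = Optimization.solve(probQ, Optimization.LBFGS()).u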


Out of curiosity, between Enzyme and Mooncake, which one runs faster?


A not very scientific benchmark (@time) gives the following (each backend has been tested from a fresh Julia session):

  • Mooncake

    • First run - 179.052240 seconds (163.39 M allocations: 7.996 GiB, 1.31% gc time, 95.26% compilation time: <1% of which was recompilation)
    • Second run - 2.151841 seconds (1.55 M allocations: 92.208 MiB, 1.76% gc time, 91.65% compilation time)
  • Enzyme

    • First run - 284.379204 seconds (144.68 M allocations: 8.173 GiB, 0.73% gc time, 99.89% compilation time: <1% of which was recompilation)
    • Second run - 0.167327 seconds (337.71 k allocations: 84.891 MiB, 23.82% gc time)

@willtebbutt might be interested in the second Mooncake run spending so much time on compilation.
