I want to minimize a function of the type:
x = \textrm{argmin}_x \min_a || f(x) a - y ||^2
Since the objective is quadratic in a, the optimum in a is known in closed form:
a(x) = (f(x)^T f(x))^{-1} f(x)^T y
so we minimize
x = \text{argmin}_x || f(x) a(x) - y ||^2
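To compute the gradient of the objective, we don't need to propagate the gradient through a(x): by construction, the partial derivative of the objective with respect to a is zero at a = a(x), so the term involving the derivative of a(x) drops out.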
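The argument can be written out explicitly. This is only a sketch: J(x, a) is a symbol introduced here for the inner objective (it does not appear above), and f(x)^T f(x) is assumed to be invertible so that a(x) is well defined.

```latex
J(x, a) = \| f(x)\, a - y \|^2, \qquad
\frac{\partial J}{\partial a}\bigg|_{a = a(x)}
  = 2 f(x)^T \big( f(x)\, a(x) - y \big) = 0

\frac{d}{dx} J\big(x, a(x)\big)
  = \frac{\partial J}{\partial x}\bigg|_{a = a(x)}
  + \left( \frac{\partial J}{\partial a}\bigg|_{a = a(x)} \right)^{T} \frac{d a}{d x}
  = \frac{\partial J}{\partial x}\bigg|_{a = a(x)}
```

The first line is just the normal equations: substituting a(x) = (f(x)^T f(x))^{-1} f(x)^T y makes f(x)^T (f(x) a(x) - y) vanish. The same conclusion carries over to the unsquared norm, which is a function of J alone, as long as the residual is nonzero.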
What is the best way to encode this knowledge, which the AD framework cannot infer on its own?
For now, I use the @nograd macro of Zygote, but that ties me to this AD framework. Here is my MWE:
using LinearAlgebra, Zygote, ForwardDiff, Mooncake
using DifferentiationInterface, BenchmarkTools  # needed for the Auto* backends and @btime below
const n = 1000
const k = range(0, 10, length=n) .* 2π
y = 0.1.*sin.(k .+ 2.5) .+ 0.1 * randn(n) .+ randn(1)
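# Least-squares coefficients via the normal equations
function lin_solve(m, y)
    return (m' * m) \ m' * y
end
# Same solve, marked as non-differentiable for Zygote
Zygote.@nograd function lin_solve_nograd(m, y)
    return (m' * m) \ m' * y
end
# Objective with the gradient propagated through the linear solve
function objective(x)
    m = hcat(sin.(k .+ x), ones(n))
    α = lin_solve(m, y)
    return norm(m * α .- y)
end
# Objective with the linear solve excluded from differentiation
function objective_nograd(x)
    m = hcat(sin.(k .+ x), ones(n))
    α = lin_solve_nograd(m, y)
    return norm(m * α .- y)
end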
backendZ = AutoZygote()
backendFD = AutoForwardDiff()
backendM = AutoMooncake(;config=nothing)
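x = [1.0]
# Assumed definition: `prep` is used in the Mooncake timings below but was not shown in the MWE
prep = prepare_gradient(objective, backendM, x)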
Here are the timings:
julia> @btime DifferentiationInterface.gradient(objective, backendZ, x)
65.315 μs (145 allocations: 283.08 KiB)
1-element Vector{Float64}:
-0.06677430082653298
julia> @btime DifferentiationInterface.gradient(objective_nograd, backendZ, x)
39.641 μs (100 allocations: 139.55 KiB)
1-element Vector{Float64}:
-0.06677430082653381
julia> @btime DifferentiationInterface.gradient(objective, backendFD, x)
57.628 μs (33 allocations: 118.28 KiB)
1-element Vector{Float64}:
-0.06677430082653303
julia> @btime DifferentiationInterface.gradient(objective_nograd, backendFD, x)
57.569 μs (33 allocations: 118.28 KiB)
1-element Vector{Float64}:
-0.06677430082653303
julia> @btime DifferentiationInterface.gradient(objective, prep,backendM, x)
232.240 μs (273 allocations: 251.92 KiB)
1-element Vector{Float64}:
-0.06677430082653296
julia> @btime DifferentiationInterface.gradient(objective_nograd, prep,backendM, x)
235.357 μs (273 allocations: 251.92 KiB)
1-element Vector{Float64}:
-0.06677430082653296
The @nograd macro of Zygote seems quite useful, but unfortunately the other AD frameworks do not see it: the ForwardDiff and Mooncake timings are unchanged.
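One possible direction, sketched here under assumptions rather than offered as a definitive answer, is to declare the solve non-differentiable through ChainRulesCore instead of through Zygote's own macro. The name `lin_solve_nd` and the two-argument `objective(x, solve)` are hypothetical, introduced only for this illustration. Zygote consumes ChainRules rules, so it should skip the solve. ForwardDiff works by operator overloading on dual numbers and does not read these rules, and Mooncake has its own rule system, so both would presumably still need a backend-specific declaration that this sketch does not cover.

```julia
using LinearAlgebra, Random
using ChainRulesCore, Zygote

Random.seed!(0)                        # fixed seed so repeated runs use the same data
const n = 1000
const k = range(0, 10, length=n) .* 2π
const y = 0.1 .* sin.(k .+ 2.5) .+ 0.1 .* randn(n) .+ randn()

# Plain least-squares solve: the AD backend differentiates through it.
lin_solve(m, y) = (m' * m) \ (m' * y)

# Same solve under a hypothetical name, declared non-differentiable via ChainRulesCore.
lin_solve_nd(m, y) = (m' * m) \ (m' * y)
@non_differentiable lin_solve_nd(m, y)

# The solver is passed in so both variants share a single objective definition.
function objective(x, solve)
    m = hcat(sin.(k .+ x), ones(n))
    α = solve(m, y)
    return norm(m * α .- y)
end

x = [1.0]
g_full = Zygote.gradient(x -> objective(x, lin_solve), x)[1]     # gradient through the solve
g_nd   = Zygote.gradient(x -> objective(x, lin_solve_nd), x)[1]  # solve treated as a constant
```

Because the partial derivative with respect to the coefficients vanishes at the optimum, `g_full` and `g_nd` should agree up to floating-point error, just as the two Zygote results in the timings above do.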