# ForwardDiff and Zygote return wrong Jacobian for log(det(L))

I’m no expert in matrix derivatives, so I may have made a mistake. The derivative of the log determinant of a Cholesky factor is

$$
\frac{\partial \log|L|}{\partial L} = L^{-T}
$$

Using a toy example in Julia:

```julia
julia> using LinearAlgebra, Zygote

julia> S = [1.0 0.8; 0.8 1.0];

julia> L = cholesky(S).L;

julia> reshape(jacobian(L -> log(det(L)), L)[1], 2, 2)
2×2 Matrix{Float64}:
 1.0  0.0
 0.0  1.66667
```

whereas the analytic answer is

```julia
julia> inv(L')
2×2 UpperTriangular{Float64, Matrix{Float64}}:
 1.0  -1.33333
  ⋅    1.66667
```
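As a language-agnostic sanity check (a NumPy sketch, not from the thread), one can finite-difference `log(det(A))` with every entry free to vary; the result matches `inv(L')`, not a diagonal-only Jacobian:

```python
import numpy as np

# Hedged sketch: check numerically that d(log det A)/dA = inv(A)' when
# *all* entries of A are free to vary, via central finite differences.
S = np.array([[1.0, 0.8], [0.8, 1.0]])
L = np.linalg.cholesky(S)  # lower-triangular Cholesky factor of S

def logdet(A):
    # slogdet is more stable than log(det(A)) for this check
    sign, val = np.linalg.slogdet(A)
    return val

eps = 1e-6
grad = np.zeros_like(L)
for i in range(2):
    for j in range(2):
        Ep, Em = L.copy(), L.copy()
        Ep[i, j] += eps
        Em[i, j] -= eps
        grad[i, j] = (logdet(Ep) - logdet(Em)) / (2 * eps)

print(grad)                # ≈ [[1.0, -1.33333], [0.0, 1.66667]]
print(np.linalg.inv(L).T)  # the analytic answer, inv(L)'
```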


Is this a known issue, or is there a workaround?


I think Zygote is correct. The determinant of a lower triangular matrix is simply the product of its diagonal elements.

```julia
julia> using ForwardDiff, LinearAlgebra

julia> ForwardDiff.gradient(L -> log(prod(diag(L))), L)
2×2 LowerTriangular{Float64, Matrix{Float64}}:
 1.0   ⋅
 0.0  1.66667
```


This confirms that Zygote is correct.
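To see in plain numbers why the diagonal-only answer is right on the triangular space, here is another hedged NumPy sketch (not from the thread): `log(prod(diag(L)))` is completely insensitive to the off-diagonal entries, so finite differences reproduce the Jacobian from the first post:

```python
import numpy as np

# Hedged sketch: on the triangular branch, log det L = log(prod(diag(L))),
# which ignores off-diagonal entries entirely. Finite differences therefore
# give a diagonal gradient with entries 1/l_ii.
S = np.array([[1.0, 0.8], [0.8, 1.0]])
L = np.linalg.cholesky(S)

def f(A):
    return np.log(np.prod(np.diag(A)))

eps = 1e-6
grad = np.zeros_like(L)
for i in range(2):
    for j in range(2):
        Ep, Em = L.copy(), L.copy()
        Ep[i, j] += eps
        Em[i, j] -= eps
        grad[i, j] = (f(Ep) - f(Em)) / (2 * eps)

print(grad)  # ≈ [[1.0, 0.0], [0.0, 1.66667]]
```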


Which is why the result from Zygote is puzzling to me.

That’s because the L factor from cholesky is a lower triangular matrix. If you want to perturb the upper triangular elements as well, you can convert the factor to a Matrix first.

```julia
julia> reshape(Zygote.jacobian(L -> log(det(L)), Matrix(L))[1], 2, 2)
2×2 Matrix{Float64}:
 1.0  -1.33333
 0.0   1.66667
```


Rad! It makes sense that the derivative is constrained to the space of lower-triangular matrices; it’s entirely obvious in hindsight. Thanks!

Does anyone know why this works in Zygote but not in ForwardDiff?

```julia
julia> ForwardDiff.gradient(L -> log(det(L)), Matrix(L))
2×2 Matrix{Float64}:
 1.0  0.0
 0.0  1.66667

julia> reshape(Zygote.jacobian(L -> log(det(L)), Matrix(L))[1], 2, 2)
2×2 Matrix{Float64}:
 1.0  -1.33333
 0.0   1.66667
```


Because ForwardDiff has no special overloads for array-valued functions, and LinearAlgebra’s `det` has an optimization that first checks whether the matrix is (numerically) triangular before computing an LU factorization. Since `Matrix(L)` is still numerically lower triangular, it only ever goes through the triangular branch. As a workaround, you can do:

```julia
julia> ForwardDiff.gradient(L -> log(det(lu(L))), Matrix(L))
2×2 Matrix{Float64}:
 1.0  -1.33333
 0.0   1.66667
```


or simply

```julia
julia> ForwardDiff.gradient(L -> logdet(L), Matrix(L))
2×2 Matrix{Float64}:
 1.0  -1.33333
 0.0   1.66667
```


because `logdet` doesn’t have that optimization (though this is an implementation detail that should not be relied on).
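The two answers are consistent with each other: restricting the full-matrix gradient `inv(L)'` to lower-triangular perturbation directions leaves exactly the diagonal entries `1/l_ii`. A small NumPy sketch (an illustration of the projection, not code from the thread):

```python
import numpy as np

# Hedged sketch: projecting the full-matrix gradient inv(L)' onto
# lower-triangular perturbation directions recovers the triangular-branch
# gradient (diagonal entries 1/l_ii, zeros elsewhere).
S = np.array([[1.0, 0.8], [0.8, 1.0]])
L = np.linalg.cholesky(S)

full = np.linalg.inv(L).T  # gradient when every entry may vary
tri = np.tril(full)        # keep only lower-triangular directions

print(full)  # ≈ [[1.0, -1.33333], [0.0, 1.66667]]
print(tri)   # ≈ [[1.0, 0.0], [0.0, 1.66667]]
```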


With #481 it works:

```julia
julia> ForwardDiff.gradient(L -> log(det(L)), Matrix(L))
2×2 Matrix{Float64}:
 1.0  -1.33333
 0.0   1.66667
```


Hm, seems like quite a significant PR without reviewers?

Edit: the highest-ranked search result

```clojure
(defn eq
  "For non-differentials, this is identical to [[clojure.core/=]].
  For [[Differential]] instances, equality acts on tangent components too.

  If you want to ignore the tangent components, use [[equiv]]."
  ([_] true)
  ([a b]
   (= (->terms a)
      (->terms b)))
  ([a b & more]
   (reduce eq (eq a b) more)))
```


seems to agree with you. And BTW: the “imaginary” part in the PR’s title references the tangents/gradients (I first thought the PR would be about complex numbers)?

The PR is a bit stuck because every time I try to fix the tests I get lost: there are many levels of nested testsets trying to ensure that nothing ever changes, but some must change for this.

Sorry, I guess “imaginary” is misleading. The essay about what to do is this issue. When I searched, I got the impression that everyone flipped a coin to decide which rule to adopt when implementing dual numbers, but I didn’t find this particular one. (Edit: because it was written after my issue.)
