Using ForwardDiff on qr factorization

I am doing a linear least-squares optimization on a function by using QR factorization. I wish to find the gradient so that I can optimize a different parameter, but I am running into trouble when passing qr() to ForwardDiff.
I was wondering if there is some sort of QR factorization built to handle ForwardDiff.Dual numbers?

CC @cortner


It works for me:

julia> using LinearAlgebra, ForwardDiff

julia> M = rand(2, 2)
2×2 Array{Float64,2}:
 0.433905   0.520042
 0.0231565  0.0335941

julia> M2 = ForwardDiff.Dual.(M)
2×2 Array{ForwardDiff.Dual{Nothing,Float64,0},2}:
 Dual{Nothing}(0.433905)   Dual{Nothing}(0.520042)
 Dual{Nothing}(0.0231565)  Dual{Nothing}(0.0335941)

julia> qr(M2)
LinearAlgebra.QR{ForwardDiff.Dual{Nothing,Float64,0},Array{ForwardDiff.Dual{Nothing,Float64,0},2}}
Q factor:
2×2 LinearAlgebra.QRPackedQ{ForwardDiff.Dual{Nothing,Float64,0},Array{ForwardDiff.Dual{Nothing,Float64,0},2}}:
 Dual{Nothing}(-0.998579)   Dual{Nothing}(-0.0532919)
 Dual{Nothing}(-0.0532919)   Dual{Nothing}(0.998579)
R factor:
2×2 Array{ForwardDiff.Dual{Nothing,Float64,0},2}:
 Dual{Nothing}(-0.434522)  Dual{Nothing}(-0.521094)
  Dual{Nothing}(0.0)        Dual{Nothing}(0.0058323)

You are probably restricting the types by the way you have defined your function. Without more information it’s impossible to say.
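For example, here is the usual failure mode in miniature (the function names f_bad and f_good are made up for illustration): a buffer hard-coded to Float64 cannot store Dual numbers, while taking the buffer's element type from the argument lets them pass through.

```julia
using ForwardDiff

# Buffer hard-coded to Float64: storing a Dual forces a conversion that fails.
f_bad(x) = (A = zeros(2); A[1] = sin(x); A[2] = cos(x); sum(A))

# Buffer element type taken from the argument: Duals pass through untouched.
f_good(x::T) where {T} = (A = zeros(T, 2); A[1] = sin(x); A[2] = cos(x); sum(A))

ForwardDiff.derivative(f_good, 0.0)   # cos(0) - sin(0) = 1.0
# ForwardDiff.derivative(f_bad, 0.0)  # MethodError: no method matching Float64(::Dual...)
```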


Sorry, I didn’t specify the problem better. Thank you for the example, it was very helpful. I am still having an issue with the following code:

using LinearAlgebra, ForwardDiff
X_data = [ rand(5) for _ = 1:8 ];
function g(X,alpha)
    A = Array{Real,2}(undef, length(X), 10)
    for (ix, x) in enumerate(X)
        A[ix,:] = [ sin(alpha) for n = 1:10]
    end
    return A
end
f = alpha -> g(X_data,alpha);
f_qr = alpha -> qr(g(X_data,alpha));

I get the error:
MethodError: no method matching Float64(::ForwardDiff.Dual{ForwardDiff.Tag{var"#9#10",Int64},Float64,1})

The following seems to work:

function g(X, α::T) where {T}
    A = zeros(T, length(X), 10)
    for (ix, x) in enumerate(X)
        A[ix,:] = [sin(α) for n = 1:10]
    end
    return A
end

f_qr(ForwardDiff.Dual(3.0, 1.0))

It still doesn’t work with ForwardDiff.derivative, though. You would probably have to go in with a debugger to see exactly what is being called.

Many thanks, David. It looks like the R factor can be differentiated without problems, not clear yet about the Q factor.

Why do you say that? Don’t you just need to extract the dual part of each entry?

What I meant was to compute it via the derivative function. I’m not sure what goes wrong with Q.

The problem turns out to be that ForwardDiff just doesn’t know what to do with the QR object that qr returns. The simple solution is just to extract the pieces; e.g. the following works to extract the derivative of just Q:

f_qr_Q(alpha) = qr(g(X_data, alpha)).Q

ForwardDiff.derivative(f_qr_Q, 3.0)

You can also call qr by hand on the dual numbers and then separately extract the derivatives.
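For instance, a sketch of the by-hand route (seeding each entry with a single partial, which mirrors what ForwardDiff does internally):

```julia
using LinearAlgebra, ForwardDiff

A  = rand(4, 3)   # point at which to differentiate
dA = rand(4, 3)   # direction of differentiation

# Seed each entry of A with the corresponding entry of dA, then factorize:
F = qr(ForwardDiff.Dual.(A, dA))

Q  = ForwardDiff.value.(Matrix(F.Q))        # primal thin Q
dQ = ForwardDiff.partials.(Matrix(F.Q), 1)  # directional derivative of Q
R  = ForwardDiff.value.(F.R)
dR = ForwardDiff.partials.(F.R, 1)

# Product rule check: dQ*R + Q*dR should reproduce dA.
```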


You are right, we should just work directly with Dual numbers; then it seems OK.


Or you can just add a method for ForwardDiff.extract_derivative that works correctly on a QR object. (Although it’s not necessarily obvious what type it should return.)

We are actually interested in getting the derivative of the solution of a LLSQ problem w.r.t. parameters (and then it is clear what the type should be), which is of course easy enough to do by hand if we can differentiate the system. We are trying to differentiate through the QR factorisation using ForwardDiff.jl just to compare different approaches - for fun, to learn, and to test.
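A minimal sketch of that end goal (the parameter-dependent model A(α) below is made up for illustration): differentiate the LLSQ solution x(α) with ForwardDiff straight through the QR solve, and compare against the sensitivity derived by hand from the normal equations AᵀA x = Aᵀb.

```julia
using LinearAlgebra, ForwardDiff

b = rand(5)
A(α) = [sin(α * i * j) for i in 1:5, j in 1:3]  # toy parameter-dependent model
x(α) = qr(A(α)) \ b                             # LLSQ solution via QR

α0 = 0.7
dx_ad = ForwardDiff.derivative(x, α0)

# By hand: differentiating AᵀA x = Aᵀb gives
#   AᵀA dx = dAᵀ (b - A x) - Aᵀ dA x
A0, dA0 = A(α0), ForwardDiff.derivative(A, α0)
x0 = qr(A0) \ b
dx_hand = (A0'A0) \ (dA0' * (b - A0 * x0) - A0' * dA0 * x0)
```

The two agree up to rounding, which is exactly the kind of cross-check between approaches described above.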


Hi Andres and Christoph :)

Here are three ways of computing the sensitivities dQ, dR in

Q\,R + \varepsilon \, dQ\, R + \varepsilon \, Q \, dR + \mathcal{O}(\varepsilon^2) = A + \varepsilon \, dA.

using FiniteDifferences
function dqr_fdm(A,dA)
    @assert size(A) == size(dA)
    m,n = size(A)
    dQ,dR = jvp(
        central_fdm(5,1),    # e.g. fifth-order central differences
        A -> begin
            Q,R = qr(A)
            return Q[:,1:n], R
        end,
        (A,dA)
    )
    return dQ,dR
end

using ForwardDiff
function dqr_dual(A,dA)
    @assert size(A) == size(dA)
    m,n = size(A)
    Q,R = qr(ForwardDiff.Dual.(A,dA))
    dQ = ForwardDiff.partials.(Matrix(Q),1)
    dR = ForwardDiff.partials.(R,1)
    return dQ,dR
end

function dqr_expl(A,dA)
    @assert size(A) == size(dA)
    m,n = size(A)
    Q,R = qr(A)
    Q = Matrix(Q)
    QdA = Q'*dA
    QdAR = QdA/R
    X = tril(QdAR,-1)
    X = X - X'
    dQ = Q*X + dA/R - Q*QdAR
    dR = QdA - X*R
    return dQ,dR
end
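For reference, a sketch of where these explicit formulae come from (thin QR, m \ge n, A of full column rank):

```latex
% Differentiate A = Q R:
dA = dQ\,R + Q\,dR.
% Multiply by Q^T on the left and R^{-1} on the right, with X := Q^T dQ:
Q^T dA\, R^{-1} = X + dR\, R^{-1}.
% Q^T Q = I implies X + X^T = 0 (skew-symmetric), while dR\,R^{-1} is upper
% triangular, so X is recovered from the strict lower triangle
% L := \mathrm{tril}(Q^T dA\, R^{-1}, -1):
X = L - L^T.
% Back-substituting, and using (I - Q Q^T)\,dQ = (I - Q Q^T)\,dA\,R^{-1}
% for the component of dQ orthogonal to \mathrm{range}(Q):
dR = Q^T dA - X\,R, \qquad dQ = Q\,X + (I - Q Q^T)\, dA\, R^{-1},
```

which is exactly what dqr_expl computes, with dA\,R^{-1} - Q\,(Q^T dA\,R^{-1}) standing in for (I - Q Q^T)\,dA\,R^{-1}.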

using Test
function test()
    m,n = 5,3
     A = rand(m,n)
    dA = rand(m,n)

    dQ_fdm,dR_fdm = dqr_fdm(A,dA)
    dQ_dual,dR_dual = dqr_dual(A,dA)
    dQ_expl,dR_expl = dqr_expl(A,dA)

    @test dQ_dual ≈ dQ_fdm
    @test dR_dual ≈ dR_fdm

    @test dQ_expl ≈ dQ_fdm
    @test dR_expl ≈ dR_fdm
end

function benchmark()
    m,n = 10000,500
     A = rand(m,n)
    dA = rand(m,n)

    println("Dual numbers:")
    @time dqr_dual(A,dA)

    println("Explicit derivative:")
    @time dqr_expl(A,dA)
end

dqr_expl is based on the formulae derived here, and it significantly outperforms dqr_dual on large problems.

Dual numbers:
 12.101934 seconds (18 allocations: 349.053 MiB, 0.31% gc time)

Explicit derivative:
  2.076238 seconds (54 allocations: 360.773 MiB)

The likely reason for this is that dqr_dual uses a Julia implementation of the QR factorisation, while dqr_expl can leverage BLAS and LAPACK.

The paper that I linked above also presents formulae for adjoints / reverse mode differentiation. These would allow you to compute gradients much more efficiently than forward differentiation.
