Hi Andres and Christoph,
Here are three ways to compute the sensitivities dQ, dR in
QR + \varepsilon \, dQ\, R + \varepsilon \, Q \, dR + \mathcal{O}(\varepsilon^2) = A + \varepsilon \, dA.
using LinearAlgebra
using FiniteDifferences

# Reference solution: forward-mode sensitivities via finite differences.
function dqr_fdm(A,dA)
    @assert size(A) == size(dA)
    m,n = size(A)
    fdm = central_fdm(5,1)  # fifth-order central difference, first derivative
    dQ,dR = jvp(
        fdm,
        A->begin
            Q,R = qr(A)
            return Q[:,1:n],R  # thin Q
        end,
        (A,dA)
    )
    return dQ,dR
end
using ForwardDiff

# Forward-mode sensitivities via dual numbers: seed A with the
# perturbation dA and read the partials off the factors.
function dqr_dual(A,dA)
    @assert size(A) == size(dA)
    m,n = size(A)
    Q,R = qr(ForwardDiff.Dual.(A,dA))
    dQ = ForwardDiff.partials.(Matrix(Q),1)  # Matrix(Q) materialises the thin Q
    dR = ForwardDiff.partials.(R,1)
    return dQ,dR
end
# Explicit forward-mode formulae for the thin QR factorisation.
function dqr_expl(A,dA)
    @assert size(A) == size(dA)
    m,n = size(A)
    Q,R = qr(A)
    Q = Matrix(Q)             # thin Q (m×n)
    QdA = Q'*dA
    QdAR = QdA/R
    X = tril(QdAR,-1)
    X = X - X'                # X = Q'*dQ is skew-symmetric
    dQ = Q*X + dA/R - Q*QdAR  # tangential part plus normal part (I - Q*Q')*dA/R
    dR = QdA - X*R
    return dQ,dR
end
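For reference, the identities used in dqr_expl follow from differentiating the thin factorisation A = QR, where Q \in \mathbb{R}^{m \times n} has orthonormal columns and R \in \mathbb{R}^{n \times n} is upper triangular. A sketch of the derivation:

```latex
% Product rule on A = QR, then multiply by Q^T on the left
% and R^{-1} on the right:
dA = dQ\,R + Q\,dR
\quad\Longrightarrow\quad
Q^T dA\,R^{-1} = Q^T dQ + dR\,R^{-1}.
% Differentiating Q^T Q = I shows that X := Q^T dQ is skew-symmetric,
% while dR\,R^{-1} is upper triangular, so X is determined by the
% strictly lower triangle of Q^T dA\,R^{-1}:
X = \operatorname{tril}\!\bigl(Q^T dA\,R^{-1},-1\bigr)
  - \operatorname{tril}\!\bigl(Q^T dA\,R^{-1},-1\bigr)^T,
% and hence
dR = Q^T dA - X R,
\qquad
dQ = Q X + (I - Q Q^T)\,dA\,R^{-1}.
```

The last line is exactly what the code computes: `Q*X + dA/R - Q*QdAR` is the tangential part Q X plus the normal part (I - Q Q^T) dA R^{-1}.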
using Test
# Check that all three implementations agree on a small random problem.
function test()
    m,n = 5,3
     A = rand(m,n)
    dA = rand(m,n)
    dQ_fdm,dR_fdm = dqr_fdm(A,dA)
    dQ_dual,dR_dual = dqr_dual(A,dA)
    dQ_expl,dR_expl = dqr_expl(A,dA)
    @test dQ_dual ≈ dQ_fdm
    @test dR_dual ≈ dR_fdm
    @test dQ_expl ≈ dQ_fdm
    @test dR_expl ≈ dR_fdm
end
# Compare runtimes of the dual-number and explicit implementations.
function benchmark()
    m,n = 10000,500
     A = rand(m,n)
    dA = rand(m,n)
    println("Dual numbers:")
    @time dqr_dual(A,dA)
    println()
    println("Explicit derivative:")
    @time dqr_expl(A,dA)
    println()
end
dqr_expl is based on the formulae derived here, and it significantly outperforms dqr_dual on large problems:
Dual numbers:
 12.101934 seconds (18 allocations: 349.053 MiB, 0.31% gc time)
Explicit derivative:
  2.076238 seconds (54 allocations: 360.773 MiB)
The likely reason is that dqr_dual falls back to a generic Julia implementation of the QR factorisation (BLAS and LAPACK cannot operate on dual numbers), while dqr_expl can leverage BLAS and LAPACK throughout.
The paper that I linked above also presents formulae for adjoints, i.e. reverse-mode differentiation. These would let you compute gradients of scalar-valued functions much more efficiently than forward-mode differentiation.