Hi Andres and Christoph,
Here are three ways to compute the sensitivities $dQ$ and $dR$ in

$$QR + \varepsilon \, dQ \, R + \varepsilon \, Q \, dR + \mathcal{O}(\varepsilon^2) = A + \varepsilon \, dA.$$
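This is just the first-order expansion of the perturbed factorisation: multiplying out $(Q + \varepsilon \, dQ)(R + \varepsilon \, dR)$ gives

$$(Q + \varepsilon \, dQ)(R + \varepsilon \, dR) = QR + \varepsilon \, (dQ \, R + Q \, dR) + \varepsilon^2 \, dQ \, dR,$$

so matching terms against $A + \varepsilon \, dA$ yields $QR = A$ at order one and $dQ \, R + Q \, dR = dA$ at order $\varepsilon$.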
using LinearAlgebra
using FiniteDifferences

fdm = central_fdm(5,1) # fifth-order central finite differences

function dqr_fdm(A,dA)
    @assert size(A) == size(dA)
    m,n = size(A)
    dQ,dR = jvp(
        fdm,
        A->begin
            Q,R = qr(A)
            return Matrix(Q),R # thin Q factor, size (m,n)
        end,
        (A,dA)
    )
    return dQ,dR
end
using ForwardDiff

function dqr_dual(A,dA)
    @assert size(A) == size(dA)
    m,n = size(A)
    # Propagate the perturbation through the QR algorithm using dual
    # numbers: each entry is A[i,j] + ε*dA[i,j] with ε^2 = 0
    Q,R = qr(ForwardDiff.Dual.(A,dA))
    dQ = ForwardDiff.partials.(Matrix(Q),1)
    dR = ForwardDiff.partials.(R,1)
    return dQ,dR
end
function dqr_expl(A,dA)
    @assert size(A) == size(dA)
    m,n = size(A)
    Q,R = qr(A)
    Q = Matrix(Q) # thin Q factor, size (m,n)
    QdA = Q'*dA
    QdAR = QdA/R
    # X = Q'*dQ is skew-symmetric, and its strictly lower triangle
    # agrees with that of Q'*dA*inv(R); see the derivation below
    X = tril(QdAR,-1)
    X = X - X'
    dQ = Q*X + dA/R - Q*QdAR
    dR = QdA - X*R
    return dQ,dR
end
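For reference, here is how these formulae drop out of differentiating $A = QR$. Multiplying $dA = dQ \, R + Q \, dR$ by $Q^T$ from the left and $R^{-1}$ from the right, and using $Q^T Q = I$, gives

$$Q^T \, dA \, R^{-1} = Q^T dQ + dR \, R^{-1}.$$

Differentiating $Q^T Q = I$ shows that $X = Q^T dQ$ is skew-symmetric, while $dR \, R^{-1}$ is upper triangular, so the strictly lower triangle of $Q^T \, dA \, R^{-1}$ determines $X$; that is the tril / X - X' step above. For $m > n$, $dQ$ additionally has a component orthogonal to the columns of $Q$, namely $(I - QQ^T) \, dA \, R^{-1} = dA \, R^{-1} - Q \, (Q^T dA \, R^{-1})$, which accounts for the last two terms in the dQ line.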
using Test

function test()
    m,n = 5,3
    A = rand(m,n)
    dA = rand(m,n)
    dQ_fdm,dR_fdm = dqr_fdm(A,dA)
    dQ_dual,dR_dual = dqr_dual(A,dA)
    dQ_expl,dR_expl = dqr_expl(A,dA)
    @test dQ_dual ≈ dQ_fdm
    @test dR_dual ≈ dR_fdm
    @test dQ_expl ≈ dQ_fdm
    @test dR_expl ≈ dR_fdm
end
function benchmark()
    m,n = 10000,500
    A = rand(m,n)
    dA = rand(m,n)
    println("Dual numbers:")
    @time dqr_dual(A,dA)
    println()
    println("Explicit derivative:")
    @time dqr_expl(A,dA)
    println()
end
dqr_expl is based on the formulae derived here, and it significantly outperforms dqr_dual on large problems:
Dual numbers:
12.101934 seconds (18 allocations: 349.053 MiB, 0.31% gc time)
Explicit derivative:
2.076238 seconds (54 allocations: 360.773 MiB)
The likely reason for this is that dqr_dual has to fall back to a generic Julia implementation of the QR factorisation (LAPACK cannot operate on dual numbers), while dqr_expl can leverage BLAS and LAPACK throughout.
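You can see this effect in isolation by timing just the two factorisations, for example with the (arbitrarily chosen) sizes below; the numbers are of course machine-dependent:

A = rand(1000,200)
dA = rand(1000,200)
@time qr(A)                        # dispatches to LAPACK
@time qr(ForwardDiff.Dual.(A,dA))  # generic Julia fallback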
The paper that I linked above also presents formulae for adjoints / reverse-mode differentiation. These would allow you to compute gradients much more efficiently than forward differentiation.
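In case it helps, the adjoint $\bar A$ is defined by the identity

$$\langle \bar Q, dQ \rangle + \langle \bar R, dR \rangle = \langle \bar A, dA \rangle \quad \text{for all } dA,$$

and transposing the linear map implemented in dqr_expl leads to a compact formula. The sketch below is my own transcription of this construction (the name dqr_adj is mine, and it assumes the thin QR of a full-column-rank $A$ with $m \ge n$), not the paper's exact formulation:

function dqr_adj(A,Q̄,R̄)
    # Sketch: adjoint / reverse-mode rule for the thin QR factorisation,
    # obtained by transposing dqr_expl. Assumes m >= n and full column rank.
    m,n = size(A)
    Q,R = qr(A)
    Q = Matrix(Q)
    M = R*R̄' - Q̄'*Q
    # Mirror the lower triangle of M onto the upper triangle
    M̃ = tril(M,-1) + tril(M,-1)' + Diagonal(diag(M))
    return (Q̄ + Q*M̃)/R'
end

You can sanity-check it against dqr_expl: for random A, dA, Q̄, R̄ and dQ,dR = dqr_expl(A,dA), the identity dot(Q̄,dQ) + dot(R̄,dR) ≈ dot(dqr_adj(A,Q̄,R̄),dA) should hold.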