I took the liberty of optimizing your Julia implementation a bit. I removed all of the slicing allocations by using `@view`, and I used `mul!` for in-place multiplication. Neither of these changed the runtime significantly at these large problem sizes, but they reduce the memory churn, which is never a bad thing. More significantly, I also used `@view` to skip computations on the parts of your arrays that were still all zero, for about a 2x speedup.
I checked that this still gave virtually the same results, but you should check it too.
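In case those two tools are unfamiliar, here is a minimal sketch of what they buy you (the sizes are arbitrary): slicing with `A[:, 1]` copies the column into a fresh array while `@view` does not, and `mul!` writes a matrix–vector product into a preallocated buffer instead of allocating the result:

```julia
using LinearAlgebra: mul!

A = rand(1000, 1000)
x = rand(1000)
y = zeros(1000)

c1 = A[:, 1]        # ordinary slice: allocates a copy of the column
c2 = @view A[:, 1]  # view: wraps the same memory, no allocation

y2 = A * x          # allocates a fresh vector for the result
mul!(y, A, x)       # writes A*x into the existing y, no allocation
```

Here's the resulting version: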
```julia
using LinearAlgebra: norm, mul!

function CGS2_alt(X::AbstractMatrix)
    Base.require_one_based_indexing(X) # ensure X has 1-based indices
    T = float(eltype(X))
    n, m = size(X)
    Q = zeros(T, n, m)
    R = zeros(T, m, m)
    r = zeros(T, m) # buffer for the projection coefficients
    w = zeros(T, n) # buffer for the column being orthogonalized
    R[1, 1] = norm(@view X[:, 1])
    Q[:, 1] .= @view(X[:, 1]) ./ R[1, 1]
    for i in 2:m
        iprev = 1:i-1
        w .= @view X[:, i]
        # only the first i-1 columns of Q are nonzero at this point,
        # so restrict the multiplications to that block:
        Qi = @view Q[:, iprev]
        ri = @view r[iprev]
        # first Gram–Schmidt pass:
        mul!(ri, Qi', w)       # ri .= (w'Qi)'
        R[iprev, i] = ri
        mul!(w, Qi, ri, -1, 1) # w .-= Qi * ri
        # second pass (the reorthogonalization in CGS2), which cleans up
        # the rounding errors left by the first:
        mul!(ri, Qi', w)       # ri .= (w'Qi)'
        mul!(w, Qi, ri, -1, 1) # w .-= Qi * ri
        R[i, i] = norm(w)
        Q[:, i] .= w ./ R[i, i]
    end
    return Q, R
end
```
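For the check, something along these lines is what I had in mind (the sizes are just illustrative):

```julia
using LinearAlgebra: norm, I

X = randn(1000, 200)
Q, R = CGS2_alt(X)

norm(Q' * Q - I) # orthonormality error; should be near machine epsilon
norm(Q * R - X)  # reconstruction error; should be tiny relative to norm(X)
```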