I want to compute the diagonal elements of X = V' * A * V, where
- V is an orthogonal matrix
- A is a symmetric matrix
I noticed that it is much faster to just compute X and then extract the diagonal elements than computing only the diagonal elements directly.
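(Concretely, the i-th diagonal element is x_ii = vᵢ' * A * vᵢ, where vᵢ is the i-th column of V; that is what the first function below computes column by column.)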
using LinearAlgebra, SparseArrays, Random, BenchmarkTools
rng = Random.MersenneTwister(23163213)
# my current approach
function diagonal_elements!(diag_elem::Vector{T}, A::AbstractMatrix{T}, V::AbstractMatrix{T}, temp::Vector{T}) where {T <: AbstractFloat}
    n = size(A, 2)
    @inbounds for i = 1:n
        # diag_elem[i] = vᵢ' * A * vᵢ for the i-th column vᵢ of V (one gemv + one dot per column)
        vi = view(V, :, i)
        mul!(temp, A, vi)
        diag_elem[i] = dot(vi, temp)
    end
    return nothing
end
# naive (but faster?) way
function diagonal_elements2!(diag_elem::Vector{T}, A::AbstractMatrix{T}, V::AbstractMatrix{T}) where {T <: AbstractFloat}
    Λ = V' * A * V
    @inbounds @simd for i in eachindex(diag_elem)
        diag_elem[i] = Λ[i, i]
    end
    return nothing
end
# make symmetric matrix
n = 2000
A = rand(rng, n, n)
A = 0.5 * (A + A')
# A = Symmetric(A) # seems to slow things down
# make an orthogonal matrix
V = eigen(Symmetric(rand(rng, n, n))).vectors
# preallocate
diag_elem = zeros(n)
diag_elem2 = zeros(n)
# compute diagonal of V' * A * V where A is symmetric and V orthogonal
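A quick sanity check along these lines (outside the benchmark) should show that the two functions agree up to floating-point rounding:

diagonal_elements!(diag_elem, A, V, zeros(n))
diagonal_elements2!(diag_elem2, A, V)
@assert diag_elem ≈ diag_elem2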
I get the following benchmark results:
julia> @benchmark diagonal_elements!($diag_elem, $A, $V, $zeros(n))
BenchmarkTools.Trial:
  memory estimate:  70.27 KiB
  allocs estimate:  3490
  --------------
  minimum time:     1.328 s (0.00% GC)
  median time:      1.346 s (0.00% GC)
  mean time:        1.345 s (0.00% GC)
  maximum time:     1.360 s (0.00% GC)
  --------------
  samples:          4
  evals/sample:     1
julia> @benchmark diagonal_elements2!($diag_elem2, $A, $V)
BenchmarkTools.Trial:
  memory estimate:  61.04 MiB
  allocs estimate:  4
  --------------
  minimum time:     221.781 ms (0.00% GC)
  median time:      240.748 ms (2.60% GC)
  mean time:        239.897 ms (2.42% GC)
  maximum time:     257.615 ms (5.49% GC)
  --------------
  samples:          21
  evals/sample:     1
Sure, the naive way uses optimized BLAS calls, but I would have expected diagonal_elements! to still be faster, since it only computes the n diagonal entries rather than all n² elements of the matrix product.
Any ideas / suggestions on how to speed up the first function?
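One obvious variant along these lines would be to do the A * V part as a single gemm into a preallocated n×n buffer and only then take the column-wise dot products; a rough sketch (the buffer argument AV and the name diagonal_elements3! are just placeholders):

# sketch: one gemm for A * V, then n cheap dot products
function diagonal_elements3!(diag_elem::Vector{T}, A::AbstractMatrix{T}, V::AbstractMatrix{T}, AV::Matrix{T}) where {T <: AbstractFloat}
    mul!(AV, A, V)  # AV = A * V, a single BLAS-3 call
    @inbounds for i in axes(V, 2)
        # (V' * A * V)[i, i] = dot(V[:, i], (A * V)[:, i])
        diag_elem[i] = dot(view(V, :, i), view(AV, :, i))
    end
    return nothing
end

That still spends the full O(n³) flops in the gemm, so it may not beat diagonal_elements2!, but it at least avoids the n separate gemv calls of the first version.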