# Symmetric matrix operations not faster

I was expecting that if a matrix is defined symmetric, some operations, e.g., solving, should be faster. But using the example in the manual:

``````julia
using LinearAlgebra, BenchmarkTools
B = [1.5 2 -4; 2 -1 -3; -4 -3 5]
sB = Symmetric(B)
x = [1, 2, 3]
@btime B \ x  #=> 307 ns
@btime sB \ x #=> 545 ns
``````

Am I doing something wrong?

It looks to me like `lu` on symmetric matrices actually does extra work: it assumes the wrapped array is not symmetric, so it first has to copy every element across the diagonal, and at the same time it does not benefit from knowing about the symmetry beyond that.
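If that one-time copy and factorization cost is the concern, one mitigation (a sketch of my own, not something benchmarked above) is to factor once with `bunchkaufman` and reuse the factorization across right-hand sides:

``````julia
using LinearAlgebra

# Sketch: pay the copy/factorization cost once, then reuse it.
B  = [1.5 2 -4; 2 -1 -3; -4 -3 5]
sB = Symmetric(B)
F  = bunchkaufman(sB)   # one-time factorization of the symmetric matrix

x1 = [1.0, 2.0, 3.0]
x2 = [4.0, 5.0, 6.0]
y1 = F \ x1             # only cheap substitution solves from here on
y2 = F \ x2

@assert B \ x1 ≈ y1 && B \ x2 ≈ y2
``````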

Surprisingly, symmetric static arrays do not work at all:

``````julia
jl> using LinearAlgebra, StaticArrays

jl> B = SA[1.5 2 -4; 2 -1 -3; -4 -3 5];

jl> sB = Symmetric(B);

jl> x = SA[1, 2, 3];

jl> sB \ x
ERROR: setindex!(::SMatrix{3, 3, Float64, 9}, value, ::Int) is not defined.
``````

Unfortunately, it seems like `\` will generally fail for immutable matrices, unless they have a special implementation. So `\` works for `SMatrix`, but not for `Symmetric{.., SMatrix}`, etc.
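One workaround sketch (my own, assuming the `SMatrix{3,3}(...)` conversion constructor from an `AbstractMatrix` behaves as expected) is to strip the `Symmetric` wrapper back into a plain `SMatrix` before solving:

``````julia
using LinearAlgebra, StaticArrays

B  = SA[1.5 2 -4; 2 -1 -3; -4 -3 5]
sB = Symmetric(B)
x  = SA[1.0, 2.0, 3.0]

# Workaround sketch: convert back to a plain SMatrix, which has its own
# immutable-friendly `\` method.
y = SMatrix{3,3}(sB) \ x

@assert B \ x ≈ y
``````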


It is hard to tell anything about the speed of solving a linear system from a 3 by 3 example - the overhead of checking the system and dispatching is probably greater than the amount of time spent doing the calculations.

Also, the LU decomposition and solution have been optimized and tuned over many, many years. I think it was the basis of the LINPACK benchmark for supercomputers, so it was worthwhile wringing every ounce of speed out of it. Methods for symmetric systems, like the Bunch-Kaufman decomposition or the Cholesky factorization, have not received the same level of intensive optimization.
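That said, at larger sizes the symmetric path can still pay off. A quick sketch of my own (not from the posts above), comparing the general LU route with a Cholesky solve on a matrix that is positive definite by construction:

``````julia
using LinearAlgebra, BenchmarkTools

# Sketch: S = A'A + n*I is symmetric positive definite by construction.
n = 1000
A = rand(n, n)
S = Symmetric(A'A + n * I)
b = rand(n)

M = Matrix(S)              # same numbers, but no symmetry information
@btime lu($M) \ $b         # general LU path
@btime cholesky($S) \ $b   # exploits symmetry, roughly half the flops
``````

(The `$` interpolation avoids benchmarking global-variable access.)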

The same is true even of matrix multiplication. The function `BLAS.gemm!` performs an in-place general matrix multiplication, whereas `BLAS.syrk!` takes advantage of symmetry when evaluating a product like `A'A` and performs roughly half the number of operations.

``````julia
julia> const A = rand(1000, 20);

julia> const C = similar(A, (20, 20));

julia> using BenchmarkTools, LinearAlgebra

julia> @btime BLAS.gemm!('T', 'N', 1.0, A, A, 0.0, C);
7.541 μs (0 allocations: 0 bytes)

julia> Ccp = copy(C);

julia> @btime BLAS.syrk!('U', 'T', 1.0, A, 0.0, C);
7.579 μs (0 allocations: 0 bytes)

julia> Ccp ≈ C
true
``````

So doing half the number of operations ends up being a little bit slower, probably because `gemm` has been optimized more than `syrk` has.
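Since `syrk!` writes only the requested triangle, one way (a sketch) to use its result as a full matrix without copying is to wrap it in `Symmetric`:

``````julia
using LinearAlgebra

A = rand(1000, 20)
C = zeros(20, 20)

BLAS.syrk!('U', 'T', 1.0, A, 0.0, C)  # fills only the upper triangle of C

S = Symmetric(C, :U)   # zero-copy view that mirrors 'U' across the diagonal
@assert S ≈ A'A
``````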


Yes, but my impression from perusing the code was that backslash actually doesn’t take advantage of the symmetry, but does extra work to copy elements across the diagonal. So it’s actually strictly slower.

This assumes that I found the right method definitions, though…

I poked around with `Debugger.jl` and found that `sB \ x` ends up here: julia/symmetric.jl at 3d224214ee81a5eec39711d02ba225144a31efd9 · JuliaLang/julia · GitHub and does indeed call `bunchkaufman` rather than `lu`.
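A lighter-weight way to confirm the dispatch target, without stepping through in a debugger, is `@which` (a sketch; `@which` lives in `InteractiveUtils` outside the REPL):

``````julia
using LinearAlgebra
using InteractiveUtils  # provides @which in non-REPL code

B  = [1.5 2 -4; 2 -1 -3; -4 -3 5]
sB = Symmetric(B)
x  = [1.0, 2.0, 3.0]

# Print which `\` method dispatches for a Symmetric left-hand side.
println(@which sB \ x)

# Consistent with the linked source: the Symmetric path factorizes with
# Bunch-Kaufman, and it gives the same answer as the dense solve.
@assert bunchkaufman(sB) \ x ≈ B \ x
``````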


I should have used the debugger.

``````julia
using LinearAlgebra, BenchmarkTools

f1() = begin
    t = rand(1000, 1000)
    t += 100I
    b = ones(1000)
end
@btime f1();
# 1.111 ms (4 allocations: 15.26 MiB)

f2() = begin
    t = rand(1000, 1000)
    t += 100I
    b = ones(1000)
    LAPACK.posv!('L', t, b)
end
@btime f2();
# 4.723 ms (5 allocations: 15.27 MiB)

t = rand(1000, 1000)
@btime t \ ones(1000)
# 6.188 ms (5 allocations: 7.65 MiB)
``````

About half of the time is saved.
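For reference, the `LAPACK.posv!('L', t, b)` call above should correspond to the higher-level `cholesky(Symmetric(t, :L)) \ b` route. A sketch cross-checking the two, under the same assumption as `f2` that the `100I` shift makes the matrix positive definite:

``````julia
using LinearAlgebra

# Assumption (as in f2 above): shifting by 100I makes the matrix, read
# symmetrically from its lower triangle, positive definite.
t = rand(1000, 1000) + 100I
b = ones(1000)

# High-level route:
y = cholesky(Symmetric(t, :L)) \ b

# LAPACK route; posv! overwrites its arguments, so work on copies.
t2, b2 = copy(t), copy(b)
LAPACK.posv!('L', t2, b2)

@assert y ≈ b2
``````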