julia> A = rand(1000,1000); A = A + A'; A = sparse(A);
julia> e = ones(1000);
julia> @time A * e
0.054203 seconds (17.26 k allocations: 990.259 KiB)
julia> @time A * e;
0.001510 seconds (5 allocations: 8.094 KiB)
julia> @time A * e;
0.001486 seconds (5 allocations: 8.094 KiB)
julia> B = Symmetric(A);
julia> @time B * e;
0.220572 seconds (70.48 k allocations: 3.445 MiB)
julia> @time B * e;
0.062014 seconds (5 allocations: 8.094 KiB)
julia> @time B * e;
0.063677 seconds (5 allocations: 8.094 KiB)
Why is the product with B so much slower?
With the fairly large symmetric sparse matrix https://www.cise.ufl.edu/research/sparse/MM/Schenk_AFE/af_shell8.tar.gz, MatrixDepot downloads a Symmetric{...} matrix, and a matrix-vector product with it never returns (I’m using Julia 0.5.2). By contrast, if I download the file directly and read it with MatrixMarket, the matrix-vector product is almost instantaneous (there the matrix isn’t wrapped in Symmetric{...}).
I’m not able to test Julia 0.6 because loading the matrix using MatrixMarket never returns (see https://github.com/JuliaSparse/MatrixMarket.jl/issues/27). The same happens with MatrixDepot, which presumably relies on MatrixMarket.jl.
Secondly, I don’t understand why you are converting a dense matrix to sparse storage; that will be extremely inefficient. Only use sparse if your matrix really is very sparse.
Thirdly, I wouldn’t expect any gains from declaring A as symmetric like you are doing, since there is no way to exploit the symmetry in a simple matrix-vector product, and the special indexing will have some extra cost. It might also hinder calls to BLAS (I’m not certain about this).
In my timings, an ordinary dense matrix is the fastest, a Symmetric wrapper carries some overhead, and the sparse product is much slower, because your matrix isn’t sparse at all.
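A comparison along these lines could be benchmarked roughly as follows (a sketch, not the exact code I timed; assumes BenchmarkTools and the 1000×1000 sizes from the original post):
```
using BenchmarkTools
A = rand(1000, 1000); A = A + A';   # dense matrix, symmetric in value
B = Symmetric(A);                   # the same matrix behind a Symmetric wrapper
S = sparse(A);                      # "sparse" storage of a fully dense matrix
e = ones(1000);
@btime $A * $e;   # plain dense matvec
@btime $B * $e;   # matvec through the Symmetric wrapper
@btime $S * $e;   # matvec with sparse storage (every entry stored explicitly)
```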
Global scope does not matter here; the overhead is negligible. Symmetry does help MV products (I would assume because this is memory-bound, and symmetry reduces memory access by a factor of two). It’s completely true that sparse is a bad idea here, but it is indicative of performance problems for more reasonable setups, e.g. there’s a factor of more than 1e4 here:
using BenchmarkTools
A = sprandn(2000,2000,2e-3); A = A + A';
e = ones(2000);
@btime $A*$e;
B = Symmetric(A);
@btime $B * $e;
A=rand(1000,1000); A=A+A'; B=Symmetric(A); x=rand(1000);
for nthreads in [1,2,4,8]
BLAS.set_num_threads(nthreads)
println("nthreads=$nthreads")
@btime $A*$x; @btime $B*$x;
end
This is on a machine with four CPU cores, plus hyperthreading, and two memory channels. Since matrix*vector is mostly memory-bound (each matrix element is read just once), I can understand if the symmetric case is faster, as less memory needs to be accessed. @DNF, what is your machine’s architecture?
Yes, benchmarking in global scope will be slower, but that is not the issue here. If the function you are calling is big enough, the overhead is negligible.
Global-scope timings are somewhat slower, but the actual issue is that there is no specialized method for multiplying a symmetric sparse matrix by a dense vector.
Also, the Symmetric type does not currently save memory, since it wraps a full matrix (whether sparse or dense). Ideally, there could be a specialized SparseSymmetric type that would only store the required elements.
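As a rough illustration of what such a kernel could look like (not an existing Base method; an untested sketch that assumes only the upper triangle `U = triu(A)` of the symmetric matrix is stored):
```
# Hypothetical sketch: symmetric sparse matvec that touches only the
# stored upper triangle U, using a standard CSC column traversal.
function sym_spmv(U::SparseMatrixCSC, x::AbstractVector)
    y = zeros(promote_type(eltype(U), eltype(x)), size(U, 1))
    rows, vals = rowvals(U), nonzeros(U)
    for j in 1:size(U, 2)
        for k in nzrange(U, j)
            i, v = rows[k], vals[k]
            y[i] += v * x[j]                 # stored entry A[i, j]
            i != j && (y[j] += v * x[i])     # mirrored entry A[j, i]
        end
    end
    return y
end

# Usage: for symmetric sparse A, sym_spmv(triu(A), e) should match A * e
```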
I agree with this. But my experience is that timing in global scope does not slow down everything by a fixed or proportional amount; rather, some calls are dramatically affected and others less so, leading to an unreliable impression not just of absolute performance, but also of relative performance.
@DNF: I get the same timing whether I `@btime $A*$b` or `@time` it.
As to symmetric matrix multiplication, symmetric matrices are just stored as full square matrices, with a flag telling BLAS to only look at the upper or lower triangle (as you can check by accessing B.data and B.uplo). As a result, BLAS only needs to loop over half the entries of the matrix. Since the kernel is memory bound (most of the time is spent loading values of A from memory), this should be a speedup. It’s very strange that you get worse results with symmetric matrices; this should probably be reported as a bug in Julia or OpenBLAS.
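For example (illustrative snippet, nothing more than poking at the fields):
```
B = Symmetric(rand(3, 3))
B.uplo    # 'U' by default: only the upper triangle is referenced
B.data    # the full 3×3 matrix is still stored underneath, no copy is made
```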
One simple implementation that confers a speedup is the following (warning: not really tested):
```
LinAlg.A_mul_B!(y::AbstractVector, A::Symmetric, x::AbstractVector) = At_mul_B!(y, A.data, x)
```
This seems to be faster than `A * x`, because `At_mul_B!` is faster than `A_mul_B!` for sparse matrices, and it is valid since `A' == A`.
**Edit:** I meant to say it is faster than `A * x` where `A` is sparse but not wrapped in `Symmetric`; in that sense the symmetry does speed up the product.
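A quick way to check that claim would be something like this (hedged sketch, reusing the sparse example from earlier in the thread and the Julia 0.6-era `At_mul_B!` names):
```
using BenchmarkTools
A = sprandn(2000, 2000, 2e-3); A = A + A';   # sparse and symmetric in value
e = ones(2000);
y = similar(e);
@btime $A * $e;                  # plain CSC matvec
@btime At_mul_B!($y, $A, $e);    # transpose matvec; same result here since A' == A
```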