Strange performance with Adjoint structures

anon94023334 · August 13, 2018, 5:56pm

julia> a = sprand(Int16, 10000, 10000, 0.2);

julia> y = a';

julia> @time y[3,:];
  0.001126 seconds (28 allocations: 41.156 KiB)

julia> @time y.parent[:,3];
  0.000013 seconds (7 allocations: 19.625 KiB)

julia> y[3,:] == y.parent[:,3]
true

(precompilation elided.)

I would have thought that the row-major Adjoint would have the same performance characteristics as the column-major SparseMatrixCSC for the same data. Is what’s actually happening expected?

jlapeyre · August 13, 2018, 9:00pm

julia> a = sprand(Int16, 10000, 10000, 0.2);
julia> y = a';
julia> typeof(y)
LinearAlgebra.Adjoint{Int16,SparseMatrixCSC{Int16,Int64}}
julia> @btime $a[:,3];
  1.680 μs (4 allocations: 20.42 KiB)
julia> @btime $y[3,:];
  435.011 μs (25 allocations: 81.11 KiB)
julia> @which a[:,3]
getindex(x::SparseMatrixCSC, ::Colon, j::Integer) in SparseArrays at /home/lapeyre/julia/julia/usr/share/julia/stdlib/v0.7/SparseArrays/src/sparsevector.jl:514
julia> @which y[3,:]
getindex(A::AbstractArray, I...) in Base at abstractarray.jl:911

julia> @btime $a[3,:];
  329.813 μs (23 allocations: 40.98 KiB)

julia> @which a[3,:]
getindex(A::SparseMatrixCSC, i::Integer, ::Colon) in SparseArrays at /home/lapeyre/julia/julia/usr/share/julia/stdlib/v0.7/SparseArrays/src/sparsevector.jl:538

It looks like indexing a both ways is handled in sparsevector.jl, whereas indexing y goes through the generic fallback in abstractarray.jl. If you follow the chain of calls in the latter case, you will see that code that knows about sparse arrays is never called. I don’t think anything in the stdlib operates on sparse matrices in CSR form, even temporarily.

jlapeyre · August 13, 2018, 9:05pm

I should add: in the following, yc is a SparseMatrixCSC, so indexing is now handled in the SparseArrays library. But it is not much more efficient.

julia> yc = copy(y);

julia> @btime $yc[3,:];
  380.025 μs (25 allocations: 81.11 KiB)

kristoffer.carlsson · August 13, 2018, 9:06pm

Lazy Adjoint wrappers are often good because we can write e.g. A * B' and have it dispatch to an optimized function without materializing the adjoint. But if there isn’t an optimized version available, this will fall back to the generic AbstractArray version, which is unusably slow for anything except toy sparse arrays.
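
To make the "lazy" part concrete, here is a minimal sketch. The sizes and the names A, B and Bt are made up for illustration, and it does not show which method the product dispatches to. It only checks that B' is a wrapper around B rather than a copy, and that multiplying with the wrapper gives the same result as multiplying with a materialized copy.

```julia
using SparseArrays, LinearAlgebra

A = sprand(100, 80, 0.1)
B = sprand(60, 80, 0.1)

Bt = B'                                   # lazy wrapper; no data is copied here
is_wrapper    = Bt isa Adjoint            # the wrapper type from LinearAlgebra
shares_parent = parent(Bt) === B          # the wrapper points at the original matrix

# The product through the wrapper should agree with the product
# through an explicitly materialized adjoint.
products_agree = A * Bt ≈ A * copy(Bt)
```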

Extracting rows from CSC is a slow process (requires a binary search for each column).
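
As a rough sketch of why this is slow: within each column of a CSC matrix the stored row indices are sorted, so finding row i means one binary search per column. The helper name extract_row below is hypothetical and this is not the code SparseArrays uses; it only illustrates the access pattern with the public rowvals, nonzeros and nzrange accessors.

```julia
using SparseArrays

# Hypothetical helper (not part of SparseArrays): collect row `i` of a CSC
# matrix by binary-searching each column's sorted row indices.
function extract_row(A::SparseMatrixCSC{Tv,Ti}, i::Integer) where {Tv,Ti}
    rows = rowvals(A)      # row index of every stored entry, sorted within each column
    vals = nonzeros(A)     # stored values, in the same order as `rows`
    cols = Ti[]
    out  = Tv[]
    for j in 1:size(A, 2)
        r = nzrange(A, j)              # positions of column j's entries in rows/vals
        colrows = view(rows, r)
        k = searchsortedfirst(colrows, i)   # binary search within column j
        if k <= length(colrows) && colrows[k] == i
            push!(cols, j)
            push!(out, vals[r[k]])
        end
    end
    return sparsevec(cols, out, size(A, 2))
end

A = sprand(200, 300, 0.05)
row_matches = extract_row(A, 7) == A[7, :]
```

The loop touches every column even when the requested row is nearly empty, whereas a column can be read directly from one contiguous range given by nzrange.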

jlapeyre · August 13, 2018, 9:14pm

Yes. Using a lazy Adjoint wrapper is far more efficient in some cases. And, at least in this example, when there is no efficient method, the wrapper itself adds little penalty.

But what I wrote above can’t be correct: y[3,:] has to use sparse indexing at some point, since there is no other way to get any element.

jlapeyre · August 13, 2018, 10:58pm

y[1,:] calls the scalar y[i,j], with i and j integers, once for every column. This could be faster:

julia> using BenchmarkTools;
julia> using SparseArrays;
julia> using LinearAlgebra;
julia> const a = sprand(10^4, 10^4, 0.01);
julia> const y = a';
julia> a[:, 1] == y[1, :]
true

julia> @btime a[:, 1];
  235.828 ns (3 allocations: 1.78 KiB)

julia> @btime y[1, :];
  197.371 μs (15 allocations: 4.56 KiB)

julia> mygetindex(S::LinearAlgebra.Adjoint{Tv, SparseMatrixCSC{Tv,Ti}}, I1, I2) where {Tv, Ti} =
           getindex(S.parent, I2, I1);

julia> y[1, :] == mygetindex(y, 1, :)
true

julia> @btime mygetindex(y, 1, :);
  218.029 ns (3 allocations: 1.78 KiB)

I don’t know how often indexing like this is needed.
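
One caveat on the swap-the-indices idea: it matches y[i,:] for real element types like those used above, but an Adjoint also conjugates, so for complex entries the column of the parent has to be conjugated as well. A minimal sketch, assuming only a single integer row index; the name adjoint_row is hypothetical and this is not claimed to be what any PR implements.

```julia
using SparseArrays, LinearAlgebra

# Hypothetical helper: row `i` of a lazy adjoint, read from column `i` of the parent.
# `conj` leaves real entries unchanged and is required for complex ones.
adjoint_row(S::Adjoint{<:Number,<:SparseMatrixCSC}, i::Integer) = conj(parent(S)[:, i])

c = sprand(ComplexF64, 50, 40, 0.2)
z = c'
complex_ok = adjoint_row(z, 3) == z[3, :]

r = sprand(Int16, 20, 20, 0.3)
real_ok = adjoint_row(r', 3) == r'[3, :]
```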

jlapeyre · August 14, 2018, 1:51am

The SparseArrays test suite passes with a cleaned up version of the previous post, with mygetindex replaced by getindex. I’ll try a PR.

jlapeyre · August 14, 2018, 6:12pm

https://github.com/JuliaLang/julia/pull/28654
