Strange performance with Adjoint structures

julia> a = sprand(Int16, 10000, 10000, 0.2);

julia> y = a';

julia> @time y[3,:];
  0.001126 seconds (28 allocations: 41.156 KiB)

julia> @time y.parent[:,3];
  0.000013 seconds (7 allocations: 19.625 KiB)

julia> y[3,:] == y.parent[:,3]
true

(precompilation elided.)

I would have thought that the row-major Adjoint would have the same performance characteristics as the column-major SparseMatrixCSC for the same data. Is what’s actually happening expected?

julia> a = sprand(Int16, 10000, 10000, 0.2);
julia> y = a';
julia> typeof(y)
LinearAlgebra.Adjoint{Int16,SparseMatrixCSC{Int16,Int64}}
julia> @btime $a[:,3];
  1.680 μs (4 allocations: 20.42 KiB)
julia> @btime $y[3,:];
  435.011 μs (25 allocations: 81.11 KiB)
julia> @which a[:,3]
getindex(x::SparseMatrixCSC, ::Colon, j::Integer) in SparseArrays at /home/lapeyre/julia/julia/usr/share/julia/stdlib/v0.7/SparseArrays/src/sparsevector.jl:514
julia> @which y[3,:]
getindex(A::AbstractArray, I...) in Base at abstractarray.jl:911

julia> @btime $a[3,:];
  329.813 μs (23 allocations: 40.98 KiB)

julia> @which a[3,:]
getindex(A::SparseMatrixCSC, i::Integer, ::Colon) in SparseArrays at /home/lapeyre/julia/julia/usr/share/julia/stdlib/v0.7/SparseArrays/src/sparsevector.jl:538

It looks like indexing a both ways is done in sparsevector.jl. Indexing y is done in abstractarray.jl. If you follow the chain of calls in the latter case, you will see that code that knows about sparse arrays is never called. I don’t think there is anything in stdlib that operates on sparse matrices in CSR form; even temporarily.

I should add: in the following yc is a SparseMatrixCSC. Indexing is now handled in the SparseArrays library. But, it is not much more efficient.

julia> yc = copy(y);

julia> @btime $yc[3,:];
  380.025 μs (25 allocations: 81.11 KiB)

Lazy Adjoint wrappers are often good because we can write e.g. A * B' and have it dispatch to an optimized function without materializing the adjoint. But if there isn’t an optimized version available, this will fall back to AbstractArray version which is unusably slow for anything except toy sparse arrays.

Extracting rows from CSC is a slow process (requires a binary search for each column).

1 Like

Yes. Using a Lazy Adjoint wrapper is far more efficient in some cases. And, at least in this example, when there is no efficient function, the penalty for the wrapper does not make much difference.

But, what I wrote above can’t be correct. y[3,:] has to use sparse indexing at some point. There is no other way to get any element.

y[1,:] calls y[i,j] with i and j integers, for all columns. This could be faster:

julia> using BenchmarkTools;
julia> using SparseArrays;
julia> using LinearAlgebra;
julia> const a = sprand(10^4, 10^4, 0.01);
julia> const y = a';
julia> a[:, 1] == y[1, :]
true

julia> @btime a[:, 1];
  235.828 ns (3 allocations: 1.78 KiB)

julia> @btime y[1, :];
  197.371 μs (15 allocations: 4.56 KiB)

julia> mygetindex(S::LinearAlgebra.Adjoint{Tv, SparseMatrixCSC{Tv,Ti}}, I1, I2) where {Tv, Ti} =
           getindex(S.parent, I2, I1);

julia> y[1, :] == mygetindex(y, 1, :)
true

julia> @btime mygetindex(y, 1, :);
  218.029 ns (3 allocations: 1.78 KiB)

I don’t know how often indexing like this is needed.

The SparseArrays test suite passes with a cleaned up version of the previous post, with mygetindex replaced by getindex. I’ll try a PR.

4 Likes

https://github.com/JuliaLang/julia/pull/28654

2 Likes