Operations on a transposed sparse array are slow

Hi friends, I am looking for a way to speed up my script. It accepts a sparse matrix as input (sometimes the matrix is samples × features, and sometimes it is features × samples), so I have to transpose it to make sure samples are always on the columns.

But I found that some operations on the transposed sparse matrix are much slower, so I wonder whether there is a better solution?

X = read_mtx("file path")
# 232320×32840 SparseMatrixCSC{Float32, Int64}
# sample x feature

idx_1 = rand(Bool, 232320) # selection condition over samples
idx_2 = rand(Bool, 32840)  # selection condition over features

@time mean(X[idx_1,idx_2])
# 3.291982 seconds (18 allocations: 1.532 GiB, 3.49% gc time)

But if I index into the transpose, it is much slower; actually, I waited a looong time and it was still running…

Y = X' 
# 32840×232320 LinearAlgebra.Adjoint{Float32, SparseMatrixCSC{Float32, Int64}}
# now it is  feature x sample

@time mean(Y[idx_2,idx_1])
# looooooooog time waiting and no return...

Of course I can use views (though on X that is actually slower than copying):

@time @views mean(X[idx_1,idx_2]);
# 30.983588 seconds (10 allocations: 1.264 MiB)

@time  @views mean(Y[idx_2,idx_1]) 
# looooooooog time waiting and no return...

I have no idea how to improve it, and I would greatly appreciate any advice!
Thank you!!!

Transposing a sparse matrix with Y = X' doesn’t actually copy any data; it wraps X in a special Adjoint type that gives a (conjugate-)transposed view of the underlying matrix.
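You can check that no copy happens: the wrapper just holds a reference to the original matrix. A minimal sketch, with a small random matrix standing in for the one above:

```julia
using LinearAlgebra, SparseArrays

X = sprand(Float32, 1000, 200, 0.05)  # small random stand-in matrix
Y = X'

typeof(Y)        # Adjoint{Float32, SparseMatrixCSC{Float32, Int64}}
parent(Y) === X  # true: Y shares X's storage, nothing was copied
```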

Adjoint views of sparse matrices have several optimized methods, e.g. Y * vector is fast, but not all of the optimized methods for sparse matrices are implemented for the Adjoint wrapper. In particular, “logical indexing” with a vector of booleans has a fast implementation for sparse matrices, but not for Adjoint wrappers thereof, and in consequence the corresponding operation on Y falls back to the generic method for dense matrices, which is very slow because it does not exploit sparsity.
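One way to see which method gets hit (a sketch, again with a small random stand-in matrix) is `@which`:

```julia
using LinearAlgebra, SparseArrays

X = sprand(Float32, 1000, 200, 0.05)
rows = rand(Bool, 1000)  # logical row selector
cols = rand(Bool, 200)   # logical column selector

# These two calls dispatch to different getindex methods:
@which X[rows, cols]     # a specialized sparse method (fast)
@which (X')[cols, rows]  # a generic AbstractArray fallback (slow)
```

The first call lands in SparseArrays’ specialized logical-indexing code; the second falls through to generic abstract-array machinery, which visits every element whether stored or not.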

Three possible workarounds:

  1. Set Y = SparseMatrixCSC(X') — this makes an explicit sparse-matrix copy of the Adjoint wrapper, which uses up some extra memory but then subsequent operations will be fast.
  2. Re-organize your code so that you do your calculations directly on the un-transposed matrix X.
  3. Define the missing specialized methods for Adjoint wrappers of sparse matrices. (This is “type piracy” but may be a reasonable stopgap measure in the short run.)
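For workaround 1, a sketch (with a smaller random stand-in for the matrix above):

```julia
using LinearAlgebra, SparseArrays, Statistics

X = sprand(Float32, 20_000, 3_000, 0.05)  # smaller stand-in for the 232320×32840 matrix
idx_1 = rand(Bool, 20_000)  # sample selector
idx_2 = rand(Bool, 3_000)   # feature selector

# Materialize the transpose once as a real CSC matrix (one extra copy in memory)...
Y = SparseMatrixCSC(X')

# ...after which logical indexing uses the fast sparse code path again:
@time mean(Y[idx_2, idx_1])
```

You pay the transposition cost once up front, which is usually worthwhile if you index Y many times.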

For example, your code above should become fast again if we define:

using LinearAlgebra, SparseArrays

# Logical (Bool-vector) indexing on an Adjoint of a sparse matrix:
# index the parent with the (swapped) integer indices, then adjoint the result.
Base.getindex(A::Adjoint{<:Any,<:SparseArrays.AbstractSparseMatrixCSC},
              I::AbstractVector{Bool}, J::AbstractVector{Bool}) =
    parent(A)[findall(J), findall(I)]'
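A quick sanity check of this method (a sketch with a small random matrix; it repeats the definition so the snippet is self-contained):

```julia
using LinearAlgebra, SparseArrays, Statistics

# The specialized method from above (type piracy, stopgap only):
Base.getindex(A::Adjoint{<:Any,<:SparseArrays.AbstractSparseMatrixCSC},
              I::AbstractVector{Bool}, J::AbstractVector{Bool}) =
    parent(A)[findall(J), findall(I)]'

X = sprand(Float32, 500, 300, 0.1)
Y = X'
r = rand(Bool, 300)  # row selector for Y (features)
c = rand(Bool, 500)  # column selector for Y (samples)

Y[r, c] == X[c, r]'  # true: same values as indexing X directly
@time mean(Y[r, c])  # fast, since it reuses the sparse getindex on X
```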

If someone wants to volunteer to make a PR to SparseArrays.jl with this and similar enhancements, that would be great. See also “getindex is very slow for adjoints of sparse arrays” (JuliaSparse/SparseArrays.jl issue #36 on GitHub) and the earlier thread “Strange performance with Adjoint structures”.