In my application I need to split a sparse array into columns then pass each column to a separate function and sum them independently.

x = sprand(1000, 1000, 0.01)
cols = eachcol(x)
function foo(x)
s = sum(x)
# do more with s and x...
end
@assert foo.(cols) ≈ sum(x, dims=1)'
@btime foo.($cols)

3.592 ms (3 allocations: 78.30 KiB)

However, I know that:

@btime sum($x, dims=1)

3.427 μs (4 allocations: 8.08 KiB)

Basically I take a 1000x performance hit for splitting up my array this way. Is there a more efficient way to sum each column after splitting the sparse array?

Hey there!
It looks like eachcol is not specialized for sparse matrices, even though it could bring some performance benefits.
I see one quick fix, which would be to split the columns manually by exploiting the CSC storage (see the documentation). There must be a more elegant workaround though

using SparseArrays, BenchmarkTools
L = 1000
x = sprand(L, L, 0.01)
function foo(x::SparseVector)
s = sum(x)
# do more with s and x...
end
function run(x)
N = size(x,2)
A = Array{Float64, 1}(undef, N)
for i in 1:N
A[i] = foo(x[:,i])
end
return A
end
@btime run(x);
# 94.725 μs (2001 allocations: 279.31 KiB)

With the nzrange function you can write your own optimized version Sparse Arrays · The Julia Language
(I’m not sure through if that is applicable in the more complex setting you talked about…)

function foo_loops(x)
N = size(x, 2)
# rows = rowvals(x) # stores the row indices, but we don't need them here
vals = nonzeros(x)
r = zeros(N)
@inbounds for j in 1:N
for i in nzrange(x, j)
r[j] += vals[i]
end
end
return r
end
foo_loops(x) ≈ sum(x,dims=1)' # == true
@btime foo_loops(x) # 9.800 μs (1 allocation: 7.94 KiB)

EDIT: the first version I posted was wrong, since it was summing over the wrong dimension.

Thanks for the awesome replies. Here’s what I’ve come up with that seems good enough - though not quite as fast as @SteffenPL’s foo_loops. It would be a little harder to implement with my structure.

function foo2(x)
s = sum(nonzeros(x))
# do more with s and x...
end
@assert foo2.(cols) ≈ sum(x, dims=1)' # == true
@btime foo2.($cols); # 8.750 μs (3 allocations: 78.30 KiB)
@btime foo.($cols); # 3.592 ms (3 allocations: 78.30 KiB)
@btime foo_loops($x); # 4.298 μs (1 allocation: 7.94 KiB)
@btime sum($x, dims=1); # 3.432 μs (4 allocations: 8.08 KiB)

So it seems like it might not be eachcol that’s not optimized, but a missing sparse method for sum. I’ll probably just add something like this to my local Bandaids.jl package.

Base.sum(x::SubArray{T, 1, <:AbstractSparseArray}) where T = sum(nonzeros(x))

julia> @btime foo_loops($x);
6.331 μs (1 allocation: 7.94 KiB)
julia> function foo(x)
s = Vector{eltype(x)}(undef, size(x,2))
for (icol, col) in pairs(eachcol(x))
s[icol] = sum(nonzeros(col))
end
return s
end
foo (generic function with 1 method)
julia> @btime foo($x);
5.203 μs (1 allocation: 7.94 KiB)