# Splitting and Summing Sparse Arrays

In my application I need to split a sparse array into columns then pass each column to a separate function and sum them independently.

```julia
using SparseArrays, BenchmarkTools

x = sprand(1000, 1000, 0.01)
cols = eachcol(x)

function foo(x)
    s = sum(x)
    # do more with s and x...
end

@assert foo.(cols) ≈ sum(x, dims=1)'

@btime foo.($cols)
```

3.592 ms (3 allocations: 78.30 KiB)

However, I know that:

```julia
@btime sum($x, dims=1)
```

3.427 μs (4 allocations: 8.08 KiB)

Basically I take a 1000x performance hit for splitting up my array this way. Is there a more efficient way to sum each column after splitting the sparse array?

Hey there!
It looks like `eachcol` is not specialized for sparse matrices, even though specialization could bring real performance benefits.
One quick fix would be to split the columns manually by exploiting the CSC storage (see the Sparse Arrays section of the Julia manual). There must be a more elegant workaround, though.
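For concreteness, here is a minimal sketch of what I mean by exploiting the CSC storage (untested against your real workload): for column `j` of a `SparseMatrixCSC`, the stored values sit at indices `nzrange(x, j)` within the flat buffer returned by `nonzeros(x)`.

```julia
using SparseArrays

x = sprand(1000, 1000, 0.01)
vals = nonzeros(x)  # the stored (structurally nonzero) values, column-major

# Sum each column by slicing the flat value buffer with nzrange;
# `init` covers columns that happen to store no entries.
colsums = [sum(vals[nzrange(x, j)]; init=0.0) for j in 1:size(x, 2)]

@assert colsums ≈ vec(sum(x, dims=1))
```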

This is a little better:

```julia
using SparseArrays, BenchmarkTools
L = 1000
x = sprand(L, L, 0.01)

function foo(x::SparseVector)
    s = sum(x)
    # do more with s and x...
end

function run(x)
    N = size(x, 2)
    A = Vector{Float64}(undef, N)
    for i in 1:N
        A[i] = foo(x[:, i])
    end
    return A
end

@btime run($x);
# 94.725 μs (2001 allocations: 279.31 KiB)
```

With the `nzrange` function you can write your own optimized version (see the [Sparse Arrays](https://docs.julialang.org/en/v1/stdlib/SparseArrays/) section of the Julia manual).
(I’m not sure, though, if that is applicable in the more complex setting you talked about…)

```julia
function foo_loops(x)
    N = size(x, 2)
    # rows = rowvals(x)   # stores the row indices, but we don't need them here
    vals = nonzeros(x)

    r = zeros(N)
    @inbounds for j in 1:N
        for i in nzrange(x, j)
            r[j] += vals[i]
        end
    end
    return r
end
foo_loops(x) ≈ sum(x, dims=1)'  # == true
@btime foo_loops($x)            # 9.800 μs (1 allocation: 7.94 KiB)
```

EDIT: the first version I posted was wrong, since it was summing over the wrong dimension.


Hi All!

Thanks for the awesome replies. Here’s what I’ve come up with that seems good enough, though not quite as fast as @SteffenPL’s `foo_loops`, which would be a little harder to implement with my structure.

```julia
function foo2(x)
    s = sum(nonzeros(x))
    # do more with s and x...
end

@assert foo2.(cols) ≈ sum(x, dims=1)'

@btime foo2.($cols);    # 8.750 μs (3 allocations: 78.30 KiB)
@btime foo.($cols);     # 3.592 ms (3 allocations: 78.30 KiB)
@btime foo_loops($x);   # 4.298 μs (1 allocation: 7.94 KiB)
@btime sum($x, dims=1); # 3.432 μs (4 allocations: 8.08 KiB)
```

So it seems like the culprit might not be `eachcol` after all, but rather a missing sparse `sum` method for column views. I’ll probably just add something like this to my local `Bandaids.jl` package:

`Base.sum(x::SubArray{T, 1, <:AbstractSparseArray}) where T = sum(nonzeros(x))`
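If you go that route, here is a quick sanity check of the idea (my sketch, not from the thread). Note that defining a `Base.sum` method on types you don’t own is type piracy, so keeping it quarantined in a local package as you suggest seems wise.

```julia
using SparseArrays

# Column views produced by eachcol(x) are 1-D SubArrays over the sparse
# matrix, so this method dispatches for them and sums only stored values.
Base.sum(x::SubArray{T, 1, <:AbstractSparseArray}) where T = sum(nonzeros(x))

x = sprand(1000, 1000, 0.01)
@assert sum.(eachcol(x)) ≈ vec(sum(x, dims=1))
```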


FWIW, this seems to be slightly faster:

```julia
julia> @btime foo_loops($x);
  6.331 μs (1 allocation: 7.94 KiB)

julia> function foo(x)
           s = Vector{eltype(x)}(undef, size(x, 2))
           for (icol, col) in pairs(eachcol(x))
               s[icol] = sum(nonzeros(col))
           end
           return s
       end
foo (generic function with 1 method)

julia> @btime foo($x);
  5.203 μs (1 allocation: 7.94 KiB)
```