Splitting and Summing Sparse Arrays

In my application I need to split a sparse array into columns then pass each column to a separate function and sum them independently.

x = sprand(1000, 1000, 0.01)
cols = eachcol(x)

function foo(x)
    s = sum(x)
    # do more with s and x...

@assert foo.(cols) ≈ sum(x, dims=1)'

@btime foo.($cols)

3.592 ms (3 allocations: 78.30 KiB)

However, I know that:

@btime sum($x, dims=1)

3.427 μs (4 allocations: 8.08 KiB)

Basically I take a 1000x performance hit for splitting up my array this way. Is there a more efficient way to sum each column after splitting the sparse array?

Hey there!
It looks like eachcol is not specialized for sparse matrices, even though it could bring some performance benefits.
I see one quick fix, which would be to split the columns manually by exploiting the CSC storage (see the documentation). There must be a more elegant workaround though

This is a little better

using SparseArrays, BenchmarkTools
L = 1000
x = sprand(L, L, 0.01)

function foo(x::SparseVector)
    s = sum(x)
    # do more with s and x...

function run(x)
  N = size(x,2)
  A = Array{Float64, 1}(undef, N)
  for i in 1:N
      A[i] = foo(x[:,i])
  return A

@btime run(x);
# 94.725 μs (2001 allocations: 279.31 KiB)

With the nzrange function you can write your own optimized version Sparse Arrays · The Julia Language
(I’m not sure through if that is applicable in the more complex setting you talked about…)

function foo_loops(x)
    N = size(x, 2)
    # rows = rowvals(x)   # stores the row indices, but we don't need them here
    vals = nonzeros(x)

    r = zeros(N)
    @inbounds for j in 1:N 
        for i in nzrange(x, j)
            r[j] += vals[i]
    return r
foo_loops(x) ≈ sum(x,dims=1)'  # == true
@btime foo_loops(x)            # 9.800 μs (1 allocation: 7.94 KiB)

EDIT: the first version I posted was wrong, since it was summing over the wrong dimension.

1 Like

Hi All!

Thanks for the awesome replies. Here’s what I’ve come up with that seems good enough - though not quite as fast as @SteffenPL’s foo_loops. It would be a little harder to implement with my structure.

function foo2(x)
    s = sum(nonzeros(x))
    # do more with s and x...

@assert foo2.(cols) ≈ sum(x, dims=1)'  # == true

@btime foo2.($cols);    # 8.750 μs (3 allocations: 78.30 KiB)
@btime foo.($cols);     # 3.592 ms (3 allocations: 78.30 KiB)
@btime foo_loops($x);   # 4.298 μs (1 allocation: 7.94 KiB)
@btime sum($x, dims=1); # 3.432 μs (4 allocations: 8.08 KiB)

So it seems like it might not be eachcol that’s not optimized, but a missing sparse method for sum. I’ll probably just add something like this to my local Bandaids.jl package.

Base.sum(x::SubArray{T, 1, <:AbstractSparseArray}) where T = sum(nonzeros(x))


FWIW, this seems to be slightly faster:

julia> @btime foo_loops($x);
  6.331 μs (1 allocation: 7.94 KiB)

julia> function foo(x)
           s = Vector{eltype(x)}(undef, size(x,2))
           for (icol, col) in pairs(eachcol(x))
               s[icol] = sum(nonzeros(col))
           return s
foo (generic function with 1 method)

julia> @btime foo($x);
  5.203 μs (1 allocation: 7.94 KiB)

1 Like