How to speed up rowsum function?

There is a rowsum function in R, it’s very helpful and fast when constructing some likelihood function, rowsum can apply a function to a group subsetted from a matrix then concatenate these resulted vectors to a new matrix. So I write a similar funtion myself in Julia, however, the speed is no ideal, julia version costs 500ms, R version costs 5ms. Is there any way to speed up julia version code ?

# Julia version
using Random
using BenchmarkTools
seed = Random.seed!(2022 - 20 - 13)

function rowsum(m::AbstractMatrix{<:Number}, g::Vector{Int})
    rst = Matrix{eltype(m)}(undef, maximum(g), size(m, 2))
    @inbounds @simd for group in 1:maximum(g)
        rst[group, :] = sum(view(m, g .== group, :); dims = 1)
    end 
    return rst
end

N = 100000
a = rand(seed, N, 4);
g = repeat(1:Int(N/5), inner = 5);

rowsum(a, g);

@btime rowsum($a, $g);
@code_warntype rowsum(a, g) # make sure type stability
# R version
library(microbenchmark)
set.seed(2022 - 20 - 13)

N <- 100000
a <- matrix(rnorm(N * 4), N, 4)

g <- rep(1:(N/5), each = 5)

b <- rowsum(a, g)

t = microbenchmark(
    "rowsum" = {
        b = rowsum(a, g)
    }, 
    times = 1000,
    unit = "ms"
)

t

Your Julia implementation is not efficient.
Your grouping in

view(m, g .== group, :)

has to check the complete vector g again and again for every value in group.
Per definition of the R function the same can be done on a single run through g, e.g. by storing the row index for each factor found in g in a dictionary.

In other words, your implementation compares each group value maximum(g) * 100000 times, means

julia> length(1:maximum(g))*length(g)
2000000000

This times comparisons vs.

julia> length(g)
100000

this times running a single run through g.

I am not sure if I am clear enough. In the mean time I will try some more efficient implementation.

3 Likes

Here’s one way to improve this, but there might be better ways:

julia> function rowsum2(mat::AbstractMatrix{<:Number}, groups::AbstractVector{<:Integer})
           size(mat, 1) == length(groups) || error("length of group vector must match matrix")
           rst = fill!(similar(mat, maximum(groups), size(mat, 2)), 0)
           for (i, g) in pairs(groups)
             @views rst[g, :] .+= mat[i, :]
           end
           return rst
       end;

julia> r = @btime rowsum($a, $g);
  min 1.125 s, mean 1.127 s (160002 allocations, 331.12 MiB)

julia> r ≈ @btime rowsum2($a, $g)
  min 984.500 μs, mean 1.061 ms (2 allocations, 625.05 KiB)
true
2 Likes

Here is my implementation, which isn’t faster but perhaps for OP better to read and which matches better to my wall of text from above:

function rowsum3(m::AbstractMatrix{<:Number}, g::Vector{Int})
	rst = Matrix{eltype(m)}(undef, maximum(g), size(m, 2))
	idx = [ Int[] for i in 1:maximum(g) ]
	for i in 1:length(g)
       push!(idx[g[i]],i)
    end
    @inbounds for (i,rows) in enumerate(idx)
        rst[i, :] = sum(view(m, rows, :); dims = 1)
    end 
	return rst
end
julia> @btime rowsum($a, $g);
  965.716 ms (160002 allocations: 331.12 MiB)

julia> @btime rowsum3($a, $g);
  7.278 ms (120004 allocations: 8.09 MiB)

(EDIT: gave it the name rowsum3)

You’re right. It’s really a big problem

It’s much faster than R version, great solution!

A slight modification to the @mcabbott solution that makes it a little faster (running the matrices by column), but perhaps loses too much in generality.
I do not know if the use cases where you have to apply it do not fall within the scheme that I adopted (groups of fixed length, in essence)

julia> function rowsum3(mat::AbstractMatrix{<:Number}, groups::AbstractVector{<:Integer})
                  size(mat, 1) == length(groups) || error("length of group vector must match matrix")
                  rst = fill!(similar(mat, maximum(groups), size(mat, 2)), 0)    
                   gr = maximum(groups)*size(mat, 2)
                   sgr = Int(length(groups)/maximum(groups))
                   for g in 1:gr
                       for i in (g-1)*sgr+1:g*sgr
                           @views rst[g] += mat[i]
                       end
                   end
                  return rst
              end;

julia> @btime rowsum3($a, $g);
  552.300 μs (2 allocations: 625.05 KiB)

julia> rowsum2(a, g) == rowsum3(a, g) 
true