# How to speed up rowsum function?

There is a `rowsum` function in R, it’s very helpful and fast when constructing some likelihood function, `rowsum` can apply a function to a group subsetted from a matrix then concatenate these resulted vectors to a new matrix. So I write a similar funtion myself in Julia, however, the speed is no ideal, julia version costs 500ms, R version costs 5ms. Is there any way to speed up julia version code ?

``````# Julia version
using Random
using BenchmarkTools
seed = Random.seed!(2022 - 20 - 13)

function rowsum(m::AbstractMatrix{<:Number}, g::Vector{Int})
rst = Matrix{eltype(m)}(undef, maximum(g), size(m, 2))
@inbounds @simd for group in 1:maximum(g)
rst[group, :] = sum(view(m, g .== group, :); dims = 1)
end
return rst
end

N = 100000
a = rand(seed, N, 4);
g = repeat(1:Int(N/5), inner = 5);

rowsum(a, g);

@btime rowsum(\$a, \$g);
@code_warntype rowsum(a, g) # make sure type stability
``````
``````# R version
library(microbenchmark)
set.seed(2022 - 20 - 13)

N <- 100000
a <- matrix(rnorm(N * 4), N, 4)

g <- rep(1:(N/5), each = 5)

b <- rowsum(a, g)

t = microbenchmark(
"rowsum" = {
b = rowsum(a, g)
},
times = 1000,
unit = "ms"
)

t
``````

Your Julia implementation is not efficient.

``````view(m, g .== group, :)
``````

has to check the complete vector `g` again and again for every value in `group`.
Per definition of the R function the same can be done on a single run through `g`, e.g. by storing the row index for each factor found in `g` in a dictionary.

In other words, your implementation compares each `group` value `maximum(g) * 100000` times, means

``````julia> length(1:maximum(g))*length(g)
2000000000
``````

This times comparisons vs.

``````julia> length(g)
100000
``````

this times running a single run through g.

I am not sure if I am clear enough. In the mean time I will try some more efficient implementation.

3 Likes

Here’s one way to improve this, but there might be better ways:

``````julia> function rowsum2(mat::AbstractMatrix{<:Number}, groups::AbstractVector{<:Integer})
size(mat, 1) == length(groups) || error("length of group vector must match matrix")
rst = fill!(similar(mat, maximum(groups), size(mat, 2)), 0)
for (i, g) in pairs(groups)
@views rst[g, :] .+= mat[i, :]
end
return rst
end;

julia> r = @btime rowsum(\$a, \$g);
min 1.125 s, mean 1.127 s (160002 allocations, 331.12 MiB)

julia> r ≈ @btime rowsum2(\$a, \$g)
min 984.500 μs, mean 1.061 ms (2 allocations, 625.05 KiB)
true
``````
2 Likes

Here is my implementation, which isn’t faster but perhaps for OP better to read and which matches better to my wall of text from above:

``````function rowsum3(m::AbstractMatrix{<:Number}, g::Vector{Int})
rst = Matrix{eltype(m)}(undef, maximum(g), size(m, 2))
idx = [ Int[] for i in 1:maximum(g) ]
for i in 1:length(g)
push!(idx[g[i]],i)
end
@inbounds for (i,rows) in enumerate(idx)
rst[i, :] = sum(view(m, rows, :); dims = 1)
end
return rst
end
``````
``````julia> @btime rowsum(\$a, \$g);
965.716 ms (160002 allocations: 331.12 MiB)

julia> @btime rowsum3(\$a, \$g);
7.278 ms (120004 allocations: 8.09 MiB)
``````

(EDIT: gave it the name rowsum3)

You’re right. It’s really a big problem

It’s much faster than R version, great solution!

A slight modification to the @mcabbott solution that makes it a little faster (running the matrices by column), but perhaps loses too much in generality.
I do not know if the use cases where you have to apply it do not fall within the scheme that I adopted (groups of fixed length, in essence)

``````julia> function rowsum3(mat::AbstractMatrix{<:Number}, groups::AbstractVector{<:Integer})
size(mat, 1) == length(groups) || error("length of group vector must match matrix")
rst = fill!(similar(mat, maximum(groups), size(mat, 2)), 0)
gr = maximum(groups)*size(mat, 2)
sgr = Int(length(groups)/maximum(groups))
for g in 1:gr
for i in (g-1)*sgr+1:g*sgr
@views rst[g] += mat[i]
end
end
return rst
end;

julia> @btime rowsum3(\$a, \$g);
552.300 μs (2 allocations: 625.05 KiB)

julia> rowsum2(a, g) == rowsum3(a, g)
true
``````