How to compute a conditional sum of a vector efficiently

I have three vectors x, a, b. I need a new vector Y which has as its k^{th} element the following sum: Y^{}_k = \sum_{i=1}^N x_i \ \mathbb{1}(a_i=a_k, b_i=b_k) i.e., the sum of x from all rows with the same values of a, b as the current row.

Is there a more efficient way to implement this than the code below? Would a for loop
be better?

condition = (a.==a').*(b.==b')
y = (condition.*x', dims=2)

I thought your code would be easy to optimize because the first line allocates a matrix of size n^2. My loop implementation is indeed faster in some cases, but slower in others, like the one below. Wonder what I’m doing wrong…

julia> using BenchmarkTools

julia> N, V = 1000, 3;

julia> a, b = rand(1:V, N), rand(1:V, N);

julia> x = rand(N);

julia> function f(x, a, b)
           condition = (a .== a') .* (b .== b')
           y = sum(condition .* x', dims=2)
f (generic function with 1 method)

julia> function g(x, a, b)
           y = zero(x)
           for k in eachindex(x, a, b)
               aₖ, bₖ = a[k], b[k]
               for i in eachindex(x, a, b)
                   if (a[i] == aₖ) && (b[i] == bₖ)
                       y[k] += x[i]
           return y
g (generic function with 1 method)

julia> @assert f(x, a, b) ≈ g(x, a, b)

julia> @btime f($x, $a, $b);
  2.371 ms (11 allocations: 7.76 MiB)

julia> @btime g($x, $a, $b);
  3.387 ms (1 allocation: 7.94 KiB)
1 Like

If you only have a small number V of possible values of a and b, then you can do much better by first summing into a matrix indexed by values of V, because then the time complexity is \Theta(N) (with space complexity \Theta(N+V^2)) instead of \Theta(N^2) as in the implementations above. Correspondingly, the following is about 1000x faster than either of the implementations above on my machine for this test data:

function h(x, a, b, V)
    s = zeros(eltype(x), V, V)
    for k in eachindex(x, a, b)
        s[a[k], b[k]] += x[k]
    return [s[a[k], b[k]] for k in eachindex(x, a, b)]

If you don’t know ahead of time what possible values there are for a and b (or if the number of possible values is large), you could use the same strategy but with a dictionary of tuples:

function h2(x, a, b)
    s = Dict{Tuple{eltype(a),eltype(b)},eltype(x)}()
    for k in eachindex(x, a, b)
        key = (a[k], b[k])
        s[key] = get(s, key, zero(eltype(x))) + x[k]
    return [s[a[k], b[k]] for k in eachindex(x, a, b)]

which also has \Theta(N) complexity, with a worse constant factor by about 6x on my machine. (Could be made a bit faster ala julia#12157.)


2 posts were split to a new topic: Speeding up a sum involving 3 matrices