I have three vectors x, a, b. I need a new vector Y which has as its k^{th} element the following sum: Y^{}_k = \sum_{i=1}^N x_i \ \mathbb{1}(a_i=a_k, b_i=b_k) i.e., the sum of x from all rows with the same values of a, b as the current row.

Is there a more efficient way to implement this than the code below? Would a for loop
be better?

condition = (a.==a').*(b.==b')
y = (condition.*x', dims=2)

I thought your code would be easy to optimize because the first line allocates a matrix of size n^2. My loop implementation is indeed faster in some cases, but slower in others, like the one below. Wonder what I’m doing wrong…

julia> using BenchmarkTools
julia> N, V = 1000, 3;
julia> a, b = rand(1:V, N), rand(1:V, N);
julia> x = rand(N);
julia> function f(x, a, b)
condition = (a .== a') .* (b .== b')
y = sum(condition .* x', dims=2)
end
f (generic function with 1 method)
julia> function g(x, a, b)
y = zero(x)
for k in eachindex(x, a, b)
aₖ, bₖ = a[k], b[k]
for i in eachindex(x, a, b)
if (a[i] == aₖ) && (b[i] == bₖ)
y[k] += x[i]
end
end
end
return y
end
g (generic function with 1 method)
julia> @assert f(x, a, b) ≈ g(x, a, b)
julia> @btime f($x, $a, $b);
2.371 ms (11 allocations: 7.76 MiB)
julia> @btime g($x, $a, $b);
3.387 ms (1 allocation: 7.94 KiB)

If you only have a small number V of possible values of a and b, then you can do much better by first summing into a matrix indexed by values of V, because then the time complexity is \Theta(N) (with space complexity \Theta(N+V^2)) instead of \Theta(N^2) as in the implementations above. Correspondingly, the following is about 1000x faster than either of the implementations above on my machine for this test data:

function h(x, a, b, V)
s = zeros(eltype(x), V, V)
for k in eachindex(x, a, b)
s[a[k], b[k]] += x[k]
end
return [s[a[k], b[k]] for k in eachindex(x, a, b)]
end

If you don’t know ahead of time what possible values there are for a and b (or if the number of possible values is large), you could use the same strategy but with a dictionary of tuples:

function h2(x, a, b)
s = Dict{Tuple{eltype(a),eltype(b)},eltype(x)}()
for k in eachindex(x, a, b)
key = (a[k], b[k])
s[key] = get(s, key, zero(eltype(x))) + x[k]
end
return [s[a[k], b[k]] for k in eachindex(x, a, b)]
end

which also has \Theta(N) complexity, with a worse constant factor by about 6x on my machine. (Could be made a bit faster ala julia#12157.)