If you only have a small number V of possible values of a and b, then you can do much better by first summing into a matrix indexed by values of V, because then the time complexity is \Theta(N) (with space complexity \Theta(N+V^2)) instead of \Theta(N^2) as in the implementations above. Correspondingly, the following is about 1000x faster than either of the implementations above on my machine for this test data:
function h(x, a, b, V)
s = zeros(eltype(x), V, V)
for k in eachindex(x, a, b)
s[a[k], b[k]] += x[k]
end
return [s[a[k], b[k]] for k in eachindex(x, a, b)]
end
If you don’t know ahead of time what possible values there are for a and b (or if the number of possible values is large), you could use the same strategy but with a dictionary of tuples:
function h2(x, a, b)
s = Dict{Tuple{eltype(a),eltype(b)},eltype(x)}()
for k in eachindex(x, a, b)
key = (a[k], b[k])
s[key] = get(s, key, zero(eltype(x))) + x[k]
end
return [s[a[k], b[k]] for k in eachindex(x, a, b)]
end
which also has \Theta(N) complexity, with a worse constant factor by about 6x on my machine. (Could be made a bit faster ala julia#12157.)