# What's Julia's solution to tapply or accumarray?

I have a one-column index vector A, as below:
`A = [1, 2, 2, 3, 3, 3];`

I have a matrix B, as below:
`B = [1 2 3; 2 3 4; 3 4 5; 4 5 6; 5 6 7; 6 7 8];`

I would want to calculate the average of the duplicate rows based on Index A. The outcome C would be:
`C = [1.0 2.0 3.0; 2.5 3.5 4.5; 5.0 6.0 7.0];`

In Matlab, I can do this:

``````
[~, ~, subs] = unique(A, 'stable');
C = accumarray(subs, B(:), [], @mean);
``````

In R, we could use the function `tapply`.

What would be the solution in Julia? Many thanks!

One way:

``````
julia> A = [1, 2, 2, 3, 3, 3];

julia> B = [1 2 3; 2 3 4; 3 4 5; 4 5 6; 5 6 7; 6 7 8];

julia> using GroupSlices, Statistics  # `mean` comes from Statistics

julia> reduce(vcat, [mean(B[i, :], dims=1) for i in groupinds(A)])
3×3 Matrix{Float64}:
 1.0  2.0  3.0
 2.5  3.5  4.5
 5.0  6.0  7.0
``````

There might be nicer ways to do this using DataFrames, instead of just matrices.
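
For readers who prefer not to add a dependency, the grouping step can be sketched with Base alone. This is only an illustration of the idea (collect the row indices that belong to each distinct key, in order of first appearance); `group_indices` is a hypothetical helper name, not the GroupSlices implementation.

```julia
using Statistics

A = [1, 2, 2, 3, 3, 3]
B = [1 2 3; 2 3 4; 3 4 5; 4 5 6; 5 6 7; 6 7 8]

# Hypothetical helper: for each distinct label, collect the positions where it occurs.
function group_indices(labels)
    order = eltype(labels)[]                      # labels in order of first appearance
    idx = Dict{eltype(labels), Vector{Int}}()
    for (i, k) in enumerate(labels)
        if !haskey(idx, k)
            push!(order, k)
            idx[k] = Int[]
        end
        push!(idx[k], i)
    end
    return [idx[k] for k in order]
end

groups = group_indices(A)    # [[1], [2, 3], [4, 5, 6]]

# Average the rows of B within each group and stack the results.
C = reduce(vcat, [mean(B[rows, :], dims=1) for rows in groups])
```

With the data above, `C` should match the 3×3 result shown for the GroupSlices version.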


This gives an error in Julia.

Also, I don’t quite understand what you mean by:

> I would want to calculate the average of the duplicate rows based on Index A

Can you explain the desired operation a bit more?


Sorry for the error. I have fixed it.

Big thanks to Mcabbott, who has answered this question perfectly!

I was trying to say that if some rows share the same index, the values in those rows are averaged column by column.

Many thanks again for the nice solution! In reality, my A index is a matrix composed of 3 columns:
`A = Data[:, [21, 25, 35] ];`

In this case, how do I specify the `for i in groupinds(A)` syntax?

I’m not certain I follow, but you can ask for unique rows like this:

``````
julia> A = [1, 2, 2, 3, 3, 3];

julia> groupinds(A)
3-element Vector{Vector{Int64}}:
 [1]
 [2, 3]
 [4, 5, 6]

julia> A2 = hcat(A, [10, 99, 99, 99, 4, 4])
6×2 Matrix{Int64}:
1  10
2  99
2  99
3  99
3   4
3   4

julia> unique(A2, dims=1)
4×2 Matrix{Int64}:
1  10
2  99
3  99
3   4

julia> groupinds(groupslices(A2, dims=1))
4-element Vector{Vector{Int64}}:
 [1]
 [2, 3]
 [4]
 [5, 6]
``````
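
As a side note, the integer group labels that Matlab returns as the third output of `unique(..., 'stable')` can be approximated in Base with `indexin`. The sketch below assumes the same `A2` as above; `rows`, `subs`, and `groups` are just illustrative variable names.

```julia
A2 = [1 10; 2 99; 2 99; 3 99; 3 4; 3 4]

# Treat every row as one key.
rows = collect(eachrow(A2))

# Position of each row within the list of unique rows (first-appearance order).
subs = Int.(indexin(rows, unique(rows)))

# Row indices belonging to each label.
groups = [findall(==(g), subs) for g in 1:maximum(subs)]
```

Here `subs` plays the role of the Matlab `subs` vector, and `groups` has the same shape as the `groupinds(groupslices(A2, dims=1))` result.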

That’s exactly what I am looking for! Many thanks …

The explanation was not very clear, but the task is basically to average the rows of matrix `B`, grouped by the indices in vector `A`.

Suggestion of a package-less alternative, found thanks to Gabriel Fauré’s Requiem:

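``````
using Statistics  # for `mean`

A = [1, 2, 2, 3, 3, 3]
B = [1 2 3; 2 3 4; 3 4 5; 4 5 6; 5 6 7; 6 7 8]

# `ai .∈ A` broadcasts over the elements of A, giving a Boolean row mask
C = vcat([mean(B[ai .∈ A,:], dims=1) for ai in unique(A)]...)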

3×3 Matrix{Float64}:
1.0  2.0  3.0
2.5  3.5  4.5
5.0  6.0  7.0
``````
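
The mask `ai .∈ A` may look surprising: with a scalar on the left, the broadcast runs over the elements of `A`, so it ends up comparing `ai` with each entry. A small sketch, assuming the same `A` and `B`, showing that `A .== ai` gives the same mask and arguably reads more directly:

```julia
using Statistics

A = [1, 2, 2, 3, 3, 3]
B = [1 2 3; 2 3 4; 3 4 5; 4 5 6; 5 6 7; 6 7 8]

mask_in = 2 .∈ A     # broadcasts over A: is 2 "in" each single element?
mask_eq = A .== 2    # explicit elementwise comparison

# Same result as the one-liner above, written with the explicit comparison.
C = reduce(vcat, [mean(B[A .== ai, :], dims=1) for ai in unique(A)])
```

Both masks select rows 2 and 3 here. Note that each group needs a full pass over `A`, so this is convenient for small inputs rather than tuned for speed.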

Yet another problem discussed here these days with a direct solution in StructArrays and SplitApplyCombine packages (:

``````
using StructArrays
using SplitApplyCombine
using Statistics

# original arrays
A = [1, 2, 2, 3, 3, 3]
B = [1 2 3; 2 3 4; 3 4 5; 4 5 6; 5 6 7; 6 7 8]

# combine A and rows of B into a single array
AB = StructArray(; A, B=splitdims(B, 1))

# compute the desired per-group means
C = map(groupview(x -> x.A, AB)) do gr
    mean(gr.B)
end

# C is a dictionary keyed by the values of A
# access values as e.g. C[2]

# the same with a nicer piping syntax:
using DataPipes

@p AB |> groupview(_.A) |> map(mean(_.B))
``````

And with DataFrames:

``````
using DataFrames
using Statistics

A = [1, 2, 2, 3, 3, 3]
B = [1 2 3; 2 3 4; 3 4 5; 4 5 6; 5 6 7; 6 7 8]

df = DataFrame(; A, B=collect(eachrow(B)))

combine(groupby(df, :A), :B => Ref∘mean)

3×2 DataFrame
 Row │ A      B_Ref_mean
     │ Int64  Array…
─────┼────────────────────────
   1 │     1  [1.0, 2.0, 3.0]
   2 │     2  [2.5, 3.5, 4.5]
   3 │     3  [5.0, 6.0, 7.0]
``````

Very cool! What if A is a multi-column matrix? Instead of relying on `unique(A)`, it would require unique rows, i.e. `unique(A, dims=1)`. How would I write the program then?

Thanks.

If you are fine with using packages outside of Base, my solution (above) only requires a small modification:
replace `AB = StructArray(; A, B=splitdims(B, 1))` with `AB = StructArray(; A=splitdims(A, 1), B=splitdims(B, 1))`.
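
For comparison, a package-free sketch of the same idea for a matrix `A`, using `unique(A, dims=1)` and a row-by-row comparison. The two-column `A` below is made-up example data, and `ukeys` is just an illustrative name.

```julia
using Statistics

# Example data: two key columns, so rows 4 and 5 fall into different groups.
A = [1 10; 2 99; 2 99; 3 99; 3 4; 3 4]
B = [1 2 3; 2 3 4; 3 4 5; 4 5 6; 5 6 7; 6 7 8]

ukeys = unique(A, dims=1)    # one row per distinct key, in order of first appearance

# For each unique key row, build a Boolean mask of matching rows and average them.
C = reduce(vcat, [mean(B[[r == k for r in eachrow(A)], :], dims=1)
                  for k in eachrow(ukeys)])
```

Row `i` of `C` corresponds to row `i` of `ukeys`. Each key triggers a full scan of `A`, so this is meant as a readable baseline rather than a fast implementation.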


This should work for both a vector `A` and a matrix `A`:

``````
using SplitApplyCombine, Statistics
mean.(group(first, last, zip(eachrow(A),eachrow(B))))
``````

@leon, your last question (What if A is a multiple column matrix?) requires pointing to the Matlab documentation for the `accumarray()` function and specifying exactly what you need, since the range of available features and options is very large.

Here is a quick & dirty attempt that worked for 2 out of the 3 Matlab examples tried from the link above. The failure occurred for the `Int8` example. I tried to use `reduce(+, ...)` instead of `sum()` to avoid auto-promotion, but I got something else. Actually, I do not understand what Matlab is doing in that example.

``````
function accumarray1(A::Matrix{Int64}, B::AbstractArray, fun::Function, T::Type)
    N = size(A,2)
    mx = maximum(A, dims=1)    # largest index per column sets the output size
    C = zeros(T, mx[1], mx[2:N]...)
    # reduce B over the groups defined by the first column of A only
    if fun == sum
        Ci = vcat([reduce(+, B[i .∈ A[:,1],:], dims=1) for i in unique(A[:,1])]...)
    else
        Ci = vcat([fun(B[i .∈ A[:,1],:], dims=1) for i in unique(A[:,1])]...)
    end
    for ri in eachrow(A)
        C[ri...] = Ci[ri[1]]    # store the group's result at the N-d index given by the row
    end
    return C
end

# Matlab example-1: OK
B = 1:6    # data input
A = [1 1; 2 2; 3 2; 1 1; 2 2; 4 1]    # indices on first column, output to row N-d index
accumarray1(A, B, sum, Int64)

4×2 Matrix{Int64}:
5  0
0  7
0  3
6  0

# Matlab example-2: OK
using Statistics
B = [100.1, 101.2, 103.4, 102.8, 100.9, 101.5]
A = [1 1; 1 1; 2 2; 3 2; 2 2; 3 2]
accumarray1(A, B, var, Float64)

3×2 Matrix{Float64}:
0.605  0.0
0.0    3.125
0.0    0.845

# Matlab example-3: Not OK, but do not understand Matlab output with 4 different values?
B = Int8.(10:15)
A = [1 1 1; 1 1 1; 1 1 2; 1 1 2; 2 3 1; 2 3 2]
accumarray1(A, B, sum, Int8)

2×3×2 Array{Int8, 3}:
[:, :, 1] =
46  0   0
0  0  29

[:, :, 2] =
46  0   0
0  0  29
``````
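
A possible explanation for example 3: as far as I understand the Matlab documentation, `accumarray` treats each full row of the subscript matrix as one N-d index, so values are grouped by the whole row and not only by the first column. Under that assumption, a minimal sketch could look like the following; `accumarray_rows` is a hypothetical name, and it covers only this one feature of `accumarray`.

```julia
# Hypothetical helper: group `vals` by full rows of `subs`, reduce each group with `fun`,
# and place the result at the N-d position given by that row.
function accumarray_rows(subs::AbstractMatrix{<:Integer}, vals::AbstractVector, fun=sum)
    groups = Dict{Vector{Int}, Vector{eltype(vals)}}()
    for (row, v) in zip(eachrow(subs), vals)
        push!(get!(() -> eltype(vals)[], groups, collect(row)), v)
    end
    results = Dict(k => fun(v) for (k, v) in groups)
    # Output size: the largest subscript seen in each column.
    C = zeros(eltype(values(results)), vec(maximum(subs, dims=1))...)
    for (k, r) in results
        C[k...] = r
    end
    return C
end

# Subscripts and values from example 3 above.
A3 = [1 1 1; 1 1 1; 1 1 2; 1 1 2; 2 3 1; 2 3 2]
B3 = Int8.(10:15)
C3 = accumarray_rows(A3, B3)

# Subscripts and values from example 1 above.
C1 = accumarray_rows([1 1; 2 2; 3 2; 1 1; 2 2; 4 1], 1:6)
```

With full-row grouping, example 3 produces four distinct non-zero entries (21, 25, 14, and 15), which would be consistent with the "4 different values" mentioned above. Note that `sum` on `Int8` values widens the result type in Julia, so `C3` is not an `Int8` array.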

Many thanks to all for the alternative solutions. It seems that my current approach is still one of the fastest ways of doing this:

`B2 = reduce(vcat, [mean(B[i, :], dims=1) for i in groupinds(groupslices(A, dims=1))]);`

The following versions (for a vector `A` and a matrix `A`) seem quite competitive with respect to vcat (…), at least for the small matrices tested.

``````
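function meangrpslm(A,B)
    grp=Dict{Vector{Int64},Tuple{Array{Int64},Int64}}()
    for (i, r) in enumerate(eachrow(A))
        # assumed loop body (it is missing here): keep a running row sum and count per key
        s, n = get(grp, r, (zeros(Int64, size(B, 2)), 0))
        grp[r] = (s .+ B[i, :], n + 1)
    end
    [first(g)/last(g) for g in values(grp)]
end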

function meangrpslv(Arr,B)
grp=Dict{Int64,Tuple{Array{Int64},Int64}}()
for (i, r) in enumerate(Arr)