Grouping by Split, apply, combine

I have a big set of 4-tuples and would like to group them by their third element. For example:

using SplitApplyCombine
A = [(1,5,2,5), (12,5,9,6), (7,5,2,6), (11,5,2,5), (8,5,9,6), (3,5,3,6), (7,5,9,6), (1,5,3,6)]
Third = [2, 9, 3]

I tried group(3, A) and group(Third, A), but both return an error. How can I group them?

julia> group(t -> t[3], A)
3-element Dictionaries.Dictionary{Int64, Vector{NTuple{4, Int64}}}
 2 │ [(1, 5, 2, 5), (7, 5, 2, 6), (11, 5, 2, 5)]
 9 │ [(12, 5, 9, 6), (8, 5, 9, 6), (7, 5, 9, 6)]
 3 │ [(3, 5, 3, 6), (1, 5, 3, 6)]
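The first argument of `group` is a function that is called on each element to compute its key, which is why passing an integer (`3`) or a vector (`Third`) fails. The idea can be sketched in plain Julia; `mygroup` is a hypothetical name, not part of SplitApplyCombine, and it returns a Base `Dict` (no ordering guarantee) rather than the insertion-ordered `Dictionary` from Dictionaries.jl:

```julia
# Hypothetical Base-only sketch of what group(by, A) computes:
# a map from each key by(x) to the vector of all elements x that produced that key.
function mygroup(by, A)
    T = eltype(A)
    d = Dict{Any, Vector{T}}()
    for x in A
        # get! creates an empty Vector{T} the first time a key is seen
        push!(get!(() -> T[], d, by(x)), x)
    end
    return d
end

A = [(1,5,2,5), (12,5,9,6), (7,5,2,6), (11,5,2,5), (8,5,9,6), (3,5,3,6), (7,5,9,6), (1,5,3,6)]
grouped = mygroup(t -> t[3], A)   # key function: the third element of each tuple
grouped[2]                        # [(1, 5, 2, 5), (7, 5, 2, 6), (11, 5, 2, 5)]
```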

Hi @pdeffebach, thank you! It's great. Is there any way to impose an extra condition via group?

For example, filtering the elements under each key by whether their first element is in a certain list. In other words, with grouped = group(t -> t[3], A), we know that grouped[2] is a 3-element Vector{NTuple{4, Int64}}; is it possible to further keep only those whose first element is 11? More generally, I would like to keep, under every key, only the tuples whose first element is in, say, [11, 8, 3]. The result would be:

 2 │ [(11, 5, 2, 5)]
 9 │ [(8, 5, 9, 6)]
 3 │ [(3, 5, 3, 6)]

Read the documentation of group. The key is to write a function that returns a distinct value for each combination of requirements, and then group by that function.
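One way to read this advice (a sketch, not taken from the docs): make the key a tuple that combines the third element with a Boolean saying whether the first element passes the filter; the groups asked for are then the ones whose second key component is `true`. With SplitApplyCombine the same key function would be passed to `group`; here `groupby_key` is a hypothetical Base-only helper returning a `Dict`:

```julia
# Hypothetical Base-only grouping helper: key => vector of elements with that key.
function groupby_key(by, A)
    T = eltype(A)
    d = Dict{Any, Vector{T}}()
    for x in A
        push!(get!(() -> T[], d, by(x)), x)
    end
    return d
end

A = [(1,5,2,5), (12,5,9,6), (7,5,2,6), (11,5,2,5), (8,5,9,6), (3,5,3,6), (7,5,9,6), (1,5,3,6)]
sieve = (11, 8, 3)

# The key encodes both requirements: the third element, and whether the first element is in sieve.
g = groupby_key(t -> (t[3], t[1] in sieve), A)
g[(2, true)]   # [(11, 5, 2, 5)]
g[(9, true)]   # [(8, 5, 9, 6)]
g[(3, true)]   # [(3, 5, 3, 6)]
```

The `false` groups still exist in `g` and simply hold the tuples that failed the test.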


@pdeffebach Yes, I'm going to reread it. I wish Julia's documentation were a bit longer.

You could filter before or after grouping.
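Both orders can be sketched in plain Julia. The helper `groupdict` is hypothetical and stands in for `group` (it returns a Base `Dict`); `keep` encodes the condition from the question:

```julia
A = [(1,5,2,5), (12,5,9,6), (7,5,2,6), (11,5,2,5), (8,5,9,6), (3,5,3,6), (7,5,9,6), (1,5,3,6)]
keep(t) = t[1] in (11, 8, 3)   # the condition from the question

# Hypothetical Base-only grouping helper, same idea as group(by, A).
function groupdict(by, A)
    T = eltype(A)
    d = Dict{Any, Vector{T}}()
    for x in A
        push!(get!(() -> T[], d, by(x)), x)
    end
    return d
end

# 1) Filter first, then group: a key with no surviving tuple never appears.
before = groupdict(t -> t[3], filter(keep, A))

# 2) Group first, then filter each group: every key is kept, possibly with an empty vector.
after = Dict(k => filter(keep, v) for (k, v) in groupdict(t -> t[3], A))
```

With this data both give the same result, because every group contains at least one matching tuple. If some group had none, the first version would drop that key while the second would keep it with an empty vector.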

You can also transform the elements of each group with a function.
A solution that comes close to what you are asking could be this:

group(t -> t[3], t -> t[1] ∉ [11, 8, 3] ? () : t, A)
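Note that this keeps one entry per original tuple: tuples that fail the test are replaced by the empty tuple `()` rather than removed. A Base-only sketch of grouping with a transform function, followed by a pass that drops the placeholders (`groupmap` is a hypothetical name standing in for `group(by, f, A)`):

```julia
# Hypothetical Base-only analogue of group(by, f, A): store f(x) under the key by(x).
function groupmap(by, f, A)
    d = Dict{Any, Vector{Any}}()
    for x in A
        push!(get!(() -> Any[], d, by(x)), f(x))
    end
    return d
end

A = [(1,5,2,5), (12,5,9,6), (7,5,2,6), (11,5,2,5), (8,5,9,6), (3,5,3,6), (7,5,9,6), (1,5,3,6)]
g = groupmap(t -> t[3], t -> t[1] ∉ (11, 8, 3) ? () : t, A)
# g[2] == [(), (), (11, 5, 2, 5)]: non-matching tuples became () placeholders

# Drop the placeholders afterwards.
cleaned = Dict(k => filter(!=(()), v) for (k, v) in g)
# cleaned[2] == [(11, 5, 2, 5)]
```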

But if you really want a function that groups and filters at the same time, you have to write it yourself.
Here is one possibility:


function groupbynth(A, n, sieve)
    # keys: the n-th element of each tuple; values: the tuples whose first element is in sieve
    d = Dict{Int64, Vector{Tuple}}()
    foreach(tA -> tA[1] ∈ sieve ? push!(get!(() -> Vector{Tuple}(), d, tA[n]), tA) : nothing, A)
    d
end
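A quick usage check on the data from the question (the function is repeated so the snippet runs on its own). Only keys with at least one matching tuple end up in the result:

```julia
# Copy of groupbynth: group by the n-th element, keeping tuples whose first element is in sieve.
function groupbynth(A, n, sieve)
    d = Dict{Int64, Vector{Tuple}}()
    foreach(tA -> tA[1] ∈ sieve ? push!(get!(() -> Vector{Tuple}(), d, tA[n]), tA) : nothing, A)
    d
end

A = [(1,5,2,5), (12,5,9,6), (7,5,2,6), (11,5,2,5), (8,5,9,6), (3,5,3,6), (7,5,9,6), (1,5,3,6)]
res = groupbynth(A, 3, [11, 8, 3])
# res[2] == [(11, 5, 2, 5)], res[9] == [(8, 5, 9, 6)], res[3] == [(3, 5, 3, 6)]
```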


Thanks a lot, @rocco_sprmnt21, it was very, very helpful!

If you need performance, keep in mind the following comparison:

using SplitApplyCombine, BenchmarkTools, Dictionaries

A=[Tuple(rand(1:100, 4)) for _ in 1:10^5]

function groupbynth(A,n, sieve)
    d=Dictionary{Int64, Vector{Tuple}}()
    foreach(tA-> tA[1] ∈ sieve ? push!(get!(()->Vector{Tuple}(), d, tA[n]),tA) : nothing, A)
    d
end

julia> @btime groupbynth(A,3, [11, 8, 3])
  893.400 μs (3354 allocations: 202.81 KiB)
100-element Dictionary{Int64, Vector{Tuple}}
   4 │ Tuple[(11, 35, 4, 28), (3, 93, 4, 98), (11, 82, 4, 97), (3, 75, 4, 76), (…
  38 │ Tuple[(8, 63, 38, 95), (11, 46, 38, 23), (3, 3, 38, 1), (3, 70, 38, 80), …
  19 │ Tuple[(8, 67, 19, 42), (11, 13, 19, 78), (11, 88, 19, 10), (3, 10, 19, 89…
  42 │ Tuple[(11, 57, 42, 78), (11, 27, 42, 97), (11, 6, 42, 16), (8, 97, 42, 52…
  99 │ Tuple[(11, 48, 99, 48), (8, 96, 99, 51), (11, 3, 99, 30), (8, 23, 99, 30)…

julia> @btime group(t->t[3], t->t[1] ∉ [11,8,3] ? () : t, A )
  5.605 ms (100637 allocations: 16.33 MiB)
100-element Dictionary{Int64, Vector{Union{Tuple{}, NTuple{4, Int64}}}}
  4 │ Union{Tuple{}, NTuple{4, Int64}}[(11, 35, 4, 28), (), (), (), (), (), (), …
 96 │ Union{Tuple{}, NTuple{4, Int64}}[(), (), (), (), (), (), (), (), (), ()  ……
 95 │ Union{Tuple{}, NTuple{4, Int64}}[(), (), (), (), (), (), (), (), (), ()  ……
 63 │ Union{Tuple{}, NTuple{4, Int64}}[(), (), (), (), (), (), (), (), (), ()  ……
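Both versions store elements in containers with non-concrete element types (`Vector{Tuple}`, or a `Union` with `()` placeholders), which Julia generally handles less efficiently than concrete types. A hypothetical variant that takes the element type from `A` and the filter as a predicate might look like this; it is a sketch using a Base `Dict`, it assumes `A` has a concrete tuple element type, and it has not been benchmarked here:

```julia
# Hypothetical variant of groupbynth with concrete container types:
# keys are the n-th tuple elements, values are vectors of the original tuple type T.
function groupbynth_typed(A::AbstractVector{T}, n, keep) where {T}
    K = fieldtype(T, n)          # type of the n-th tuple element, e.g. Int64
    d = Dict{K, Vector{T}}()
    for t in A
        keep(t) || continue      # skip tuples that fail the predicate
        push!(get!(() -> T[], d, t[n]), t)
    end
    return d
end

sieve = Set([11, 8, 3])          # a Set gives average constant-time membership tests
A = [(1,5,2,5), (12,5,9,6), (7,5,2,6), (11,5,2,5), (8,5,9,6), (3,5,3,6), (7,5,9,6), (1,5,3,6)]
res = groupbynth_typed(A, 3, t -> t[1] in sieve)
# res isa Dict{Int64, Vector{NTuple{4, Int64}}}
```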



You could also try the groupreduce function in the following way (I can't verify the correctness of the expression right now; I'm going from memory):

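groupreduce(t -> t[3], identity, (acc, t) -> t[1] ∈ (3, 8, 11) ? vcat(acc, [t]) : acc, A; init = NTuple{4, Int}[])  # vcat instead of push!: init is likely reused to seed every group, so it must not be mutated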
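The grouped-reduction pattern can be sketched in Base Julia. The hypothetical `mygroupreduce` below seeds every new group from the same `init` object; it is an assumption (not checked against SplitApplyCombine's source here) that `groupreduce` treats `init` the same way. The contrast shows why a `push!`-based accumulator is risky under that assumption:

```julia
# Hypothetical Base-only sketch of a grouped reduction with an init value.
function mygroupreduce(by, f, op, A; init)
    d = Dict{Any, Any}()
    for x in A
        k = by(x)
        # Every new key starts from the *same* init object, so op must not mutate it.
        d[k] = op(get(d, k, init), f(x))
    end
    return d
end

A = [(1,5,2,5), (12,5,9,6), (7,5,2,6), (11,5,2,5), (8,5,9,6), (3,5,3,6), (7,5,9,6), (1,5,3,6)]

# Non-mutating accumulator: vcat builds a new vector each time.
keepop(acc, t) = t[1] in (3, 8, 11) ? vcat(acc, [t]) : acc
empty_init = NTuple{4, Int}[]
res = mygroupreduce(t -> t[3], identity, keepop, A; init = empty_init)
# res[2] == [(11, 5, 2, 5)], and empty_init is still empty

# For contrast: a push!-based op makes every group alias the one shared init vector.
pushop(acc, t) = t[1] in (3, 8, 11) ? push!(acc, t) : acc
bad = mygroupreduce(t -> t[3], identity, pushop, A; init = NTuple{4, Int}[])
# bad[2] === bad[9], and both hold all three matching tuples
```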

@Optimization, please check the annotated code below; I modified the input slightly to try to remove some ambiguity in the problem statement.

using SplitApplyCombine

A = [(1,5,2,5), (11,5,9,6), (7,5,2,6), (11,5,2,5), (8,5,9,6), (3,5,3,6), (7,5,9,6), (1,5,3,6)]
D = group(t -> t[3], A)  # groups tuples by their 3rd element
b = (11, 8, 3)    # one filter value per group, matched to the groups of D in their (insertion) order

[k => filter(x -> x[1] == v, d) for (v, (k,d)) in zip(b, pairs(D))]  # matches first element of each tuple

3-element Vector{Pair{Int64, Vector{NTuple{4, Int64}}}}:
 2 => [(11, 5, 2, 5)]
 9 => [(8, 5, 9, 6)]
 3 => [(3, 5, 3, 6)]
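The `zip` above pairs the values in `b` with the groups in the order they appear in `D` (a `Dictionary` keeps insertion order). To make the pairing explicit and independent of order, the filter value can be looked up by key instead. In this sketch a Base `Dict` stands in for `Dictionary`, and `filtervals` is a hypothetical name:

```julia
A = [(1,5,2,5), (11,5,9,6), (7,5,2,6), (11,5,2,5), (8,5,9,6), (3,5,3,6), (7,5,9,6), (1,5,3,6)]

# Group by the 3rd element with a Base Dict (no ordering guarantee).
D = Dict{Int, Vector{NTuple{4, Int}}}()
for t in A
    push!(get!(() -> NTuple{4, Int}[], D, t[3]), t)
end

# Explicit mapping: group key => required first element (hypothetical name).
filtervals = Dict(2 => 11, 9 => 8, 3 => 3)

result = Dict(k => filter(x -> x[1] == filtervals[k], d) for (k, d) in D)
# result[2] == [(11, 5, 2, 5)], result[9] == [(8, 5, 9, 6)], result[3] == [(3, 5, 3, 6)]
```

Here (11, 5, 9, 6) is excluded from group 9 because that group requires a first element of 8, which is exactly the ambiguity the modified input was meant to expose.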