# PartitionBy, retaining key

Is there something like Python’s `itertools.groupby`, which is like Julia’s `Transducers.PartitionBy` but yields (partition_key, partition_entries) pairs instead of only partition entries?
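For reference, this is what `PartitionBy` hands downstream today: only the entries of each partition. Every partition is non-empty and all of its entries share one key, so one workaround is to recompute the key from the first entry. This is a minimal sketch that assumes `f` (here `isodd`) is pure and cheap, because it is called once more per partition. `copy` is applied defensively, on the assumption that the partition passed downstream may reuse an internal buffer.

```julia
using Transducers

xs = [1, 3, 2, 4, 3, 5]

# PartitionBy alone: only the partition entries reach the downstream steps.
entries_only = xs |> PartitionBy(isodd) |> Map(copy) |> collect

# Recover the key from the first entry of each (non-empty) partition.
with_keys = xs |> PartitionBy(isodd) |> Map(p -> (isodd(first(p)), copy(p))) |> collect
```

This gives `[1, 3]`, `[2, 4]` and `[3, 5]`, each paired with `true`, `false` and `true` respectively.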

cc @tkf

Hmm… good point. I don’t remember why I didn’t pass along the key to the downstream transducers.

It is actually possible to write this, though:

```julia
julia> using Transducers, MicroCollections

julia> [1, 3, 2, 4, 3, 5] |>
       Map(x -> (isodd(x), x)) |>
       ReducePartitionBy(
           first,
           TeeRF(Map(first)'(right), Map(SingletonVector ∘ last)'(Completing(append!!))),
       ) |>
       collect
3-element Vector{Tuple{Bool, Vector{Int64}}}:
 (1, [1, 3])
 (0, [2, 4])
 (1, [3, 5])
```

(which is, BTW, parallelizable while PartitionBy is not)

OK, but arguably this is rather hairy to write.
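Unpacked, the reducing function given to `ReducePartitionBy` runs two reductions in lockstep over each partition of `(key, value)` tuples: `Map(first)'(right)` keeps the latest key, and `Map(SingletonVector ∘ last)'(Completing(append!!))` appends the value to a vector. The plain-Julia sketch below mimics that for a single partition. `tee`, `keep_key` and `push_value` are hypothetical names, and this is only an analogy for what `TeeRF` does, not its actual implementation.

```julia
# Hypothetical helper: combine two reducing functions so that one pass over
# the input updates both accumulators (the idea behind `TeeRF`).
tee(rf1, rf2) = (acc, x) -> (rf1(acc[1], x), rf2(acc[2], x))

# Keep the most recent key, like `Map(first)'(right)`.
keep_key(_, kv) = first(kv)

# Append the value, like `Map(SingletonVector ∘ last)'(Completing(append!!))`.
push_value(v, kv) = push!(v, last(kv))

partition = [(true, 1), (true, 3)]  # one partition of (key, value) tuples
result = foldl(tee(keep_key, push_value), partition; init = (nothing, Int[]))
# result == (true, [1, 3])
```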

Maybe it’d be better to wrap it in something like

```julia
reduced_partition_and_key(f, rf = Map(SingletonVector)'(Completing(append!!))) =
    Map(x -> (f(x), x)) |>
    ReducePartitionBy(
        first,
        TeeRF(Map(first)'(right), Map(last)'(rf)),
    )
```

so that

```julia
julia> [1, 3, 2, 4, 3, 5] |> reduced_partition_and_key(isodd) |> collect
3-element Vector{Tuple{Bool, Vector{Int64}}}:
 (1, [1, 3])
 (0, [2, 4])
 (1, [3, 5])

julia> [1, 3, 2, 4, 3, 5] |> reduced_partition_and_key(isodd, +) |> collect
3-element Vector{Tuple{Bool, Int64}}:
 (1, 4)
 (0, 6)
 (1, 8)
```

(The second example fuses in-partition reduction and avoids allocation of the inner vectors.)
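What the fusion buys can be illustrated without Transducers: the sketch below folds each run with `op` as elements arrive, so no per-partition vector is ever built. `reduce_runs` is a hypothetical helper written in plain sequential Julia; unlike `ReducePartitionBy`, it is not parallelizable.

```julia
# Hypothetical helper (not part of Transducers): fold each run of consecutive
# elements that share the same key `f(x)` with `op`, without storing the run.
function reduce_runs(f, op, xs)
    results = []                  # (key, reduced value) tuples, as a Vector{Any}
    state = iterate(xs)
    state === nothing && return results
    x, s = state
    key, acc = f(x), x            # start the first run
    while (state = iterate(xs, s)) !== nothing
        x, s = state
        k = f(x)
        if isequal(k, key)
            acc = op(acc, x)      # same run: fold immediately, no buffer
        else
            push!(results, (key, acc))  # run ended: emit its key and value
            key, acc = k, x
        end
    end
    push!(results, (key, acc))    # emit the final run
    return results
end

reduce_runs(isodd, +, [1, 3, 2, 4, 3, 5])
```

The call returns `(true, 4)`, `(false, 6)` and `(true, 8)`, the same values as the second REPL example.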

I don’t see why `Unique()` fails here. Replacing `Unique() |> collect` with `collect |> unique` works fine.

```julia
using Transducers, MicroCollections

reduced_partition_and_key(f, rf = Map(SingletonVector)'(Completing(append!!))) =
    Map(x -> (f(x), x)) |>
    ReducePartitionBy(
        first,
        TeeRF(Map(first)'(right), Map(last)'(rf)),
    )

charstrings = string.(collect("a123bc34d8ef34"))

charstrings |>
    reduced_partition_and_key(x -> isnothing(tryparse(Int, x)), *) |>
    Filter(==(0) ∘ first) |>
    Map(x -> parse(Int, x[2])) |>
    Unique() |>
    collect

ERROR: LoadError: MethodError: no method matching unwrap(::Transducers.Reduction{Unique{typeof(identity)},Transducers.Reduction{Map{Type{BangBang.NoBang.SingletonVector}},Transducers.BottomRF{Transducers.AdHocRF{typeof(BangBang.collector),typeof(identity),typeof(append!!),typeof(identity),Nothing}}}}, ::Tuple{Bool,String})
```

Unfortunately, stateful transducers like `Unique` cannot be used after a parallelizable transducer like `ReducePartitionBy`. It’s kind of a cost of parallelizability. There could be a better design that allows this, but it’s a bit tricky to do at the moment.

(Though the `unwrap` method error is actually a bug. Thanks for sharing the code!)

Meanwhile, I think the easiest approach might be to just cook up your own `partitionby` using FGenerators:

```julia
julia> using FGenerators

julia> @fgenerator function partitionby(f, xs)
           buffer = eltype(xs)[]
           key = f(first(xs))
           for x in xs
               y = f(x)
               if !isequal(y, key)
                   @yield key => buffer
                   empty!(buffer)
                   key = y
               end
               push!(buffer, x)
           end
       end
partitionby (generic function with 1 method)

julia> partitionby(x -> isnothing(tryparse(Int, x)), charstrings) |>
       Map(((k, v),) -> (k, prod(v))) |>
       Filter(==(0) ∘ first) |>
       Map(x -> parse(Int, x[2])) |>
       Unique() |>
       collect
3-element Vector{Int64}:
 123
 34
 8
```

Note: `xs -> partitionby(f, xs)` is not a transducer, so pre-processing of `xs` cannot be done with a transducer.


It just occurred to me that you’d need `isempty(buffer) || @yield key => buffer` at the end (after the `for` loop); otherwise the last partition is never yielded.
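With that fix applied, the generator could look like the sketch below, which uses the same `@fgenerator`/`@yield` macros as above. It also allocates a fresh `buffer` per partition instead of calling `empty!`, on the reasoning that downstream code which keeps the yielded pairs (e.g. collecting them directly) would otherwise see every pair share, and mutate, the same vector. Like the original, it still assumes `xs` is non-empty because of `first(xs)`.

```julia
using FGenerators, Transducers

@fgenerator function partitionby(f, xs)
    buffer = eltype(xs)[]
    key = f(first(xs))              # assumes xs is non-empty
    for x in xs
        y = f(x)
        if !isequal(y, key)
            @yield key => buffer
            buffer = eltype(xs)[]   # fresh vector, so yielded pairs don't alias
            key = y
        end
        push!(buffer, x)
    end
    # The fix: also emit the last partition once the loop is done.
    if !isempty(buffer)
        @yield key => buffer
    end
end

pairs_out = partitionby(isodd, [1, 3, 2, 4, 3, 5]) |> Map(identity) |> collect
```

Here `pairs_out` should hold `true => [1, 3]`, `false => [2, 4]` and `true => [3, 5]`, including the final partition that the original version dropped.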