Is there something like Python’s itertools.groupby, which is like Julia’s Transducers.PartitionBy but yields (partition_key, partition_entries) pairs instead of only partition entries?
cc @tkf
Hmm… good point. I don’t remember why I didn’t pass the key along to the downstream transducers.
It is actually possible to write this, though:
julia> using Transducers, MicroCollections

julia> [1, 3, 2, 4, 3, 5] |>
           Map(x -> (isodd(x), x)) |>
           ReducePartitionBy(
               first,
               TeeRF(Map(first)'(right), Map(SingletonVector ∘ last)'(Completing(append!!))),
           ) |>
           collect
3-element Vector{Tuple{Bool, Vector{Int64}}}:
 (1, [1, 3])
 (0, [2, 4])
 (1, [3, 5])
(which is, BTW, parallelizable while PartitionBy is not)
OK, but arguably this is rather hairy to write.
Maybe it’d be better to wrap it in something like
reduced_partition_and_key(f, rf = Map(SingletonVector)'(Completing(append!!))) =
    Map(x -> (f(x), x)) |>
    ReducePartitionBy(
        first,
        TeeRF(Map(first)'(right), Map(last)'(rf)),
    )
so that
julia> [1, 3, 2, 4, 3, 5] |> reduced_partition_and_key(isodd) |> collect
3-element Vector{Tuple{Bool, Vector{Int64}}}:
(1, [1, 3])
(0, [2, 4])
(1, [3, 5])
julia> [1, 3, 2, 4, 3, 5] |> reduced_partition_and_key(isodd, +) |> collect
3-element Vector{Tuple{Bool, Int64}}:
(1, 4)
(0, 6)
(1, 8)
(The second example fuses in-partition reduction and avoids allocation of the inner vectors.)
I don’t see why Unique() fails here. Replacing Unique() |> collect with collect |> unique works fine.
using Transducers, MicroCollections

reduced_partition_and_key(f, rf = Map(SingletonVector)'(Completing(append!!))) =
    Map(x -> (f(x), x)) |>
    ReducePartitionBy(
        first,
        TeeRF(Map(first)'(right), Map(last)'(rf)),
    )

charstrings = string.(collect("a123bc34d8ef34"))

charstrings |>
    reduced_partition_and_key(x -> isnothing(tryparse(Int, x)), *) |>
    Filter(==(0) ∘ first) |>
    Map(x -> parse(Int, x[2])) |>
    Unique() |>
    collect
ERROR: LoadError: MethodError: no method matching unwrap(::Transducers.Reduction{Unique{typeof(identity)},Transducers.Reduction{Map{Type{BangBang.NoBang.SingletonVector}},Transducers.BottomRF{Transducers.AdHocRF{typeof(BangBang.collector),typeof(identity),typeof(append!!),typeof(identity),Nothing}}}}, ::Tuple{Bool,String})
Unfortunately, stateful transducers like Unique cannot be used after a parallelizable transducer like ReducePartitionBy. It’s kind of a cost of parallelizability. There could be a better design that allows this, but it’s a bit tricky to do ATM.
(Though the unwrap method error is actually a bug. Thanks for sharing the code!)
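Until that is addressed, the workaround reported above (collect first, then deduplicate with Base's unique) can be sketched like this. It only reuses reduced_partition_and_key and charstrings from earlier in the thread; the variable name result is just for illustration, and the sketch assumes the pipeline behaves as reported above.

```julia
using Transducers, MicroCollections

# Same helper as defined earlier in the thread.
reduced_partition_and_key(f, rf = Map(SingletonVector)'(Completing(append!!))) =
    Map(x -> (f(x), x)) |>
    ReducePartitionBy(
        first,
        TeeRF(Map(first)'(right), Map(last)'(rf)),
    )

charstrings = string.(collect("a123bc34d8ef34"))

# Run the transducer pipeline without Unique(), materialize the result,
# and only then deduplicate with Base.unique on the collected vector.
result = charstrings |>
    reduced_partition_and_key(x -> isnothing(tryparse(Int, x)), *) |>
    Filter(==(0) ∘ first) |>
    Map(x -> parse(Int, x[2])) |>
    collect |>
    unique
```

The deduplication then happens outside the transducer chain, so no stateful transducer has to sit downstream of ReducePartitionBy.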
Meanwhile, I think the easiest approach might be to just cook up your own partitionby using FGenerators:
julia> using FGenerators

julia> @fgenerator function partitionby(f, xs)
           buffer = eltype(xs)[]
           key = f(first(xs))
           for x in xs
               y = f(x)
               if !isequal(y, key)
                   @yield key => buffer
                   empty!(buffer)
                   key = y
               end
               push!(buffer, x)
           end
       end
partitionby (generic function with 1 method)

julia> partitionby(x -> isnothing(tryparse(Int, x)), charstrings) |>
           Map(((k, v),) -> (k, prod(v))) |>
           Filter(==(0) ∘ first) |>
           Map(x -> parse(Int, x[2])) |>
           Unique() |>
           collect
3-element Vector{Int64}:
123
34
8
Note: xs -> partitionby(f, xs) is not a transducer, so pre-processing of xs cannot be done with a transducer.
It just occurred to me that partitionby needs isempty(buffer) || @yield key => buffer after the for loop. Otherwise the last partition is never yielded.
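To check what the generator should produce once the trailing partition is included, here is a minimal eager sketch that uses only Base. The name partitionby_pairs is hypothetical (it is not part of FGenerators or Transducers). Unlike the generator above, it allocates a fresh vector per partition and returns all key => entries pairs at once, so it is a reference for the expected result rather than a replacement.

```julia
# Hypothetical Base-only helper: group consecutive elements of `xs` that share
# the same key `f(x)` and return a vector of `key => entries` pairs.
function partitionby_pairs(f, xs)
    out = Pair{Any,Vector{eltype(xs)}}[]
    for x in xs
        k = f(x)
        if isempty(out) || !isequal(last(out).first, k)
            # The key changed (or this is the first element): start a new partition.
            push!(out, k => eltype(xs)[x])
        else
            # Same key as the current partition: append to its entries.
            push!(last(out).second, x)
        end
    end
    return out  # the last partition is already included; empty input gives an empty vector
end

pairs_by_parity = partitionby_pairs(isodd, [1, 3, 2, 4, 3, 5])
# compares equal to [true => [1, 3], false => [2, 4], true => [3, 5]]
```

Because each partition is pushed into the result as soon as it starts, no separate flush is needed after the loop, and an empty input does not hit the first(xs) call that the generator version relies on.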