Iterating n items at a time and filling

In Slack, @mastrof wanted the equivalent of the following Python 3 code, which returns n items at a time and pads with fillvalue if there are fewer than n items remaining:

from itertools import zip_longest

def f(iterable, n, fillvalue=None):
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

We currently do not have zip_longest in IterTools.jl although there is a PR:

Additionally, Julia's iteration protocol differs from Python's, so the trick of repeating one shared iterator object n times does not carry over directly.
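As a rough illustration of that difference, here is a minimal sketch of driving an iterable by hand through `iterate`. The helper name `manual_collect` is made up for this example; only `iterate` itself comes from Base.

```julia
# Julia iterables are driven through `iterate`, which threads an explicit
# state value instead of mutating a shared iterator object.
function manual_collect(itr)
    out = Any[]
    next = iterate(itr)             # first step: (item, state) or `nothing`
    while next !== nothing
        item, state = next
        push!(out, item)
        next = iterate(itr, state)  # later steps pass the state back in
    end
    return out
end

r = 1:3
collected = manual_collect(r)

# Restarting iteration on a range yields the first element again, so
# repeating the same range n times does not share a cursor the way
# `[iter(iterable)] * n` does in Python.
restart_a = iterate(r)
restart_b = iterate(r)
```

Because the range itself holds no cursor, both restarts return the same first step.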

After @jakobnissen recommended Iterators.partition, I proposed the following solution:

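function ntaker_gen(iterable, n, fillvalue=nothing)
   # Append n-1 fill values so that a partial final chunk can always be completed
   padded_iterable = Iterators.flatten( (iterable, Iterators.repeated(fillvalue,n-1)) )
   n_partition = Iterators.partition( padded_iterable, n )
   # Drop any trailing chunk shorter than n (it holds only leftover fill values)
   (x for x in n_partition if length(x) == n)
end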

ntaker(args...) = collect( ntaker_gen(args...) )

This results in the following:

julia> ntaker(1:6, 3, 0)
2-element Array{Array{Any,1},1}:
 [1, 2, 3]
 [4, 5, 6]

julia> ntaker(1:7, 3, 0)
3-element Array{Array{Any,1},1}:
 [1, 2, 3]
 [4, 5, 6]
 [7, 0, 0]

julia> ntaker(1:8, 3, 0)
3-element Array{Array{Any,1},1}:
 [1, 2, 3]
 [4, 5, 6]
 [7, 8, 0]

julia> ntaker(1:9, 3, 0)
3-element Array{Array{Any,1},1}:
 [1, 2, 3]
 [4, 5, 6]
 [7, 8, 9]
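To see why ntaker_gen pads with only n-1 fill values and then filters by length, it can help to look at the raw partitions before the filter is applied. This is a small sketch using the same Base iterators; the variable names are only for illustration.

```julia
n = 3

# Input whose length is already a multiple of n: the padding forms a short
# trailing chunk of n-1 fill values, which the length filter would discard.
padded_6 = Iterators.flatten((1:6, Iterators.repeated(0, n - 1)))
chunks_6 = collect(Iterators.partition(padded_6, n))

# Input with two leftover items: one fill value completes the last real
# chunk, and the surplus fill value lands in a short chunk of its own.
padded_8 = Iterators.flatten((1:8, Iterators.repeated(0, n - 1)))
chunks_8 = collect(Iterators.partition(padded_8, n))
```

Here chunks_6 ends with the short chunk [0, 0] and chunks_8 ends with [0]; keeping only the chunks of length n gives the results shown above.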

Is there a better solution?

Here is a one-line version:

julia> function ntaker_one_liner(iterable, n, fillvalue=nothing)
           [ [x; fill(fillvalue, n-length(x)) ] for x in Iterators.partition( iterable, n ) ]
       end
ntaker_one_liner (generic function with 2 methods)

julia> ntaker_one_liner(1:6, 3, 0)
2-element Array{Array{Int64,1},1}:
 [1, 2, 3]
 [4, 5, 6]

julia> ntaker_one_liner(1:7, 3, 0)
3-element Array{Array{Int64,1},1}:
 [1, 2, 3]
 [4, 5, 6]
 [7, 0, 0]

julia> ntaker_one_liner(1:8, 3, 0)
3-element Array{Array{Int64,1},1}:
 [1, 2, 3]
 [4, 5, 6]
 [7, 8, 0]

julia> ntaker_one_liner(1:9, 3, 0)
3-element Array{Array{Int64,1},1}:
 [1, 2, 3]
 [4, 5, 6]
 [7, 8, 9]

using Base.Iterators

function foo(itr, n, fillvalue)
    ntake = ceil(Int, length(itr)/n)
    extended_itr = flatten( ( itr, repeated(fillvalue) ) )
    take(partition(extended_itr, n), ntake)
end

However, note that for some reason this produces a type instability: the collected chunks have element type Any. Maybe the compiler is choking on all those nested iterators?

julia> collect(foo(1:7, 3, 0))
3-element Array{Array{Any,1},1}:
 [1, 2, 3]
 [4, 5, 6]
 [7, 0, 0]

julia> collect(foo(1:8, 3, 0))
3-element Array{Array{Any,1},1}:
 [1, 2, 3]
 [4, 5, 6]
 [7, 8, 0]

julia> collect(foo(1:9, 3, 0))
3-element Array{Array{Any,1},1}:
 [1, 2, 3]
 [4, 5, 6]
 [7, 8, 9]

julia> collect(foo(1:10, 3, 0))
4-element Array{Array{Any,1},1}:
 [1, 2, 3] 
 [4, 5, 6] 
 [7, 8, 9] 
 [10, 0, 0]
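If the Any element type is a problem downstream, one possible workaround is to convert each chunk after the fact. The sketch below assumes that every item and the fill value are Ints. It does not address whatever happens inside the nested iterators; it only narrows the type of the final containers.

```julia
using Base.Iterators: flatten, repeated, partition, take

# `foo` as posted above, repeated here so the snippet runs on its own.
function foo(itr, n, fillvalue)
    ntake = ceil(Int, length(itr) / n)
    extended_itr = flatten((itr, repeated(fillvalue)))
    take(partition(extended_itr, n), ntake)
end

# Assumption: all items and the fill value are Ints, so each chunk can be
# converted to a Vector{Int} once it has been produced.
typed = [Vector{Int}(chunk) for chunk in foo(1:7, 3, 0)]
```

The conversion copies each chunk, so this trades some allocation for a concrete element type.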

Thanks for this version. I understand how to use take much better now.

One assumption this makes is that length(itr) exists, but length is only an optional part of the iteration interface. For example, filtered iterators may not have a known length a priori:

julia> filtered_iter = (x for x in 1:7 if isodd(x))
Base.Generator{Base.Iterators.Filter{var"#5#6",UnitRange{Int64}},typeof(identity)}(identity, Base.Iterators.Filter{var"#5#6",UnitRange{Int64}}(var"#5#6"(), 1:7))

julia> length(filtered_iter)
ERROR: MethodError: no method matching length(::Base.Iterators.Filter{var"#5#6",UnitRange{Int64}})
Closest candidates are:
  length(::Cmd) at process.jl:639
  length(::Base.Iterators.Flatten{Tuple{}}) at iterators.jl:1061
  length(::BitSet) at bitset.jl:365
  ...
Stacktrace:
 [1] length(::Base.Generator{Base.Iterators.Filter{var"#5#6",UnitRange{Int64}},typeof(identity)}) at ./generator.jl:50
 [2] top-level scope at REPL[5]:1

julia> function foo(itr, n, fillvalue)
           ntake = ceil(Int, length(itr)/n)
           extended_itr = flatten( ( itr, repeated(fillvalue) ) )
           take(partition(extended_itr, n), ntake)
       end
foo (generic function with 1 method)

julia> foo(filtered_iter, 3, 0)
ERROR: MethodError: no method matching length(::Base.Iterators.Filter{var"#5#6",UnitRange{Int64}})
Closest candidates are:
  length(::Cmd) at process.jl:639
  length(::Base.Iterators.Flatten{Tuple{}}) at iterators.jl:1061
  length(::BitSet) at bitset.jl:365
  ...
Stacktrace:
 [1] length(::Base.Generator{Base.Iterators.Filter{var"#5#6",UnitRange{Int64}},typeof(identity)}) at ./generator.jl:50
 [2] foo(::Base.Generator{Base.Iterators.Filter{var"#5#6",UnitRange{Int64}},typeof(identity)}, ::Int64, ::Int64) at ./REPL[7]:2
 [3] top-level scope at REPL[8]:1
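For comparison, the one-line version from earlier in the thread only asks for the length of each chunk after Iterators.partition has materialized it, so it should not need length on the input itself. Here is a minimal check with a filtered generator; the names `odds` and `result` are just for this example.

```julia
# `ntaker_one_liner` as posted earlier in the thread, repeated so the
# snippet is self-contained.
function ntaker_one_liner(iterable, n, fillvalue=nothing)
    [ [x; fill(fillvalue, n - length(x))] for x in Iterators.partition(iterable, n) ]
end

# A filtered generator has no `length` method.
odds = (x for x in 1:7 if isodd(x))

# Only the length of each already-materialized chunk is queried.
result = ntaker_one_liner(odds, 3, 0)
```

The odd numbers 1, 3, 5, 7 are grouped as [1, 3, 5] and [7, 0, 0].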

Thank you @mkitti for moving this to Discourse.
I originally asked on Slack thinking that the Python code could be trivially translated to Julia and wouldn’t be worth a topic here. Looks like I was very wrong 🙂
