Split iteration into head + tail

Sometimes it is necessary to handle the first element of an iterable separately, then work with the rest. I am wondering how to code these things cleanly.

To make things concrete, consider this example (calling iterate directly):

"""
Return the run lengths of `==` elements in the iterable itr.
"""
function runlengths(itr)
    lengths = Int[]
    y = iterate(itr)
    y ≡ nothing && return lengths
    lastelt, state = y
    runcount = 1
    while true
        y = iterate(itr, state)
        y ≡ nothing && break
        elt, state = y
        if elt == lastelt
            runcount += 1
        else
            push!(lengths, runcount)
            lastelt = elt
            runcount = 1
        end
    end
    push!(lengths, runcount)
    lengths
end

Example:

julia> runlengths([1,1,1,2,2,3,4,4,4])
4-element Array{Int64,1}:
 3
 2
 1
 3

Is there a way to code this in a cleaner way?

Happy to have a run here :smiley:

That’s pretty much why I asked about this in the thread "Skipping parts of a for loop in the first iteration".

With the `@unroll1` macro from that discussion, it becomes:

       """
       Return the run lengths of `==` elements in the iterable itr.
       """
       function runlengths(itr)
           lengths = Int[]
           local runcount = 1
           @unroll1 for elt in itr
               if $first
                   lastelt = elt
               else 
                   if elt == lastelt
                       runcount += 1
                   else
                       push!(lengths, runcount)
                       lastelt = elt
                       runcount = 1
                   end
               end
           end
           push!(lengths, runcount)
           lengths
       end

The macro is not well tested yet, though; I just had to `esc` a few more expressions to make this example run. See:
https://gist.github.com/mschauer/9265bd5b70c9abf1391d4ef541d53eca

Use Iterators.peel:

function runlengths(itr)
    lengths = Int[]
    runcount = 1
    isempty(itr) && return lengths
    lastelt, rest = Iterators.peel(itr)
    for elt in rest
        if elt == lastelt
            runcount += 1
        else
            push!(lengths, runcount)
            lastelt = elt
            runcount = 1
        end
    end
    return push!(lengths, runcount)
end

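Unfortunately, this ends up calling iterate(itr) twice for a generic iterable: once inside isempty(itr) (whose fallback is defined via iterate) and once more inside Iterators.peel.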
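
For what it's worth, on Julia 1.7 and later `Iterators.peel` returns `nothing` for an empty iterator, so the separate `isempty` check can be dropped. A minimal sketch under that version assumption (the name `runlengths_peel` is just illustrative, and on older Julia versions `peel` on an empty iterator errors instead):

```julia
# Assumes Julia >= 1.7, where Iterators.peel returns `nothing` on empty input.
function runlengths_peel(itr)
    lengths = Int[]
    peeled = Iterators.peel(itr)          # the only "first" call to iterate(itr)
    peeled === nothing && return lengths  # empty iterable: no runs
    lastelt, rest = peeled
    runcount = 1
    for elt in rest
        if elt == lastelt
            runcount += 1
        else
            push!(lengths, runcount)      # close the previous run
            lastelt = elt
            runcount = 1
        end
    end
    return push!(lengths, runcount)       # close the final run
end
```

This keeps the `for` loop over the tail while starting the iteration only once.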

I guess sometimes using iterate directly is the cleanest solution and I should not have any problems with it.
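
If you stay with `iterate`, the loop can be tightened a little by putting the assignment in the `while` condition, which removes the `while true` / `break` pair. This is only a sketch of that idiom with the same logic as the original; `runlengths_iterate` is a made-up name:

```julia
function runlengths_iterate(itr)
    lengths = Int[]
    y = iterate(itr)
    y === nothing && return lengths       # empty iterable: no runs
    lastelt, state = y                    # head of the iteration
    runcount = 1
    # Advance and test for exhaustion in one step.
    while (y = iterate(itr, state)) !== nothing
        elt, state = y
        if elt == lastelt
            runcount += 1
        else
            push!(lengths, runcount)      # close the previous run
            lastelt = elt
            runcount = 1
        end
    end
    return push!(lengths, runcount)       # close the final run
end
```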

I’d probably go with something similar to your original solution. It’s verbose, but I think it’s clean and easy to read.

Or, for brevity, you can do:

function runlengths(itr)
    len = Int[]
    r = foldl((v,x) -> (v[2] ≠ x[2] && push!(len, v[1]); x), enumerate(itr))
    diff(vcat(0, len, r[1]))
end
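
To unpack the one-liner: the fold walks over `(index, value)` pairs, records the index at which each run ends, and `diff` turns those end positions into lengths. As far as I can tell, the short version raises an error on an empty iterable, because `foldl` has no `init` there. Below is an expanded sketch of the same idea with an explicit `init = nothing` to cover that case (the names `runlengths_fold`, `ends`, and `final` are mine):

```julia
function runlengths_fold(itr)
    ends = Int[]   # index of the last element of each finished run
    final = foldl(enumerate(itr); init = nothing) do prev, cur
        # prev and cur are (index, value) pairs; prev is `nothing` at the start
        if prev !== nothing && prev[2] != cur[2]
            push!(ends, prev[1])          # previous run ended at prev's index
        end
        cur                               # carry the current pair forward
    end
    final === nothing && return Int[]     # empty input: no runs
    # Run end positions 0, e1, e2, ..., n; consecutive differences are the lengths.
    return diff(vcat(0, ends, final[1]))
end
```

For `[1,1,1,2,2,3,4,4,4]` the recorded ends are `[3, 5, 6]` and the final index is 9, so `diff([0, 3, 5, 6, 9])` gives `[3, 2, 1, 3]`.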