An efficient way to compute the average length of streaks of a given number

Suppose I have a large vector of integers ranging from 1 to 5. I want to compute the average length of streaks of, say, the number 5, where a single 5 followed by a different number counts as a streak of length 1.

So, for example, for the vector [1,2,5,5,1,5,3,5,5,5], the average streak length of the number 5 is (2 + 1 + 3) / 3 = 2.

What is an efficient way to compute the average length of streaks of a given number in Julia?

I would just take the naive approach with a simple, straightforward for loop:

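function count_streak(pred::Function, itr)
    # n_streaks: streaks seen so far, cumlen: their total length,
    # len: length of the streak currently in progress
    n_streaks, cumlen, len = 0, 0, 0
    # close the current streak and start counting from zero again
    reset(n_streaks, cumlen, len) = (n_streaks+1, cumlen+len, 0)
    for i in itr
        if pred(i)
            len += 1
        else
            if len > 0
                (n_streaks, cumlen, len) = reset(n_streaks, cumlen, len)
            end
        end
    end
    # a streak that runs to the end of itr still has to be counted
    if len > 0
        (n_streaks, cumlen, len) = reset(n_streaks, cumlen, len)
    end
    cumlen / n_streaks
end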
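
As a quick sanity check, the same single-pass idea can be run on the example vector from the question. The sketch below is a condensed variant rather than the function above verbatim, and the name mean_streak is made up for illustration. It counts matching elements and streak starts in one loop, then divides, assuming pred returns a Bool.

```julia
# Hypothetical condensed variant of the loop above: one pass, no allocation.
function mean_streak(pred, itr)
    total, starts, inside = 0, 0, false
    for x in itr
        hit = pred(x)
        total += hit               # every matching element adds 1 to the total length
        starts += hit && !inside   # a match right after a non-match opens a new streak
        inside = hit
    end
    return total / starts          # NaN (0/0) when nothing matches
end

x = [1, 2, 5, 5, 1, 5, 3, 5, 5, 5]
mean_streak(==(5), x)   # (2 + 1 + 3) / 3 = 2.0
```

Note that both versions return NaN when the number never appears, since Julia evaluates 0/0 as NaN.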

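If you want a much slower one-liner, this is one option (groupby comes from IterTools, mean from Statistics, and x is the input vector):

using IterTools, Statistics

mean(length(v) for v in filter(v -> v[1] == 5, collect(groupby(==(5), x))))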

I think you are looking for run-length encoding, which is available in StatsBase as rle:

https://juliastats.org/StatsBase.jl/v0.18/misc.html#StatsBase.rle
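
A minimal sketch of how that could look, assuming StatsBase is installed. rle returns the run values and the run lengths as two vectors, so the streaks of 5 are the runs whose value is 5. The variable names are illustrative.

```julia
using StatsBase    # provides rle
using Statistics   # provides mean

x = [1, 2, 5, 5, 1, 5, 3, 5, 5, 5]

vals, lens = rle(x)           # run values and their lengths
streaks = lens[vals .== 5]    # lengths of the runs made of 5s
mean(streaks)                 # (2 + 1 + 3) / 3 = 2.0
```

This allocates two vectors the size of the number of runs, so for very large inputs the plain loop is likely to be cheaper.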


Another one-liner version of count_streak:

using IterTools

cnt_strk(p, v) = 
  sum(p.(v)) ./ (p(v[1])+sum(map(x -> <(x...), partition(p.(v), 2, 1))))

Test:

julia> cnt_strk(==(5),V)
1.248118414450577

julia> count_streak(==(5),V)
1.248118414450577

Or, the same idea but a little more code-golfy:

cnt_strk_back(p, v) = 
  ((s,t)-> s/(s-t))(sum(p.(v)), sum(map(all,partition(p.(v),2,1))))
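
The counting identity behind this version: a streak of length k contributes k matching elements and k-1 adjacent pairs in which both elements match, so the number of streaks is the number of matches minus the number of such pairs. Here is a Base-only sketch of the same idea without IterTools; the name streak_mean is hypothetical.

```julia
# Hypothetical Base-only restatement of the idea behind cnt_strk_back.
function streak_mean(p, v)
    b = p.(v)                            # Boolean mask of matching elements
    s = count(b)                         # total number of matches
    t = count(b[1:end-1] .& b[2:end])    # adjacent pairs where both elements match
    return s / (s - t)                   # s - t is the number of streaks
end

x = [1, 2, 5, 5, 1, 5, 3, 5, 5, 5]
streak_mean(==(5), x)   # 6 matches, 3 matching pairs: 6 / (6 - 3) = 2.0
```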