Cumulative count function?

Is there a Julia function that can do a cumulative count of previous occurrences of each value an array?

That is, something that would behave like this (using a mixed array as an example):

cumcount([0, 5, 5, "foo", 5, 7, "bar", "foo"])
8-element Array{Int64,1}:
 1
 1
 2
 1
 3
 1
 1
 2

Before I get on and create something inefficient and bloated, I just want to check for anything elegant and beautiful that I’ve been unable to find so far.

Depends on your definition of beautiful…

function f(A)
    d = Dict()
    function g(a)
        d[a] = get(f, a, 0) + 1
    end
    g.(A)
end
5 Likes

I think you meant get(d, a, 0).

And in any case, it is too beautiful, and also quite clear, to the extent that even people relatively unfamiliar with Julia may just understand the code.

So we clearly need to make it uglier, eg

f(A) = (d = Dict(); (a -> d[a] = get(d, a, 0) + 1).(A))
3 Likes

I’m not sure broadcasting guarantees left-associativity. So you might
want [g(a) for a in A] instead of g.(A).

1 Like

“Ugly” competition? - I can play this game:

[findall(isequal(A[x]), A[1:x]) |> length for x in eachindex(A)]

Guaranteed left-associative:

julia> cumcount(xs) = foldl(xs, init = (Dict(), [])) do (d,l), x
           d[x] = get(d,x,0) + 1
           push!(l,d[x])
           (d,l)
       end[2]
cumcount (generic function with 1 method)

julia> cumcount([0, 5, 5, "foo", 5, 7, "bar", "foo"])
8-element Array{Any,1}:
 1
 1
 2
 1
 3
 1
 1
 2


1 Like

Isn’t that O(n^2)?

By the way, the (unwritten) rules of the competition are:

  1. it should do something well,
  2. but have a subtle potential bug (as pointed out by @non-Jedi — if broadcasting does not ensure sequential mapping, a lot of code will break, hopefully non-deterministically),
  3. have enough punctuation to complete an e e cummings poem to fully formed sentences.

Being bug-free, your code is disqualified by (2) :wink:

2 Likes

This is very confusing, I have no idea how the f got there. I tested different versions (including yours, but I deemed it too beautiful for the public eye) and benchmarked them and everything, yet somehow I got it wrong when copying it. Is this Muphry’s law at work?

Not sure if it’s ugly or beautiful but transducers are good solution for this kind of tasks

using Transducers
using BangBang

xf = ScanEmit(Dict{Union{},Union{}}()) do d, x
    c = get(d, x, 0) + 1
    c, setindex!!(d, c, x)
end

collect(xf, [0, 5, 5, "foo", 5, 7, "bar", "foo"])
2 Likes