Cummean, cumall, and cumany

Currently, these functions exist in Base:

  • cumsum
  • cumprod
  • cummin (via accumulate(min, A))
  • cummax (via accumulate(max, A))

I was just looking through dplyr’s functions, and it also has:

  • cummean
  • cumall (logical bit vector)
  • cumany (logical bit vector)

Does it make sense to add these to Base? Or perhaps all these functions should be moved to Statistics?

Well, we have accumulate, in terms of which cumall is just accumulate(&, A) and cumany is accumulate(|, A), so it doesn’t seem worthwhile to add dedicated functions for these.
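To make that concrete, here is a minimal sketch. The names cumall and cumany are just borrowed from dplyr for illustration; they are not defined in Base.

```julia
# Hypothetical helpers mirroring dplyr's names; only `accumulate` comes from Base.
cumall(A) = accumulate(&, A)   # true up to (not including) the first false
cumany(A) = accumulate(|, A)   # true from the first true onward

cumall([true, true, false, true])    # [true, true, false, false]
cumany([false, false, true, false])  # [false, false, true, true]
```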

cummean(A) = cumsum(A) ./ (1:length(A)) is slightly more complicated, at least if you want to eliminate the allocation of a temporary array. If it’s a common operation, it might make sense in Statistics. However, I can’t find any record of anyone asking for this function before, which suggests there isn’t much demand.
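For reference, a single-pass version that avoids the temporary from cumsum could look roughly like the sketch below. It is only an illustration for vectors of numbers, not a proposed Base or Statistics implementation, and it accumulates the running sum in the output element type (for example Float64 for integer input).

```julia
# Hypothetical single-pass cummean: allocates only the output vector.
function cummean(A::AbstractVector{T}) where {T<:Number}
    out = Vector{typeof(zero(T) / 1)}(undef, length(A))
    s = zero(eltype(out))          # running sum, kept in the output type
    for (n, x) in enumerate(A)     # n counts elements seen so far
        s += x
        out[n] = s / n
    end
    return out
end

cummean([1, 2, 3, 4])  # [1.0, 1.5, 2.0, 2.5]
```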

Coincidentally, I am currently porting some code from Matlab that contains the equivalent of cummean as described here. So, +1 from me!

Yeah, I couldn’t find any previous requests in the Julia repo either.

Looks like Bayesian people want it?

https://juliahub.com/ui/CodeSearch?q=cummean%20&u=define&t=all

Just to give some publicity to a couple of really nice packages: you can compute the cumulative mean with OnlineStats + Transducers:

julia> using OnlineStats, Transducers

julia> collect(Transducer(Mean()), 1:4)

(It’s one of the examples in the docs.)

Did not know this. Thanks for sharing.

Is it fast? I thought the canonical way in Julia was to use accumulate.

Transducer(::OnlineStat) is not really about performance. It’s mainly for exposing the very rich set of functionality in OnlineStats.jl through the Transducers.jl-based API. OnlineStats are really “just” reducing functions, obscured by the implementation detail that they use in-place mutation. So it would be a pity if the two libraries could not talk to each other.
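To illustrate what “just a reducing function” means here, below is a rough sketch of a running mean written as a pure step function and driven by accumulate. This is not how OnlineStats implements Mean (which mutates its state in place); mean_step and running_mean are made-up names for this example.

```julia
# Hypothetical reducing function: state is (count, mean), updated without mutation.
mean_step((n, m), x) = (n + 1, m + (x - m) / (n + 1))

# Run the step function over the input and keep only the mean from each state.
running_mean(xs) = last.(accumulate(mean_step, xs; init = (0, 0.0)))

running_mean(1:4)  # [1.0, 1.5, 2.0, 2.5]
```

The incremental update m + (x - m) / n avoids carrying a separate running sum, which is the same general idea online-statistics libraries rely on.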