Slightly generalised `accumulate`

willtebbutt · August 16, 2020, 12:04pm

accumulate (a.k.a. scan, prefix etc) is great, but I’ve got a situation in which I need something very slightly different, and wonder whether anyone knows whether

it has a name
someone has already implemented it

Specifically, as I understand it accumulate eats a binary function f and an iterator xs and does something that’s roughly equivalent to (but presumably more clever than) the following:

function my_accumulate(f, xs, init)
    output = Vector{typeof(init)}(undef, length(xs))
    y = init
    for (n, x) in enumerate(xs)
        y = f(y, x)
        output[n] = y
    end
    return output
end

Now, I’ve got a function g that outputs a length 2 Tuple where I want to return the first element as part of output and pass the second on to the next iteration. Again, writing the for loop, something like this:

function not_quite_my_accumulate(g, xs, init)
    output = Vector{typeof(init)}(undef, length(xs))
    w = init
    for (n, x) in enumerate(xs)
        w, y = g(w, x)
        output[n] = y
    end
    return output
end

In general one could implement not_quite_my_accumulate using accumulate by storing both w and y in output / passing them both on to the next iteration and then playing around with the result, but in my application it tends to be the case that y is quite small while w is rather large. Consequently I can’t afford to store all the intermediate values of w.

I couldn’t see anything about this in the docs, and I’m not sure whether it’s a function that has a name, so any advice would be much appreciated!

xiaodai · August 16, 2020, 12:31pm

Eats a function. : two-argument would be less ambiguous

xiaodai · August 16, 2020, 1:21pm

willtebbutt:

function not_quite_my_accumulate(g, xs, init)
    output = Vector{typeof(init)}(undef, length(xs))
    w = init
    for (n, x) in enumerate(xs)
        w, y = g(w, x)
        output[n] = y
    end
    return output
end

Your code is quite easy to understand, I would use anything fancier. BUt sorry no help, i can’t think of a function. Others might though.

piever · August 16, 2020, 2:11pm

I guess there are a couple of tricks that can be useful. If w is type-stable, you could create a structure that encodes w and updates it as you iterate.

julia> mutable struct WithState{F, S} # somewhat similar to `Flux.Recur`
           f::F
           state::S
       end

julia> function (st::WithState)(x)
           w, y = st.f(st.state, x)
           st.state = w
           return y
       end

julia> map(WithState(g, init), xs) # not 100% sure that `map` guarantees to be run sequentially, though in practice I think it does

If you want to be efficient while allowing w to change type, you have to play around with foldl. For example, you could do

julia> res, _ = foldl(xs, init=(Union{}[], init)) do (vec, w), x
           ŵ, y = g(w, x)
           return push!!(vec, y), ŵ # `push!!` allows widening of the vector
       end

In general, playing around with the function you pass to foldl is a key idea for Transducers (or at least, that’s my understanding). Maybe the above could somehow be done as a transducer, but I don’t understand the details of the framework well enough to be sure.

willtebbutt · August 16, 2020, 2:25pm

Good point – I hadn’t considered using mutation. It’s definitely an option, but I’ll hold off on marking it as a solution as a non-mutating solution would also be nice.

willtebbutt · August 16, 2020, 2:42pm

Thanks for pointing me towards Transducers.jl – it looks like its ScanEmit transducer is actually what I want. It has roughly the same behaviour as not_quite_my_accumulate:

julia> not_quite_my_accumulate(tuple, 1:3, 0)
3-element Array{Int64,1}:
 1
 2
 3

julia> collect(ScanEmit(tuple, 0), 1:3)
3-element Array{Int64,1}:
 0
 1
 2

tomerarnon · August 16, 2020, 2:56pm

You can also implement a custom iterator and keep track of the state that way. It generally only requires you overload length and iterate.

struct MyAccumulator{G, X, I<:Union{eltype(X), Nothing}}
    g::G
    xs::X
    init::I
end

MyAccumulator(g, xs) = MyAccumulator(g, xs, nothing)
Base.length(MA::MyAccumulator) = length(MA.xs)

function Base.iterate(MA::MyAccumulator{G, X, I}, state = (1, something(MA.init, one(I))))
    i, w = state
    i > length(MA.xs) && return nothing

    w, x = MA.g(w, @inbounds MA.xs[i])
    return (x, (i+1, w))
end


julia> collect(MyAccumulator((w,x) -> (w+x, w*x), [1,2,3,4]))
4-element Array{Any,1}:
  1
  4
 12
 28

Note that the return type is Array{Any} because eltype isn’t implemented (it’s a bit tricky since g can do anything, but you could look at how accumulate does it in Base to figure out a good general way).

I don’t think the interface above is necessarily the best for what you want, but maybe it can inspire a better direction.