accumulate (a.k.a. scan, prefix etc) is great, but I’ve got a situation in which I need something very slightly different, and wonder whether anyone knows whether
it has a name
someone has already implemented it
Specifically, as I understand it accumulate eats a binary function f and an iterator xs and does something that’s roughly equivalent to (but presumably more clever than) the following:
function my_accumulate(f, xs, init)
output = Vector{typeof(init)}(undef, length(xs))
y = init
for (n, x) in enumerate(xs)
y = f(y, x)
output[n] = y
end
return output
end
Now, I’ve got a function g that outputs a length 2 Tuple where I want to return the first element as part of output and pass the second on to the next iteration. Again, writing the for loop, something like this:
function not_quite_my_accumulate(g, xs, init)
output = Vector{typeof(init)}(undef, length(xs))
w = init
for (n, x) in enumerate(xs)
w, y = g(w, x)
output[n] = y
end
return output
end
In general one could implement not_quite_my_accumulate using accumulate by storing both w and y in output / passing them both on to the next iteration and then playing around with the result, but in my application it tends to be the case that y is quite small while w is rather large. Consequently I can’t afford to store all the intermediate values of w.
I couldn’t see anything about this in the docs, and I’m not sure whether it’s a function that has a name, so any advice would be much appreciated!
I guess there are a couple of tricks that can be useful. If w is type-stable, you could create a structure that encodes w and updates it as you iterate.
julia> mutable struct WithState{F, S} # somewhat similar to `Flux.Recur`
f::F
state::S
end
julia> function (st::WithState)(x)
w, y = st.f(st.state, x)
st.state = w
return y
end
julia> map(WithState(g, init), xs) # not 100% sure that `map` guarantees to be run sequentially, though in practice I think it does
If you want to be efficient while allowing w to change type, you have to play around with foldl. For example, you could do
julia> res, _ = foldl(xs, init=(Union{}[], init)) do (vec, w), x
ŵ, y = g(w, x)
return push!!(vec, y), ŵ # `push!!` allows widening of the vector
end
In general, playing around with the function you pass to foldl is a key idea for Transducers (or at least, that’s my understanding). Maybe the above could somehow be done as a transducer, but I don’t understand the details of the framework well enough to be sure.
Good point – I hadn’t considered using mutation. It’s definitely an option, but I’ll hold off on marking it as a solution as a non-mutating solution would also be nice.
Thanks for pointing me towards Transducers.jl – it looks like its ScanEmit transducer is actually what I want. It has roughly the same behaviour as not_quite_my_accumulate:
You can also implement a custom iterator and keep track of the state that way. It generally only requires you overload length and iterate.
struct MyAccumulator{G, X, I<:Union{eltype(X), Nothing}}
g::G
xs::X
init::I
end
MyAccumulator(g, xs) = MyAccumulator(g, xs, nothing)
Base.length(MA::MyAccumulator) = length(MA.xs)
function Base.iterate(MA::MyAccumulator{G, X, I}, state = (1, something(MA.init, one(I))))
i, w = state
i > length(MA.xs) && return nothing
w, x = MA.g(w, @inbounds MA.xs[i])
return (x, (i+1, w))
end
julia> collect(MyAccumulator((w,x) -> (w+x, w*x), [1,2,3,4]))
4-element Array{Any,1}:
1
4
12
28
Note that the return type is Array{Any} because eltype isn’t implemented (it’s a bit tricky since g can do anything, but you could look at how accumulate does it in Base to figure out a good general way).
I don’t think the interface above is necessarily the best for what you want, but maybe it can inspire a better direction.