How to `cumsum` a tuple

Julia has a cumsum function that operators on arrays, but not on lists. This seems to work:

cumsum2(itr) = foldl((x,y)->(x...,x[end]+y), itr)

Is this the best way to accomplish the task?

You can collect the iterator into an array: cumsum(collect(itr)).

I’m not sure why there is no cumsum(itr) definition.

2 Likes

Thanks, @yurvish. Yes, that works, too. I’m hoping for a solution that returns a list, mainly because I’m trying to better understand how to use functions like foldl. But to be honest, I’m still unsure about when I should be using lists, and when I should be using arrays.

What do you mean by lists? That’s not a standard piece of terminology in the Julia world.

Is tuple the right terminology?

Tuple support for cumsum was just merged:

https://github.com/JuliaLang/julia/pull/34654

6 Likes

Probably that’s what you wanted to use. Julia creates a tuple with (x...,):

julia> x = [1, 2, 3]
3-element Array{Int64,1}:
 1
 2
 3

julia> y = (x...,)
(1, 2, 3)

julia> typeof(y)
Tuple{Int64,Int64,Int64}

If you are unsure if you need tuples or arrays, I think using arrays (vectors) is OK for most cases. You’d need tuples when you want to improve your program (e.g., performance).

2 Likes

BTW, you were very close to defining cumsum. One way to finish what you wrote is to use init=(0,) and then remove it after foldl is done:

julia> cumsum2(itr) = foldl((x,y)->(x...,x[end]+y), itr; init=(0,))[2:end]
cumsum2 (generic function with 1 method)

julia> cumsum2((1, 2, 3))
(1, 3, 6)
2 Likes

Seems like the init=(0,) option isn’t needed:

cumsum2(itr) = foldl((x,y)->(x...,last(x)+y), itr)
julia> cumsum2((1,2,3))
(1, 3, 6)

@tkf, you mentioned the possibility of improved performance using Tuples. Where might I read more about this?

Note that your implementation works only when the elements of itr are numbers (although it’s also the case with my init=(0,) version). What I wanted to illustrate was that you should use init most of the time when using foldl with custom op, and specifying it could be tricky sometimes. If you want to make it work with something other than numbers, e.g., cumsum2(([1], [2], [3])), I think you’d need to do something like what I did in the PR.

https://docs.julialang.org/en/latest/manual/performance-tips/ is a great resource for performance tips in general. But it looks like it doesn’t have much for tuples. The parts mentioning type-stability is somewhat relevant. For example, Julia can infer the type of last((1, "a")) but not last([1, "a"]). Another reason why tuple could be useful for performance is that it can be stack-allocated. You can probably find something relevant if you dig into StaticArray-vs-Array or struct-vs-mutable struct.

1 Like

If you come from the lazy functional programming perspective, the usual tricks of defining a function on lists recursively work quite well with small tuples in julia. You can use first and Base.tail to split a tuple into the first element and the remaining ones. Using multiple dispatch, you can have separate procedures for a general tuple (t::Tuple) and for the empty tuple (t::Tuple{}).

For example here you could do:

# we cheat by adding an init argument, as it makes the recursion easier
function cumsum2(init, t::Tuple)
    f, ls = first(t), Base.tail(t)
    return (init, cumsum2(init + f, ls)...)
end

cumsum2(init, ::Tuple{}) = (init,)

cumsum2(t::Tuple) = cumsum2(first(t), Base.tail(t))
cumsum2(::Tuple{}) = () # we need this as first(t) would error on the empty tuple

This will be extremely performant on small tuples (for constant inputs, the computation even happens at compilation time), but you may need to be careful that compilation times become problematic for large tuples.

1 Like

Thanks for the helpful responses, @tkf and @piever!

This will be extremely performant on small tuples (for constant inputs, the computation even happens at compilation time), but you may need to be careful that compilation times become problematic for large tuples.

Roughly, what qualifies as a “small” tuple?

I think in julia Base generally they use this trick only for tuples up to 16 elements, so I guess that’s a reasonable cut off. If it’s much longer than that, you are probably better off using arrays.