Reduce: init for non-empty collections

I noticed that the docs for reduce say:

reduce(op, itr; [init])

  If provided, the initial value
  init must be a neutral element for op that will be returned for empty collections. It is unspecified
  whether init is used for non-empty collections.


  julia> reduce(*, [2; 3; 4])
  julia> reduce(*, [2; 3; 4]; init=-1)

First it says that init must be a neutral element for op (i.e. 1 for integer multiplication), and it may or may not be used for non-empty collections. Then there is an example with init = -1, which is clearly not a neutral element, and this example relies on init being used for non-empty collections.
So, what is the right contract for reduce? Sometimes in my code I pass init that is not a neutral element, and expect it to always be used. Is this a wrong assumption?

This is a bad example in the documentation.

I guess the situation is that currently and for integers, reduce boils down to something like

function reduce(op, itr, init)
    r = init
    for x in itr
       r = op(r,x)
    return r

The point of the statement in the documentation is that you should not assume that this holds for other types or across different versions of Julia. So basically, don’t do what the example demonstrates or your code may break in unexpected ways.

I for one would vote in favour of replacing the example with

julia> reduce(*, [2; 3; 4])

julia> reduce(*, [])  # Invalid: empty collection and no init
ERROR: MethodError: no method matching one(::Type{Any})

julia> reduce(*, []; init = 1)

The slightly tricky thing with this example is that reduce is actually very good at inferring the neutral element. For example, both of the following calls work even though they violate the “contract” specified in the documentation.

julia> reduce(*, Int[])

julia> reduce(+, Int[])

When init is provided, it will be used. Correction: no, that is only for foldl/foldr.

If the collection is empty, it must be provided.

Good to hear, now I know this behaviour can be relied upon!

Is this guaranteed? I guess doing so would make a lot of sense, but then the documentation should be updated.

I think I misspoke, the docs of reduce says that

It is unspecified whether init is used for non-empty collections.

I would recommend foldr / foldl instead for code that is not meant to be parallelized — it guarantees that init will be used.

Ok then, I will change reduce to foldl when using non-neutral init. But is there a reason why reduce may not use init for non-empty collections? It makes sense to use reduce for associative operations instead of having to remember which of foldl/foldr is faster; also usages of foldl in the code look like the author explicitly needs left associativity, while it may not be the case.

My understanding is that reduce (which is really a thin wrapper for mapreduce) leaves these things deliberately unspecified, so that the implementation is not constrained. Eg

reduce(+, [a, b, c, d]; init = z)

may be evaluated as one of

(a + b) + (c + d)
(z + a + b) + (c + d)
((a + b) + c) + d)
(((a + b) + c) + z) + d)

by a conforming implementation. The first one allows eg parallelization. For this, it is required that init is the neutral element, ie

op(a, z) == op(z, a) == a

and that op is associative.

All of this is mentioned in the docstring (except, of course, for the design rationale).

I understand that reduce has this flexibility of operation reordering - clearly it gives more optimization flexibility. But it is less clear why init may not be used. For associative operations it is thus required to use operation(init, reduce(operation, array, init=unity)) instead of reduce(operation, array, init=init) when init is not neutral, or switch to foldl with potentially worse performance. Not a big issue, of course, but still.