I noticed that the docs for reduce
say:
reduce(op, itr; [init])
...
If provided, the initial value
init must be a neutral element for op that will be returned for empty collections. It is unspecified
whether init is used for non-empty collections.
...
Examples
≡≡≡≡≡≡≡≡≡≡
julia> reduce(*, [2; 3; 4])
24
julia> reduce(*, [2; 3; 4]; init=-1)
-24
First it says that init
must be a neutral element for op (i.e. 1 for integer multiplication), and it may or may not be used for non-empty collections. Then there is an example with init = -1
, which is clearly not a neutral element, and this example relies on init
being used for non-empty collections.
So, what is the right contract for reduce
? Sometimes in my code I pass init
that is not a neutral element, and expect it to always be used. Is this a wrong assumption?
This is a bad example in the documentation.
I guess the situation is that currently and for integers, reduce
boils down to something like
function reduce(op, itr, init)
r = init
for x in itr
r = op(r,x)
end
return r
end
The point of the statement in the documentation is that you should not assume that this holds for other types or across different versions of Julia. So basically, don’t do what the example demonstrates or your code may break in unexpected ways.
I for one would vote in favour of replacing the example with
julia> reduce(*, [2; 3; 4])
24
julia> reduce(*, []) # Invalid: empty collection and no init
ERROR: MethodError: no method matching one(::Type{Any})
julia> reduce(*, []; init = 1)
1
The slightly tricky thing with this example is that reduce
is actually very good at inferring the neutral element. For example, both of the following calls work even though they violate the “contract” specified in the documentation.
julia> reduce(*, Int[])
1
julia> reduce(+, Int[])
0
When init
is provided, it will be used. Correction: no, that is only for foldl
/foldr
.
If the collection is empty, it must be provided.
Good to hear, now I know this behaviour can be relied upon!
Is this guaranteed? I guess doing so would make a lot of sense, but then the documentation should be updated.
I think I misspoke, the docs of reduce
says that
It is unspecified whether init is used for non-empty collections.
I would recommend foldr
/ foldl
instead for code that is not meant to be parallelized — it guarantees that init
will be used.
Ok then, I will change reduce
to foldl
when using non-neutral init
. But is there a reason why reduce
may not use init
for non-empty collections? It makes sense to use reduce
for associative operations instead of having to remember which of foldl
/foldr
is faster; also usages of foldl
in the code look like the author explicitly needs left associativity, while it may not be the case.
My understanding is that reduce
(which is really a thin wrapper for mapreduce
) leaves these things deliberately unspecified, so that the implementation is not constrained. Eg
reduce(+, [a, b, c, d]; init = z)
may be evaluated as one of
(a + b) + (c + d)
(z + a + b) + (c + d)
((a + b) + c) + d)
(((a + b) + c) + z) + d)
by a conforming implementation. The first one allows eg parallelization. For this, it is required that init
is the neutral element, ie
op(a, z) == op(z, a) == a
and that op
is associative.
All of this is mentioned in the docstring (except, of course, for the design rationale).
I understand that reduce
has this flexibility of operation reordering - clearly it gives more optimization flexibility. But it is less clear why init
may not be used. For associative operations it is thus required to use operation(init, reduce(operation, array, init=unity))
instead of reduce(operation, array, init=init)
when init
is not neutral, or switch to foldl
with potentially worse performance. Not a big issue, of course, but still.