Is `reduce` safe?

`reduce` (without the `init` keyword) is much faster than `foldl` and `foldr`:

julia> v = rand(Int, 2^16); w = @view v[2:2:end];
julia> @btime reduce(+, $w);
  12.154 μs (0 allocations: 0 bytes)
julia> @btime foldl(+, $w);
  190.207 μs (0 allocations: 0 bytes)

The same applies to `mapreduce`. I believe the reason is that internally `_mapreduce` uses `@inbounds`:

EDIT: For arrays with more than 16 elements, it’s actually the function `mapreduce_impl`, which again uses `@inbounds`.
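For illustration, here is a minimal sketch (my own, not the Base code) of what `@inbounds` changes: it disables bounds checking inside the annotated expression, trading a `BoundsError` on bad indices for undefined behavior.

```julia
# Sketch only: these helpers are hypothetical, not part of Base.
getindex_checked(v, i) = v[i]             # out-of-range i throws BoundsError
getindex_unchecked(v, i) = @inbounds v[i] # out-of-range i is undefined behavior

v = [10, 20, 30]
getindex_checked(v, 2)    # 20
getindex_unchecked(v, 2)  # 20 — identical for valid indices
```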

However, the indices array `inds` is initialized at the beginning of the function, so the code won’t detect it if the array `A` is modified via `f` or `op`. This can lead to an out-of-bounds access:

julia> v = [1,2,3,4];
julia> reduce(v) do x, y
           @show (x, y, v)
           empty!(v)
           x+y
       end
(x, y, v) = (1, 2, [1, 2, 3, 4])
(x, y, v) = (3, 3, Int64[])
(x, y, v) = (6, 4, Int64[])
10
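One way to guard against this (my own workaround, not suggested in the thread) is to snapshot the array before reducing, so mutation inside the reducer cannot invalidate the indices being traversed:

```julia
v = [1, 2, 3, 4]
s = reduce(copy(v)) do x, y
    empty!(v)   # mutates the original; the copy being reduced is unaffected
    x + y
end
# s == 10, and v is now empty
```

This costs an allocation, of course, so it is only worthwhile when the reducer might actually mutate the collection.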

For `foldl` and `foldr` this does not happen:

julia> v = [1,2,3,4];
julia> foldl(v) do x, y
           @show (x, y, v)
           empty!(v)
           x+y
       end
(x, y, v) = (1, 2, [1, 2, 3, 4])
3

I wonder whether this use of `@inbounds` is considered safe (or safe enough). It has obvious risks, but on the other hand one may argue that it is comparable to views. The docstring for `view` says:

The behavior is undefined if the shape of the parent array is changed after view is called because there is no bound check for the parent array; e.g., it may cause a segmentation fault.
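A concrete instance of that undefined behavior (my own example, which only sets up the dangling view without actually triggering the unsafe read):

```julia
v = collect(1:8)
w = @view v[2:2:end]   # w refers to v[2], v[4], v[6], v[8]
resize!(v, 2)          # shrink the parent; w still claims length 4
# Reading w[3] or w[4] now accesses memory past the parent's new end:
# no bounds check against the parent is performed, so the result is
# undefined (garbage values or a segmentation fault).
```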

If it is considered safe, then one could speed up `foldl` and `foldr` in the same way.


There’s another dimension: `reduce` doesn’t have to guarantee linear, in-order traversal. To quote the documentation:

Reduce the given collection itr with the given binary operator op. If provided, the initial value init must be a neutral element for op that will be returned for empty collections. It is unspecified whether init is used for non-empty collections.

For empty collections, providing init will be necessary, except for some special cases (e.g. when op is one of +, *, max, min, &, |) when Julia can determine the neutral element of op.

Reductions for certain commonly-used operators may have special implementations, and should be used instead: maximum(itr), minimum(itr), sum(itr), prod(itr), any(itr), all(itr). There are efficient methods for concatenating certain arrays of arrays by calling reduce(vcat, arr) or reduce(hcat, arr).

The associativity of the reduction is implementation dependent. This means that you can’t use non-associative operations like - because it is undefined whether reduce(-,[1,2,3]) should be evaluated as (1-2)-3 or 1-(2-3). Use foldl or foldr instead for guaranteed left or right associativity.

Some operations accumulate error. Parallelism will be easier if the reduction can be executed in groups. Future versions of Julia might change the algorithm. Note that the elements are not reordered if you use an ordered collection.
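To make the associativity point concrete (my own example, not from the docstring):

```julia
foldl(-, [1, 2, 3])   # (1 - 2) - 3 == -4: guaranteed left fold
foldr(-, [1, 2, 3])   # 1 - (2 - 3) ==  2: guaranteed right fold
reduce(-, [1, 2, 3])  # unspecified: the grouping is implementation-dependent
```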

With that said, it does seem useful to do some digging into the use of @inbounds as a source of performance optimizations versus it being a source of “iterator invalidation” style issues like you’re pointing out.


We should be able to get rid of @inbounds here on recent versions of Julia.
Demo:

julia> function mymapreduce(f, op, A::AbstractArray)
           inds = LinearIndices(A)
           n = length(inds)
           if n == 0
               throw(ArgumentError("reducing over an empty collection is not allowed"))
           else
               i = first(inds)
               fa1 = f(A[i]); n == 1 && return fa1
               fa2 = f(A[i+=1])
               s = op(fa1, fa2)
               while i < last(inds)
                   Ai = A[i+=1]
                   s = op(s, f(Ai))
               end
               return s
           end
       end
mymapreduce (generic function with 1 method)

julia> function mymapreduce_inbounds(f, op, A::AbstractArray)
           inds = LinearIndices(A)
           n = length(inds)
           if n == 0
               throw(ArgumentError("reducing over an empty collection is not allowed"))
           else
               i = first(inds)
               @inbounds fa1 = f(A[i]); n == 1 && return fa1
               @inbounds fa2 = f(A[i+=1])
               s = op(fa1, fa2)
               while i < last(inds)
                   @inbounds Ai = A[i+=1]
                   s = op(s, f(Ai))
               end
               return s
           end
       end
mymapreduce_inbounds (generic function with 1 method)

julia> x = rand(256);

julia> @btime mymapreduce(identity, Base.FastMath.add_fast, $x)
  32.905 ns (0 allocations: 0 bytes)
128.7068922942042

julia> @btime mymapreduce_inbounds(identity, Base.FastMath.add_fast, $x)
  32.905 ns (0 allocations: 0 bytes)
128.7068922942042

julia> @btime sum($x)
  39.859 ns (0 allocations: 0 bytes)
128.7068922942042

I would suggest using `Base.promote_op` to determine the accumulator type, which would avoid doing any calculations outside of the loop.
The current implementation messes with alignment and increases code size.
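A sketch of that direction (my own, hypothetical code, not what Base does): use `Base.promote_op` to pick the accumulator type up front, so no iterations need to be peeled off before the loop.

```julia
# Hypothetical sketch: mymapreduce_promoted is not a Base function.
function mymapreduce_promoted(f, op, A::AbstractArray)
    isempty(A) && throw(ArgumentError("reducing over an empty collection is not allowed"))
    Tf = Base.promote_op(f, eltype(A))   # element type after mapping
    T  = Base.promote_op(op, Tf, Tf)     # accumulator type
    s  = convert(T, f(A[begin]))
    for i in firstindex(A)+1:lastindex(A)
        s = op(s, f(A[i]))
    end
    return s
end
```

Whether this actually improves alignment and code size would need benchmarking; `promote_op` can also return abstract types for sufficiently exotic `f`/`op` combinations.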

For `Array` the compiler seems to be able to optimize the bounds check away. That’s why I used a `SubArray` in my initial example:

julia> v = rand(Int, 2^16); w = @view v[2:2:end];
julia> @btime mymapreduce(identity, +, $w);
  173.185 μs (0 allocations: 0 bytes)
julia> @btime mymapreduce_inbounds(identity, +, $w);
  11.797 μs (0 allocations: 0 bytes)

Tested on master 1.10.0-DEV.680 (2023-03-01).


Hmm. I think we should fix that so it works with `SubArray`. Perhaps by passing along the information that the `SubArray`’s slice lies within the parent array’s bounds.

Note that #48720 is a fix for `FastSubArray`, and #48803 is work in progress for “reversed” views with step value -1. Having it for general `SubArray` would be great.

That’s true, but the default method that uses `@inbounds` works through the array from beginning to end.

I think the issue you were pointing to is about dictionaries. I believe that changing or deleting the keys of a `Dict` during iteration may lead to errors, but not to segmentation faults, unlike an out-of-bounds access into an array. Please correct me if I’m wrong.

That’s true – I drew an analogy to iterator invalidation because both issues belong to the same abstract class: code operates on a container under the assumption that no mutation will take place, with no guarantee that the code won’t fail if mutation does occur. This is a class of problems that Julia is not currently able to prevent in general, so there is a tradeoff between exposing unsafe constructs and providing maximum performance.

You are right that in one case there is a segfault and the other there is not, but I’m not sure that distinction matters as much as the core problem of “a piece of code assumes no mutation but Julia can’t guarantee that”.