Does Julia's built-in sum() apply fastmath by default?

I noticed that the output of sum() is different from that of a custom loop:

julia> function mysum(A)
           out = 0.0
           for ii in eachindex(A)
               out += A[ii]
           end
           return out
       end
mysum (generic function with 1 method)

julia> A=rand(1000);

julia> sum(A)-mysum(A)
-1.1368683772161603e-13

In Fortran (gfortran) this only happens when -ffast-math is enabled. Is this by design, or is there something else I missed? If this is expected, what other functions have this behavior?

Julia’s sum uses pairwise summation, which is more accurate and usually faster than the regular loop.
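Roughly, it splits the range in half and recurses, falling back to a plain loop below some block size. Here is a minimal sketch of the idea (the cutoff of 128 and the other details are assumptions, not Base's actual code):

function pairwise_sum(A, lo=firstindex(A), hi=lastindex(A))
    if hi - lo < 128                     # small block: plain accumulation loop
        s = zero(eltype(A))
        @simd for i in lo:hi
            @inbounds s += A[i]
        end
        return s
    else                                 # otherwise split in half and recurse
        mid = (lo + hi) >>> 1
        return pairwise_sum(A, lo, mid) + pairwise_sum(A, mid + 1, hi)
    end
end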


More accurate, yes. But faster? My understanding is that it is ‘almost as fast’.

In theory it should be able to be faster for arrays that fit in cache, since the splitting gives shorter dependency chains.

How short is that, then? When testing, I find this loop to be consistently faster than sum for lengths from 10^1 to 10^6:

function mysum(xs)
    val = zero(eltype(xs))
    @simd for i in eachindex(xs)
        @inbounds val += xs[i]
    end
    return val
end

It’s a small difference, but consistent.

Compared to a non-SIMD loop, yes, that would be expected.

Note that with @simd you’re allowing reordering of the loop, which defeats the premise of Oscar’s statement.


Good to know. Thanks.
And why does sum() sometimes not agree with itself either?

julia> function rowsum(A)
           out = Array{Float64,2}(undef, size(A, 1),1)
           for ii in axes(A, 1)
               out[ii,1] = sum(A[ii, :])
           end
           return out
       end
rowsum (generic function with 1 method)

julia> A=rand(10,1000);

julia> sum(A,dims=2)-rowsum(A)
10×1 Matrix{Float64}:
  2.2737367544323206e-13
  2.2737367544323206e-13
  4.547473508864641e-13
 -2.2737367544323206e-13
  7.389644451905042e-13
  1.7053025658242404e-13
 -2.8421709430404007e-13
 -2.2737367544323206e-13
 -5.115907697472721e-13
  5.684341886080802e-14

Because sum(A, dims=2) (when you pass dims) isn't implemented with pairwise summation; it's more like a naive sum.
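By "naive" I mean something closer to a plain left-to-right accumulation along the reduced dimension, e.g. this sketch (just an illustration, not Base's actual mapreduce-over-dims code):

function naive_rowsum(A)
    out = zeros(eltype(A), size(A, 1), 1)
    for j in axes(A, 2), i in axes(A, 1)
        out[i, 1] += A[i, j]             # accumulate each row column by column, left to right
    end
    return out
end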


By naive sum do you mean like the loop I wrote at the start?

In the case where you have a large number of similar-magnitude integers (so their sum is BigInt territory), I think pairwise summation is asymptotically faster than sequential summation.

That is interesting to me. I would have thought that Base's sum always used Kahan summation, at least on items with known size. I wonder why that is not the case.

Kahan summation is roughly 2x slower.
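For reference, the extra work per element is visible in a standard Kahan (compensated) loop; a sketch, not anything in Base:

function kahan_sum(A)
    s = zero(eltype(A))
    c = zero(eltype(A))                  # running compensation for lost low-order bits
    for x in A
        y = x - c
        t = s + y                        # low-order bits of y are lost here...
        c = (t - s) - y                  # ...and recovered into c
        s = t
    end
    return s
end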


Ah sorry, I tend to always equate the term pairwise sum with Kahan sum. Anyway, what I meant is that I'm wondering why summing along other dimensions would be any different from a standard sum.

Not entirely sure. My guess would be that it's just easier to write it as a mapreduce than to write the code to do pairwise summation over arbitrary dimensions.

In general, the reduction operations in Julia (mapreduce, sum, …) are documented to treat the operator as associative, and to use an implementation-dependent associativity. If you want a particular associativity you have to use foldl/foldr or similar (or write your own loop).
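For example, to get the same grouping as the naive loop at the top of the thread:

A = rand(1000)

foldl(+, A)    # strict left-to-right: ((a1 + a2) + a3) + ..., same order as the naive loop
foldr(+, A)    # strict right-to-left
sum(A)         # implementation-defined grouping (currently pairwise), may differ in the last bits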


Good to know. I wasn't aware this applied not only to mapreduce but to the other reductions as well.