Does Julia's built-in sum() apply fastmath by default?

I noticed that the output of sum() is different from that of a custom loop:

julia> function mysum(A)
           out = 0.0
           for ii in eachindex(A)
               out += A[ii]
           end
           return out
       end
mysum (generic function with 1 method)

julia> A=rand(1000);

julia> sum(A)-mysum(A)
-1.1368683772161603e-13

In Fortran (gfortran) this only happens when -ffast-math is enabled. Is this by design, or is there something else I missed? If this is expected, what other functions have this behavior?

Julia’s sum uses pairwise summation, which is more accurate and usually faster than the regular loop.
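Roughly, it splits the range in half and recurses, falling back to a plain loop below some block size. Here is a minimal sketch of the idea (the cutoff of 128 and the other details are assumptions, not Base's actual code):

function pairwise_sum(A, lo=firstindex(A), hi=lastindex(A))
    if hi - lo < 128                     # small block: plain accumulation loop
        s = zero(eltype(A))
        @simd for i in lo:hi
            @inbounds s += A[i]
        end
        return s
    else                                 # otherwise split in half and recurse
        mid = (lo + hi) >>> 1
        return pairwise_sum(A, lo, mid) + pairwise_sum(A, mid + 1, hi)
    end
end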


More accurate, yes. But faster? My understanding is that it is ‘almost as fast’.

In theory it should be able to be faster for arrays that fit in cache, since the splitting gives shorter dependency chains.

How short is that, then? When testing, I find this loop to be consistently faster than sum for lengths from 10^1 to 10^6:

function mysum(xs)
    val = zero(eltype(xs))
    @simd for i in eachindex(xs)
        @inbounds val += xs[i]
    end
    return val
end

It’s a small difference, but consistent.

Compared to a non-SIMD loop, yes, that would be expected.

Note that with @simd you’re allowing reordering of the loop, which defeats the premise of Oscar’s statement.


Good to know. Thanks.
And why does sum() sometimes not agree with itself either?

julia> function rowsum(A)
           out = Array{Float64,2}(undef, size(A, 1),1)
           for ii in axes(A, 1)
               out[ii,1] = sum(A[ii, :])
           end
           return out
       end
rowsum (generic function with 1 method)

julia> A=rand(10,1000);

julia> sum(A,dims=2)-rowsum(A)
10×1 Matrix{Float64}:
  2.2737367544323206e-13
  2.2737367544323206e-13
  4.547473508864641e-13
 -2.2737367544323206e-13
  7.389644451905042e-13
  1.7053025658242404e-13
 -2.8421709430404007e-13
 -2.2737367544323206e-13
 -5.115907697472721e-13
  5.684341886080802e-14

Because sum(A, dims=2) (when you pass dims) isn't implemented with pairwise summation; it's more like a naive sum.
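By "naive" I mean something closer to a plain left-to-right accumulation along the reduced dimension, e.g. this sketch (just an illustration, not Base's actual mapreduce-over-dims code):

function naive_rowsum(A)
    out = zeros(eltype(A), size(A, 1), 1)
    for j in axes(A, 2), i in axes(A, 1)
        out[i, 1] += A[i, j]             # accumulate each row column by column, left to right
    end
    return out
end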


By naive sum do you mean like the loop I wrote at the start?

In the case where you have a large number of similar-magnitude integers (so their sum is BigInt territory), I think pairwise summation is asymptotically faster than sequential summation.

That is interesting to me. I would have thought that Base's sum always used Kahan summation, at least on items with known size. I wonder why that is not the case.

Kahan summation is roughly 2x slower.
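For reference, the extra work per element is visible in a standard Kahan (compensated) loop; a sketch, not anything in Base:

function kahan_sum(A)
    s = zero(eltype(A))
    c = zero(eltype(A))                  # running compensation for lost low-order bits
    for x in A
        y = x - c
        t = s + y                        # low-order bits of y are lost here...
        c = (t - s) - y                  # ...and recovered into c
        s = t
    end
    return s
end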


Ah sorry, I tend to always equate the term pairwise sum with Kahan sum. Anyway, what I meant is that I'm wondering why summing along other dimensions would be any different from a standard sum.

Not entirely sure. My guess would be that it's just easier to write it as a mapreduce than to write the code to do pairwise summation over arbitrary dimensions.

In general, the reduction operations in Julia (mapreduce, sum, …) are documented to treat the operator as associative, and to use an implementation-dependent associativity. If you want a particular associativity you have to use foldl/foldr or similar (or write your own loop).
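For example, to get the same grouping as the naive loop at the top of the thread:

A = rand(1000)

foldl(+, A)    # strict left-to-right: ((a1 + a2) + a3) + ..., same order as the naive loop
foldr(+, A)    # strict right-to-left
sum(A)         # implementation-defined grouping (currently pairwise), may differ in the last bits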


Good to know. I wasn't aware this applied not only to mapreduce but to the other reductions as well.