Accurate summation algorithm

simonbyrne · December 1, 2017, 7:45pm

x86 architecture has essentially 2 parallel sets of floating point functionality:

the x87 instructions (e.g. FMUL), which descend from the original 8087 coprocessor. These have 80-bit registers, and the original intention was that all computations would be done at higher precision, truncating only when spilling the registers. The problem is that the results were dependent on register allocation (unless you spill after every operation), so an option was later added to reduce the operational precision to double or single precision. This behaviour was later proscribed by the C99 specification.
the SSE2 and later instructions which defined specific instructions for single and double operations (e.g. MULSS, MULSD), along with SIMD variants (e.g. MULPS, MULPD). These are typically much faster, and also support nice features like pipelining, and in later additions, FMAs.

SSE2+ instructions are basically used by all modern compilers unless you explicitly use long double types (though some recent-ish versions of gcc will still use x87 by default on 32-bit machines).

x87 also defined instructions for certain special functions (e.g. FSIN), though it turns out to be faster (and in some cases, more accurate) to compute these in software. The only exception is the remainder functions (e.g. fmod) which often still use x87 underneath.

Topic		Replies	Views
Julia equivalent of Python's "fsum" for floating point summation General Usage python	54	7584	August 16, 2019
Speed issue with KahanSummation Performance question , package , parallel	12	1310	February 5, 2022
High precision summation functions not compatible with Optim.jl with autodiff=:forward Optimization (Mathematical) package , optim , forwarddiff	9	1025	February 5, 2022
Sum of float64 vector gives slightly incorrect answer Performance question	44	5330	April 29, 2019
Optimizing sums of products (dot products) Performance simd , linearalgebra , sum	17	799	September 24, 2024

Accurate summation algorithm

Related topics