Julia equivalent of Python's "fsum" for floating point summation

I have been looking for a Julia equivalent of Python’s fsum and have thus far been unable to find one. Is there an implementation already available? If not I will write my own and put it on Github for others to use.

From the Python docs:

math.fsum ( iterable )

Return an accurate floating point sum of values in the iterable. Avoids loss of precision by tracking multiple intermediate partial sums:

The algorithm’s accuracy depends on IEEE-754 arithmetic guarantees and the typical case where the rounding mode is half-even. On some non-Windows builds, the underlying C library uses extended precision addition and may occasionally double-round an intermediate sum causing it to be off in its least significant bit.

For further discussion and two alternative approaches, see the ASPN cookbook recipes for accurate floating point summation.

There was once a Kahan summation function in Base, but it got split off to a package prior to 1.0: GitHub - JuliaMath/KahanSummation.jl: Sum and cumulative sum using the Kahan-Babuska-Neumaier algorithm

1 Like

Maybe this is relevant: https://github.com/dpsanders/IntervalArithmetic.jl

1 Like

My understanding is that Python’s fsum function doesn’t use Kahan summation, it uses an algorithm by Shewchuk that guarantees an exactly rounded result (i.e. the result as if you did the sum in infinite precision and then rounded it to the closest floating-point value).

Kahan summation, in contrast, doesn’t guarantee an exactly rounded result, but its error (relative to the L₁ norm of the input) is bounded independent of the number of terms. The default sum function in Julia is also quite accurate because it uses pairwise summation, with an error bound that grows only as the log of the number of terms.

(My understanding is that Shewchuk’s algorithm is especially useful if you want a small relative error even for ill-conditioned sums: sums of both positive and negative values where the result is nearly zero due to cancellations. I’ve never used the algorithm myself, though.)

11 Likes

See also Radford Neal’s paper on exact summation with superaccumulators:

If someone wants to implement exact summation, it might be worth comparing different approaches.

3 Likes

Someone may want to contact the author to ask for the C code under an MIT or BSD license, as otherwise translating the code from the paper may be problematic.

Or implement the algorithms without copying the code.

1 Like

I wrote Radford, and asked about use of the C code under the MIT or BSD License.

2 Likes

There’s also the possibility of one person transpiling the code from the paper to pseudo code and someone else implementing the algorithm based solely on the pseudo code (without reading the original implementation!) - that way, no explicit copying is going on, which is the part that would be problematic.

I’ve not read the paper yet, so I’d be happy to do the second part sometime next week if someone can provide a pseudo code version of this.

That’s the most paranoid version of GPL avoidance, but it’s not really legally necessary. As long as you don’t derive your code from GPL code, you can license your own code however you want. Of course the question is what counts as “deriving” your code from GPL code? Do you have to explicitly copy or translate it? Or can you be subconsciously influenced by GPL code you’ve seen but not consciously copied?

1 Like

Not my idea, I’ve only described the technique as told here - as far as I understand, any derivation of GPL’d code has to be GPL’d itself. By describing the algorithm with various sources without directly translating it to make it as easy as possible to reimplement it, it’s possible to (somewhat) circumvent this.

IANAL though, so this might just be bonkers :smiley:

Thanks for the feedback everyone — looks like I have plenty of alternatives to choose from at the moment.

The default sum function in Julia is also quite accurate because it uses pairwise summation

I didn’t realize the regular sum function uses pairwise summation, this will probably be adequate for my present requirements. I need to hand in a PhD thesis in the 2 months time so it sounds like someone else will get to implementing Shewchuck’s (or Neal’s) algorithm before me. If not I will look at it again at that time.

EDIT : Out of curiosity, would this fsum equivalent belong in the stdlib or somewhere else?

Please do not do that. Translating GPL code to pseudocode that “looks suspiciously like Python” and then having someone else translate that pseudocode into actual Python (or in this case Julia) is far more legally dicey than just reading the paper and implementing the algorithm based on the description while disregarding the GPL code that’s included. Translating twice does not break the “derived work link”—you can translate through as many different programming languages as you want and it’s still a derived work.

4 Likes

What better time for procrastination? :grimacing:

14 Likes

(Note that the published code by Radford is not GPL—in fact, it has no license statement at all, which means that you cannot redistribute it, or code derived from it, in any form. Clean-room re-implementation is not about “GPL avoidance,” it’s about working around copyright law’s definition of “derived work.” In this case, the paper itself contains code excerpts, so it is especially tricky to re-implement the algorithm in a clean way from a copyright perspective.)

The author emailed me to say he was planning on posting a cleaned-up version of the code under a free/open-source license (his initial impulse was GPL, but I’ve urged him to consider a more permissive license for a small library routine like this). Not sure about the timeframe.

8 Likes

5 posts were split to a new topic: Copyright issues for code excerpts

A Kahan / Pairwise polyalgorithm seems like it could still be pretty fast, it might be +15*3 operations for summing 256 numbers depending on where you switch. Not sure how much extra precision you’d get

Pairwise summation is the default in Julia; it is just as fast as naive summation and much more accurate. Kahan summation is even more accurate but significantly slower, which is why it is not the default.

1 Like

It’s a pretty neat trick too, and definitely a hard default to argue against. I was just noticing that each pass of the pairwise sum makes a partial Kahan sum on the remaining values 50% cheaper

Another approach

function finiteDif_sum(arr, d= 0.0)
  x = 0.0; y=0.0;
  y += d;
  for i in arr
    x = x + x-y + d + i;  ## +y-x -d ?
    y = y + i + y + i - x - d; ## and + x - y - i + d ?
    end;
  [x,y - d]
  end;

Add- might’ve gotten the x-y around the wrong way

I have posted a copy of Radford’s paper with all C code removed.
https://github.com/JeffreySarnoff/SuperaccumulatorPaperNoC/blob/master/superaccumulator.pdf

2 Likes