Imprecision of mean over iterators of large vectors


#1

To wit:

julia> N = 100000000; aa = rand(Float32, N);

julia> mean((x for x in aa))
0.16777216f0

julia> mean(aa)
0.500059f0

I know what’s happening: Float32 has limited accuracy, and mean over an iterator computes a “naive” sum, which saturates at some point. However, it caught me by surprise in my real-world scenario:

julia> mean(skipmissing(Umat))
1.0638367f0 V

julia> mean(filter(!ismissing, Umat))
3.1320891f0 V

That’s a bit scary. Is that just Something I Must Watch Out For? It feels like there should be a reasonable algorithm that gives good accuracy without sacrificing speed. Something like

function sum(iter)
    T = eltype(iter)        # note: this is `Any` for plain generators
    total = zero(T)         # sum of all completed batches
    minitotal = zero(T)     # sum of the current batch
    i = 0
    for x in iter
        # every 100_000 elements, fold the small batch into the big total
        if (i += 1) % 100_000 == 0
            total += minitotal
            minitotal = zero(T)
        end
        minitotal += x
    end
    return total + minitotal
end

I’m not particularly good with numerical methods, but wouldn’t that be similar in efficiency, with much better accuracy?
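The specific value 0.16777216f0 is no accident: Float32 carries a 24-bit significand, so a naive running sum of values below 1 stops changing once it reaches 2^24 = 16_777_216, and 2^24 / 10^8 = 0.16777216. A quick check:

```julia
# Float32 has a 24-bit significand, so once a running sum of values
# below 1 reaches 2^24 = 16_777_216, every further addition rounds away:
@assert Float32(2^24) + 0.999f0 == Float32(2^24)

# which is exactly where the naive mean of 10^8 uniform values saturates:
@assert Float32(2^24) / 10^8 ≈ 0.16777216f0
```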


#2

Yes, this is a big topic. For example, see https://en.wikipedia.org/wiki/Kahan_summation_algorithm . The Julia standard library uses alternative summation implementations for exactly this reason.
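A minimal sketch of the compensated (Kahan) summation described at that link, written for any iterator (the function name is illustrative, and it assumes a non-empty iterator):

```julia
# Kahan (compensated) summation: carry a running compensation term `c`
# that recovers the low-order bits lost in each addition.
function kahan_sum(iter)
    y = iterate(iter)
    y === nothing && throw(ArgumentError("empty iterator"))
    total, state = y
    c = zero(total)             # running compensation for lost low-order bits
    while (y = iterate(iter, state)) !== nothing
        x, state = y
        t = x - c               # re-inject the previously lost bits
        s = total + t           # big + small: low-order bits of t are lost...
        c = (s - total) - t     # ...and captured here for the next round
        total = s
    end
    return total
end
```

This costs a few extra floating-point operations per element but bounds the error independently of the number of terms, instead of letting it grow with n as in the naive loop.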


#3

That’s the algorithm used by sum(::Vector), right? Why can’t it be used with an iterable/generator?


#4

That’s a great example of the problem of sequential summation.

For arrays, Julia uses a hybrid sequential/pairwise approach for summation (using sequential for small chunks, then accumulating these chunks in a pairwise manner), which is why sum(aa) avoids this problem.
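A simplified sketch of that hybrid scheme (Base's actual implementation differs in details such as block size and SIMD-friendly inner loops):

```julia
# Hybrid sequential/pairwise summation: sum small chunks with a plain
# loop, then combine chunks by recursive halving, so rounding errors
# accumulate over O(log n) combinations instead of O(n) additions.
function pairwise_sum(a, lo=1, hi=length(a); blocksize=1024)
    if hi - lo + 1 <= blocksize
        s = zero(eltype(a))         # small chunk: plain sequential loop
        for i in lo:hi
            s += a[i]
        end
        return s
    end
    mid = (lo + hi) >>> 1           # split in half and recurse
    return pairwise_sum(a, lo, mid; blocksize=blocksize) +
           pairwise_sum(a, mid + 1, hi; blocksize=blocksize)
end
```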

However there’s no reason why we couldn’t do the same thing for general iterators: could you open an issue?


#5

Not sure about general iterables (the algorithm would have to be adapted since we can’t access items at arbitrary positions like for arrays), but for skipmissing we could definitely do that.


#6

A direct implementation of pairwise summation requires random access, which is why we don’t use the pairwise algorithm for general iterators.

However, I suppose it might be possible to implement pairwise summation on a sequential iterator by allocating an explicit “stack” of O(log₂n) accumulators, and doing accumulation on the correct one as we go along.
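That idea can be sketched with a binary-counter-style stack of partial sums: each partial sum covers a power-of-two number of elements, and two partials are merged whenever they cover equally many, so the stack never holds more than O(log₂ n) entries (illustrative code, assuming a non-empty iterator):

```julia
# Pairwise-style summation over a purely sequential iterator, using an
# explicit stack of partial sums instead of the call stack.
function stacked_sum(iter)
    y = iterate(iter)
    y === nothing && throw(ArgumentError("empty iterator"))
    x1, state = y
    vals = typeof(x1)[x1]   # stack of partial sums, largest at the bottom
    lens = Int[1]           # how many elements each partial sum covers
    while (y = iterate(iter, state)) !== nothing
        x, state = y
        push!(vals, x)
        push!(lens, 1)
        # merge the top two partial sums whenever they cover equally many
        # elements -- like propagating a carry in a binary counter
        while length(lens) >= 2 && lens[end] == lens[end-1]
            v = pop!(vals)
            pop!(lens)
            vals[end] += v
            lens[end] *= 2
        end
    end
    # fold the leftover partial sums, smallest first
    return foldl(+, reverse!(vals))
end
```

No length is needed ahead of time: the stack simply grows by one slot whenever the element count crosses the next power of two.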


#7

The current implementation does access them in order, it just uses the call stack as the storage stack. However you do need to know the length ahead of time (so you know how deep the stack needs to be).


#8

Not necessarily; you can just expand as necessary, e.g. after 10 levels of depth are used up, add another 10. Someone just needs to code this, which can be tricky to do in a compiler-friendly way if the iterator has an unknown eltype.