Compilers, purity, and variance computation

There are plenty of articles on single-pass numerically stable algorithms for the variance, some of which are linked in the Wikipedia article I linked above.

Because a stable two-pass calculation of the variance uses the result of the first sum (the mean) in the second sum (the variance).

It might be nice to have a meanstd(x) algorithm in Statistics.jl that returns both.

(Statistics.jl does provide a stable single-pass (“streaming”) std algorithm that it uses for iterables that maybe only allow you to loop over them once.)

If it fits in the cache then you don’t need to reverse the order either.

Nope, Kahan summation doesn’t help here. Even if the \sum x_k and \sum x_k^2 are exactly rounded — which is better than Kahan and requires something like Xsum.jl — you can still lose all the significant digits when you subtract them.

Nope. If you round to a posit on each operation you still lose associativity (because the order of the rounding changes). Gustafson only claims associativity for “quire” arithmetic that uses arbitrary precision.

But I don’t want to go down the rabbit-hole of debating the promises of posits here. We’ve already had long threads on this, e.g.: Posits - a new approach could sink floating point computation - #17 by simonbyrne

2 Likes