In the code below, `var(M)` is not 0. I guess the reason might be computation error, but does anyone know how to deal with this? Thanks in advance.

```
M = fill(5.421241248937501521, (1,10000))
M = Float64.(M)
var(M)
var(M[:,100:10000])
```

It’s just floating-point roundoff error.

```
julia> var(M)
3.865805016084566e-29
```

The limited precision of `Float64` is even relevant just for parsing the literal:

```
julia> 5.421241248937501521
5.421241248937502
```

So it’s no surprise that the result deviates: computing the variance scales a large sum of values to get the mean, then scales a large sum of squared deviations from that mean, and roundoff accumulates along the way. In this case, you can reduce the number crunching by informing `var` that the mean is exactly the one value filling the entire matrix, so it’ll skip the sum for that part:

```
julia> var(M; mean=M[begin])
0.0
```

Now that all the deviations from the mean are exactly `0.0`, those will square, sum, and scale down to exactly `0.0` as well. For cases where you don’t just have the one particular value, you have to work with the limited precision in mind and round to an accepted precision; it’s the same reason why exact comparisons like `== 0.0` are far rarer than approximate ones.
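To illustrate that last point (reusing the `M` from the question), an exact comparison against `0.0` fails while an approximate one with an explicit absolute tolerance succeeds:

```julia
using Statistics

M = fill(5.421241248937501521, (1, 10000))
v = var(M)  # tiny but nonzero, purely from roundoff

v == 0.0                        # false: exact comparison is too strict
isapprox(v, 0.0; atol = 1e-12)  # true: compare within a chosen tolerance
```

Note that the `atol` is needed here: `isapprox` defaults to a relative tolerance, which can never be satisfied when comparing a nonzero value against exactly zero.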

It is possible for a type to represent an array filled with one particular value far more efficiently, and for `var` to dispatch specially on that type, skipping all the number crunching and returning the matching type of zero. FillArrays.jl implements this:

```
julia> using FillArrays, Statistics
julia> M2 = Fill(5.421241248937501521, 1, 10000)
1×10000 Fill{Float64}, with entries equal to 5.421241248937502
julia> fieldnames(typeof(M2))
(:value, :axes)
julia> M2.value # only stores 1 value, not 10000 copies
5.421241248937502
julia> var(M2)
0.0
julia> @which var(M2)
var(A::FillArrays.AbstractFill{T}; corrected, mean, dims) where T<:Number
@ FillArraysStatisticsExt C:\...\.julia\packages\FillArrays\...\ext\FillArraysStatisticsExt.jl:17
```

First, here are the results in the REPL.

```
julia> M = fill(5.421241248937501521, (1,10000))
1×10000 Matrix{Float64}:
5.42124 5.42124 5.42124 … 5.42124 5.42124
julia> M = Float64.(M)
1×10000 Matrix{Float64}:
5.42124 5.42124 5.42124 … 5.42124 5.42124
julia> var(M)
1.333308260649575e-28
julia> var(M[:,100:10000])
1.5463235527558329e-28
```

The main issue here is the calculation of the mean.

```
julia> mean(M)
5.42124124893749
julia> M[1] - mean(M)
1.1546319456101628e-14
julia> eps(M[1])
8.881784197001252e-16
```
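One way to confirm that the mean itself carries the error is to recompute it in extended precision (a sketch using `BigFloat`):

```julia
using Statistics

M = fill(5.421241248937501521, (1, 10000))

# Summing 10000 copies of a 53-bit Float64 value is exact in BigFloat's
# default 256-bit precision, so this mean carries no roundoff.
mubig = mean(big.(M))

Float64(mubig) == M[1]  # true: the high-precision mean rounds back to the fill value
mean(M) == M[1]         # false: the plain Float64 mean accumulated roundoff
```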

Fortunately, if we can provide an accurate mean, then we can calculate accurate variance.

```
julia> var(M; mean=M[1])
0.0
```

This suggests an extended approach for this situation:

```
julia> ou_var(X) = try
           _ou_mean = only(unique(X))
           var(X; mean=_ou_mean)
       catch _
           var(X)
       end
ou_var (generic function with 1 method)
julia> ou_var(M)
0.0
```
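If exceptions feel heavy-handed, the same idea can be written with `allequal` (available since Julia 1.8): when every element is equal, the first element *is* the exact mean, so handing it to `var` makes all deviations exactly zero. A sketch, with the hypothetical name `ae_var`:

```julia
using Statistics

# If all elements are equal, pass the exact mean to `var` so the
# deviations are exactly 0.0; otherwise compute `var` normally.
# (`allequal` requires Julia 1.8 or newer.)
ae_var(X) = allequal(X) ? var(X; mean = first(X)) : var(X)

M = fill(5.421241248937501521, (1, 10000))
ae_var(M)  # 0.0
```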

As already noted, the roundoff error appears already in the mean computation. If you’re willing to spend more calculation time, you can improve on this with a correction term:

```
julia> M[1]
5.421241248937502
julia> μ₁ = mean(M)
5.421241248937496
julia> μ₂ = μ₁ + mean(M .- μ₁)
5.421241248937502
```

and you can integrate this correction into the variance computation simply by

```
julia> var(M .- mean(M))
0.0
```
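Equivalently, the corrected mean can be handed to `var` directly via its `mean` keyword (a sketch, reusing the μ₂ construction from above):

```julia
using Statistics

M = fill(5.421241248937501521, (1, 10000))

μ₁ = mean(M)             # first-pass mean, carries some roundoff
μ₂ = μ₁ + mean(M .- μ₁)  # corrected mean: the residuals are small enough
                         # that their mean captures the roundoff error

var(M; mean = μ₂)  # 0.0 here, since μ₂ matches the fill value exactly
```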

Calculating variance accurately can take a bit of care. In particular, `@GunnarFarneback`’s recommendation looks pretty good *(EDIT: although the built-in algorithm is already quite reliable, so unless this nonzero-for-constant-inputs is a problem, I would just use `var` as-is)*. If you want a cheaper but marginally less accurate calculation, `var(M .- first(M))` (or offset by any other convex combination of elements of `M`) is also decent (not only in this case).

It appears that `Statistics` uses the Welford algorithm when the mean is not provided and the target is a general iterator. For `AbstractArray`s, however, it uses a two-pass algorithm. Obviously, the Welford algorithm is exact in this special case:

```
julia> var(M)
3.865805016084566e-29
julia> var(Iterators.take(M, length(M)))
0.0
```

I can’t say which should be preferred in general. Clearly, someone made the deliberate choice to use a two-pass algorithm for `AbstractArray`. Given the two-pass’s cost, I must imagine it is slightly preferable in most situations.
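For reference, here is a minimal sketch of a Welford-style one-pass update (simplified, with the hypothetical name `welford_var`; the actual implementation in `Statistics` differs in detail). On constant input the first `delta` sets the running mean to the value exactly, and every subsequent `delta` is exactly zero, which is why the one-pass path returns `0.0`:

```julia
# Simplified Welford one-pass sample variance.
# Not the actual Statistics.jl implementation, just the core recurrence.
function welford_var(xs)
    n = 0
    m = 0.0  # running mean
    s = 0.0  # running sum of squared deviations
    for x in xs
        n += 1
        delta = x - m
        m += delta / n        # update the mean incrementally
        s += delta * (x - m)  # accumulate squared deviations
    end
    s / (n - 1)  # corrected (sample) variance
end

welford_var(fill(5.421241248937501521, 10000))  # exactly 0.0
```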

Note that the built-in algorithms take this care, and are numerically stable.

In the example above, the backwards error is extremely small already. It’s not clear to me in what context this error is a **practical** problem, or if it’s just one of those cases where people are surprised by the existence of a round off error.

There are ways to get exactly rounded results, of course, but they are only worth it in rare circumstances.
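For example (a sketch, not something to reach for routinely, with the hypothetical name `exact_var`), the variance can be computed through `Rational{BigInt}` arithmetic, which is exact for `Float64` inputs, rounding only once at the end:

```julia
using Statistics

# Every Float64 converts exactly to Rational{BigInt}, so all sums and
# divisions below are exact; the only rounding is the final Float64().
# This is far slower than `var` and rarely worth it.
exact_var(X) = Float64(var(Rational{BigInt}.(X)))

M = fill(5.421241248937501521, (1, 10000))
exact_var(M)  # 0.0
```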

To elaborate, the correction term `mean(M .- μ₁)` works because it’s summing much smaller values (ideally 0 variance, after all) than `mean(M)` did and can thus represent the small error. The correction is less helpful with larger variances:

```
julia> using Statistics, Random
julia> function testmean(intended, offset, halfn::Integer)
           M = shuffle!(vcat(fill(intended-offset, halfn),
                             fill(intended+offset, halfn)))
           mu = mean(M)
           correction = mean(M .- mu)
           mu, correction, mu+correction
       end
testmean (generic function with 1 method)
julia> testmean(5.3, 0, 5000)
(5.300000000000007, -7.105427357601002e-15, 5.3)
julia> testmean(5.3, 50, 5000) # can overcorrect
(5.3000000000000025, -3.819877747446298e-15, 5.299999999999999)
julia> testmean(5.3, 100, 5000) # doesn't correct at all
(5.299999999999995, 0.0, 5.299999999999995)
```