# Statistics.var wasting time calculating the mean?

I was trying to understand why Statistics.jl has both `var` and `varm` functions (since `var` already accepts a `mean` parameter). Doing `@less var([1,2,3])` I see:

``````julia
var(A::AbstractArray; corrected::Bool=true, mean=nothing, dims=:) = _var(A, corrected, mean, dims)

_var(A::AbstractArray, corrected::Bool, mean, dims) =
    varm(A, something(mean, Statistics.mean(A, dims=dims)); corrected=corrected, dims=dims)

_var(A::AbstractArray, corrected::Bool, mean, ::Colon) =
    real(varm(A, something(mean, Statistics.mean(A)); corrected=corrected))
``````

Am I missing something here, or is the method recalculating the mean even when it is provided? Isn’t it wasteful?

Bonus question: I’m still not clear on why Statistics exposes both `var` and `varm`.

It looks like the default value for `mean` in the first method is `nothing`. It is eventually passed to

`varm(A, something(mean, Statistics.mean(A, dims=dims)); corrected=corrected, dims=dims)`

If the first argument to `something` is `nothing`, the result is:

``````julia
julia> something(nothing, 2)
2
``````

If the first argument to `something` is not `nothing`, the result is:

``````julia
julia> something(3, 2)
3
``````

So it appears that the mean is not recalculated if it is provided.

I’m not sure why there are two methods with the same functionality. My guess is that `varm` might be deprecated at some point, or might eventually be for internal use only. Good question.
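A quick check of that overlap, assuming a current Statistics.jl (the vector `v` is arbitrary): all three calls below should compute the same corrected sample variance, whether the mean is computed internally, passed as the `mean` keyword, or passed positionally to `varm`.

``````julia
using Statistics

v = [1.0, 2.0, 3.0, 4.0]
m = mean(v)              # 2.5

# Three routes to the same corrected sample variance (divide by n - 1)
a = var(v)               # mean computed internally
b = var(v; mean=m)       # mean supplied as a keyword argument
c = varm(v, m)           # mean supplied positionally

println((a, b, c))
``````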


I believe these functions existed before Julia had keyword arguments.


Since `something` is a function and not control flow, I think it must evaluate both arguments, even if it then chooses to return the first. So to me it looks like it still recalculates the mean, which is surprising.
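This can be checked without Statistics at all. The sketch below uses a hypothetical `expensive_default` function that counts its own calls; passing it as the fallback to `something` still runs it, even though `something` ends up returning the first argument.

``````julia
# Hypothetical stand-in for an expensive default such as `mean(A)`
const default_calls = Ref(0)

function expensive_default()
    default_calls[] += 1
    return 2
end

# `something` is an ordinary function: both arguments are evaluated
# before it is called, even though it returns the first non-`nothing` one.
result = something(3, expensive_default())

println(result)            # 3
println(default_calls[])   # 1
``````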


This is quite easy to test for yourself:

``````julia
julia> using Statistics, BenchmarkTools

julia> let v = randn(1_000)
           @btime var($v)
           @btime var($v, mean=$(mean(v)))
           @btime varm($v, $(mean(v)))
           @btime mean($v)
       end;
  197.689 ns (0 allocations: 0 bytes)
  195.053 ns (0 allocations: 0 bytes)
  117.394 ns (0 allocations: 0 bytes)
  79.235 ns (0 allocations: 0 bytes)
``````

Yes, it’s quite clear that `var(v)` and `var(v; mean=meanv)` have essentially identical timings, whereas the difference between their times and the time for `varm(v, meanv)` is essentially exactly the time it takes to calculate `mean(v)`.
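Spelling out that arithmetic with the numbers from the run above (they will differ on other machines), where $t_{\mathrm{var}}$, $t_{\mathrm{var,kw}}$, $t_{\mathrm{varm}}$ and $t_{\mathrm{mean}}$ are the timings of `var(v)`, `var(v; mean=...)`, `varm(v, ...)` and `mean(v)`:

$$
\begin{aligned}
t_{\mathrm{var}} - t_{\mathrm{varm}} &= 197.689 - 117.394 = 80.295\ \text{ns} \\
t_{\mathrm{var,kw}} - t_{\mathrm{varm}} &= 195.053 - 117.394 = 77.659\ \text{ns} \\
t_{\mathrm{mean}} &= 79.235\ \text{ns}
\end{aligned}
$$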

I’m sure a PR to Statistics.jl rectifying this would be quite welcome.
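One possible shape for such a fix, sketched only for the whole-array (`dims=:`) case and under the hypothetical names `var_lazy`/`_var_lazy` so it does not clash with `Statistics.var`: dispatch on `::Nothing` so the mean is computed only when the caller did not supply one. This is an illustration of the idea, not the code in Statistics.jl.

``````julia
using Statistics

# Hypothetical lazy variant of `var` (whole-array case only).
# Inside this method the keyword `mean` shadows `Statistics.mean`,
# so it is only forwarded, never called.
var_lazy(A::AbstractArray; corrected::Bool=true, mean=nothing) =
    _var_lazy(A, corrected, mean)

# No mean supplied: compute it here, then defer to `varm`.
_var_lazy(A::AbstractArray, corrected::Bool, ::Nothing) =
    real(varm(A, Statistics.mean(A); corrected=corrected))

# Mean supplied: pass it straight through, with no extra pass over the data.
_var_lazy(A::AbstractArray, corrected::Bool, m) =
    real(varm(A, m; corrected=corrected))
``````

Because the choice is made by dispatch rather than by evaluating an argument to `something`, the supplied-mean path never calls `Statistics.mean`. A plain ternary, `mean === nothing ? Statistics.mean(A) : mean`, would achieve the same with ordinary control flow.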


I assumed that `something` contained control flow logic to prevent unnecessary evaluations. Good catch!

Edit: Oh man. I just realized that control flow doesn’t even matter because the mean has to be calculated before it’s passed to `something`. Perhaps that’s why it was lurking there for years.
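For what it’s worth, Julia 1.7 and later ship a short-circuiting variant, `Base.@something`. Because it is a macro, it expands into control flow and evaluates each argument only if all earlier ones were `nothing`. A small comparison, using a hypothetical counting function `slow_fallback`:

``````julia
# Requires Julia 1.7+ for `Base.@something`.
const fallback_calls = Ref(0)

# Hypothetical expensive fallback that records how often it runs
function slow_fallback()
    fallback_calls[] += 1
    return 2
end

eager = something(3, slow_fallback())        # fallback is evaluated anyway
lazy  = Base.@something(3, slow_fallback())  # fallback is skipped

println((eager, lazy, fallback_calls[]))     # (3, 3, 1)
``````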


I believe this is the result of constant propagation of `nothing`, which should statically dispatch to the appropriate `something` method.