# Variance and std of vectors

In Julia Statistics module `std()` and `var()` work for arrays of numbers, including complex numbers - expectedly, they do not support arrays of vectors. However, it is often convenient to quantify the scatter of multiple vectors in terms of var or std, so I defined a function in my personal `Utils` package that does precisely this:

``````vec_std(A; center=mean(A)) = sqrt(sum(norm.(A .- Ref(center)).^2) / (length(A) - 1))

# tests:
a = rand(10)
@test vec_std(a) ≈ std(a)
@test vec_std([[x] for x in a]) ≈ std(a)
a = rand(10) + im * rand(10)
@test vec_std(SVector.(real(a), imag(a))) ≈ std(a)
``````

As you see from the tests, this definition agrees with `Statistics.std` for real numbers (interpreted as 1d vectors) and for complex numbers (2d vectors).

Such a function seems general enough to be helpful in many contexts when working with vectors, so I thought it should already defined in some statistical package - but couldn’t find that. May be it just tends to be called differently, not “vector std”? Otherwise, do you think it would be a useful addition to some existing package - which one, if any?

What you are doing is no different than creating a single matrix out of the array of vectors. So if you do `hcat(real(a),imag(a))` instead of `SVector.(real(a), imag(a))` then you can just use the existing `std` function and get the same result.

For me `std(hcat(real(a),imag(a)))` gives the same result as `std([real(a)..., imag(a)...])` - that is, deviation of all real and imaginary parts concatenated. This is clearly different from `std(a)` and `vec_std(SVector.(real(a), imag(a)))`.

Ah ok misunderstood what you were doing. Ran your code and see that you are doing something different.

It is common to stack them up and use the `dims` argument.

Since variance involves squaring, and multiplication of two vectors is not defined:

``````julia> [1,2]*[1,2]
ERROR: MethodError: no method matching *(::Array{Int64,1}, ::Array{Int64,1})
...
``````

this is more ambiguous than it might seem to you (since you know what you want). See https://github.com/JuliaLang/Statistics.jl/issues/29, and note that https://github.com/JuliaMath/TensorCore.jl has implemented those three multiplication operators.

2 Likes

This results in another operation, not the one I show in the first post. What I mean by standard deviation here is a measure of how “scattered” the vectors are. It always outputs a single number that is basically the RMS of distances from the mean.

Actually, Julia `var` function does not involve squaring of the unrelying elements. It explicitly uses `abs2`, so that the result of `std(Vector{Complex})` is correct. I think `norm(x)^2` is a relatively obvious generalization of `abs2` to vectors.

Julia `var` function does not involve squaring of the unrelying elements. It explicitly uses `abs2`

Yes, but

``````help?> abs2
search: abs2 abs abspath AbstractSet abstract type AbstractChar AbstractDict

abs2(x)

Squared absolute value of x.
``````

itself says that it involves squaring.

a relatively obvious generalization of `abs2` to vectors.

“a” but not “the.” Some people might expect it to produce an elementwise `std` rather than a scalar:

``````julia> a = [SVector(1, 2), SVector(-3,4)]
2-element Array{SArray{Tuple{2},Int64,1,2},1}:
[1, 2]
[-3, 4]

julia> vec_std(a)
3.1622776601683795
``````

vs

``````julia> [std([1,-3]), std([2, 4])]
2-element Array{Float64,1}:
2.8284271247461903
1.4142135623730951
``````

That’s why I’d favor requiring users to be explicit about which sense of multiplication they want. Then `cov` could just go away because it’s the variance using the tensor product rather than hadamard or dot product.

2 Likes

My point is that the squaring of absolute value is not really constrained by the fact that

As for `std(Array{StaticVector})` - it is currently not very reliable anyway:

``````> a = reshape([SVector(1, 2), SVector(0, 1), SVector(1, 2), SVector(0, 1)], (2, 2))
2×2 Array{SArray{Tuple{2},Int64,1,2},2}:
[1, 2]  [1, 2]
[0, 1]  [0, 1]

> std(a[:])
2-element SArray{Tuple{2},Float64,1,2} with indices SOneTo(2):
0.5773502691896257
0.5773502691896257

> std(a, dims=1)
ERROR: MethodError: no method matching abs2(::SArray{Tuple{2},Int64,1,2})
``````

So I’m not sure this elementwise behaviour was intentional.

So I’m not sure this elementwise behaviour was intentional.

I agree, and you’ll see the same point in that Statistics.jl issue I linked.

Indeed, I see - first time I probably just skipped the code in the first message there and jumped to reading the issue discussion Anyway, even if/when a generic function like the `var()` you propose is defined, common specific versions will still make sense. E.g., it’s less error-prone to use `std_vec(A)` instead of `std(A, op=dot)`, and `std_elemwise(A)` instead of `std(A, op=hadamard)`.