Variance and std of vectors

In Julia Statistics module std() and var() work for arrays of numbers, including complex numbers - expectedly, they do not support arrays of vectors. However, it is often convenient to quantify the scatter of multiple vectors in terms of var or std, so I defined a function in my personal Utils package that does precisely this:

vec_std(A; center=mean(A)) = sqrt(sum(norm.(A .- Ref(center)).^2) / (length(A) - 1))

# tests:
a = rand(10)
@test vec_std(a) ≈ std(a)
@test vec_std([[x] for x in a]) ≈ std(a)
a = rand(10) + im * rand(10)
@test vec_std(SVector.(real(a), imag(a))) ≈ std(a)

As you see from the tests, this definition agrees with Statistics.std for real numbers (interpreted as 1d vectors) and for complex numbers (2d vectors).

Such a function seems general enough to be helpful in many contexts when working with vectors, so I thought it should already defined in some statistical package - but couldn’t find that. May be it just tends to be called differently, not “vector std”? Otherwise, do you think it would be a useful addition to some existing package - which one, if any?

What you are doing is no different than creating a single matrix out of the array of vectors. So if you do hcat(real(a),imag(a)) instead of SVector.(real(a), imag(a)) then you can just use the existing std function and get the same result.

For me std(hcat(real(a),imag(a))) gives the same result as std([real(a)..., imag(a)...]) - that is, deviation of all real and imaginary parts concatenated. This is clearly different from std(a) and vec_std(SVector.(real(a), imag(a))).

Ah ok misunderstood what you were doing. Ran your code and see that you are doing something different.

It is common to stack them up and use the dims argument.

Since variance involves squaring, and multiplication of two vectors is not defined:

julia> [1,2]*[1,2]
ERROR: MethodError: no method matching *(::Array{Int64,1}, ::Array{Int64,1})
...

this is more ambiguous than it might seem to you (since you know what you want). See https://github.com/JuliaLang/Statistics.jl/issues/29, and note that https://github.com/JuliaMath/TensorCore.jl has implemented those three multiplication operators.

2 Likes

This results in another operation, not the one I show in the first post. What I mean by standard deviation here is a measure of how “scattered” the vectors are. It always outputs a single number that is basically the RMS of distances from the mean.

Actually, Julia var function does not involve squaring of the unrelying elements. It explicitly uses abs2, so that the result of std(Vector{Complex}) is correct. I think norm(x)^2 is a relatively obvious generalization of abs2 to vectors.

Julia var function does not involve squaring of the unrelying elements. It explicitly uses abs2

Yes, but

help?> abs2
search: abs2 abs abspath AbstractSet abstract type AbstractChar AbstractDict

  abs2(x)

  Squared absolute value of x.

itself says that it involves squaring.

a relatively obvious generalization of abs2 to vectors.

“a” but not “the.” Some people might expect it to produce an elementwise std rather than a scalar:

julia> a = [SVector(1, 2), SVector(-3,4)]
2-element Array{SArray{Tuple{2},Int64,1,2},1}:
 [1, 2]
 [-3, 4]

julia> vec_std(a)
3.1622776601683795

vs

julia> [std([1,-3]), std([2, 4])]
2-element Array{Float64,1}:
 2.8284271247461903
 1.4142135623730951

That’s why I’d favor requiring users to be explicit about which sense of multiplication they want. Then cov could just go away because it’s the variance using the tensor product rather than hadamard or dot product.

2 Likes

My point is that the squaring of absolute value is not really constrained by the fact that

As for std(Array{StaticVector}) - it is currently not very reliable anyway:

> a = reshape([SVector(1, 2), SVector(0, 1), SVector(1, 2), SVector(0, 1)], (2, 2))
2×2 Array{SArray{Tuple{2},Int64,1,2},2}:
 [1, 2]  [1, 2]
 [0, 1]  [0, 1]

> std(a[:])
2-element SArray{Tuple{2},Float64,1,2} with indices SOneTo(2):
 0.5773502691896257
 0.5773502691896257

> std(a, dims=1)
ERROR: MethodError: no method matching abs2(::SArray{Tuple{2},Int64,1,2})

So I’m not sure this elementwise behaviour was intentional.

So I’m not sure this elementwise behaviour was intentional.

I agree, and you’ll see the same point in that Statistics.jl issue I linked.

Indeed, I see - first time I probably just skipped the code in the first message there and jumped to reading the issue discussion :slight_smile:

Anyway, even if/when a generic function like the var() you propose is defined, common specific versions will still make sense. E.g., it’s less error-prone to use std_vec(A) instead of std(A, op=dot), and std_elemwise(A) instead of std(A, op=hadamard).