Problems with tuples in Statistics function

using Statistics
using Distributions

function TwoSampleT2Test(X, Y)
    nx, p = size(X)   # rows = observations, columns = variables
    ny, _ = size(Y)
    δ = mean(X, dims=1) - mean(Y, dims=1)   # 1×p row of mean differences
    Sx = cov(X)
    Sy = cov(Y)
    S_pooled = ((nx-1)*Sx + (ny-1)*Sy)/(nx+ny-2)
    t_squared = (nx*ny)/(nx+ny) * δ * inv(S_pooled) * transpose(δ)
    statistic = t_squared[1,1] * (nx+ny-p-1)/(p*(nx+ny-2))
    F = FDist(p, nx+ny-p-1)
    p_value = ccdf(F, statistic)   # upper-tail probability, i.e. 1 - cdf(F, statistic)
    println("Test statistic: $(statistic)\nDegrees of freedom: $(p) and $(nx+ny-p-1)\np-value: $(p_value)")
    return [statistic, p_value]
end

I get a bounds error, something to do with selecting an invalid element of a tuple. Is there a problem with the function, or is it the array/tuple that I was using as an example?

I also tried to use RDatasets, but

using RDatasets
iris = dataset("datasets", "iris")

versicolor = Matrix(iris[iris.Species .== "versicolor", 1:2])

virginica = Matrix(iris[iris.Species .== "virginica", 1:2])

gave me a multiple definition error in Pluto.

Please post your code formatted as code.


Sorry, I used quotation marks by mistake.

Your error has nothing to do with statistics. The MWE would be:

julia> x = [1, 2, 3, 4, 5, 6, 7, 8]
8-element Vector{Int64}:
 1
 2
 3
 4
 5
 6
 7
 8

julia> nx, p = size(x)
ERROR: BoundsError: attempt to access Tuple{Int64} at index [2]

You are calling size on a vector, which gives you back a 1-element tuple (as a vector is one-dimensional, it has only length). You then try to destructure that by assigning it to two variables (nx, p) but that doesn’t work as there’s only one element in the tuple.
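
To make the difference concrete, here is a minimal sketch (the variable names are just for illustration) comparing `size` on a vector with `size` on a one-column matrix built from the same data:

```julia
x = [1, 2, 3, 4, 5, 6, 7, 8]   # a Vector has exactly one dimension

size(x)          # (8,)  -> a 1-element tuple, so `nx, p = size(x)` fails
n = length(x)    # 8

# Reshape the same data into an 8×1 Matrix; now size returns two values.
X = reshape(x, :, 1)
nx, p = size(X)  # nx == 8, p == 1

# Asking for a specific dimension also works on a vector;
# dimensions beyond the first are reported as 1.
size(x, 1), size(x, 2)   # (8, 1)
```

So a function that destructures `size(X)` into two variables needs a two-dimensional input, even if that input has only one column.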

Ok, do I need to assign the tuple as an array? How do I fix this?

What do you expect nx and p to be, respectively?

I don’t know, I was trying to get someone else’s function to work. I instead found the Hotelling test in HypothesisTests.jl, but I’m not sure how to enter the data.

using HypothesisTests

x = randn(100_000); y = randn(100_000);
BartlettTest(x,y) #not the Hotelling test, but I need to understand how to use matrix variables first

tells me that there is no matching method; apparently I need to pass an AbstractMatrix instead.
I know that a matrix is an array of arrays, and that an array is a matrix with 1 row, but I’m not sure if this is a formatting problem, i.e. do I need to change the type, or arrange both x and y into a single matrix?

Or, if I need more tests.

No, a Matrix is an Array with exactly two dimensions.

julia> Matrix
Matrix{T} where T (alias for Array{T, 2} where T)
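
A small sketch of what that alias means in practice, and why an "array of arrays" is not a Matrix (variable names are illustrative only):

```julia
# Matrix and Vector are just aliases for 2- and 1-dimensional Arrays.
is_matrix_alias = Matrix{Float64} === Array{Float64, 2}   # true
is_vector_alias = Vector{Float64} === Array{Float64, 1}   # true

M = [1 2; 3 4]            # a genuine 2×2 Matrix
vv = [[1, 2], [3, 4]]     # a Vector whose elements are Vectors

ndims(M)                  # 2
ndims(vv)                 # 1
vv isa Matrix             # false: nesting vectors does not add a dimension
```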

Also no. What you are probably thinking of is a Vector, and that behaves as a column vector, not a row vector:

julia> a = rand(3)
3-element Vector{Float64}:

julia> a' * a
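
With fixed numbers instead of random ones, the column-vector behaviour is easier to see. A minimal sketch:

```julia
using LinearAlgebra   # only needed for dot

a = [1.0, 2.0, 3.0]

size(a)        # (3,)   a plain Vector
size(a')       # (1, 3) its adjoint behaves like a row

inner = a' * a   # row times column: a scalar, 1 + 4 + 9
outer = a * a'   # column times row: a 3×3 Matrix

inner == dot(a, a)   # true
```

If `a` were a row vector, `a' * a` would give a matrix rather than a single number.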

I took linear algebra, so I should know this, but what does a_prime and a mean?

a' is short for adjoint(a).
You can find this information in the REPL help with ?something, for example ?'.

So is there a way to put two arrays into a two-dimensional matrix?

If you mean to put the first as the first column and the second as the second column, it is hcat(vector1, vector2).
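
For example, a minimal sketch with two made-up vectors:

```julia
v1 = [1.0, 2.0, 3.0]
v2 = [4.0, 5.0, 6.0]

M = hcat(v1, v2)      # 3×2 Matrix: v1 is column 1, v2 is column 2
M2 = [v1 v2]          # the same thing written with space-separated syntax

R = vcat(v1', v2')    # 2×3 Matrix: each vector becomes a row instead

size(M)               # (3, 2)
M[:, 1] == v1         # true
```

Functions that ask for an AbstractMatrix, with observations in rows and variables in columns, can take `M` directly.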

Have a look at my tutorial: 2 - Data types - Julia language: a concise tutorial

OK, I got this, now I get that a zero vector is a point, but is mu[0] a common mean, between two ranges?

I’m not sure what you mean here. By “zero vector”, do you mean a vector of zeros or an empty vector? In either case it is hard to interpret it as a “point”.

mu[0] looks odd in Julia, as vector indices start from 1.
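
A short sketch of the indexing point. Note that `μ₀` (typed as `\mu<TAB>\_0<TAB>` in the REPL) is a single variable name with a subscript character in it, not an index into a vector called `mu`:

```julia
mu = [0.5, 1.5, 2.5]

firstindex(mu)   # 1: Julia arrays are 1-based by default
mu[1]            # 0.5, the first element
mu[end]          # 2.5, the last element

# Indexing with 0 is out of bounds and throws a BoundsError.
err = try
    mu[0]
catch e
    e
end
err isa BoundsError   # true

μ₀ = zeros(3)    # an ordinary variable that happens to be named "mu nought"
```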


Multivariate tests · HypothesisTests.jl

OneSampleHotellingT2Test(X::AbstractMatrix, Y::AbstractMatrix, μ₀=<zero vector>)

and here is my understanding of a zero vector

The line from the docs below tells you what you need to know. Where it says μ₀=<zero vector>, that means that the default value of μ₀, if you do not specify it, is a vector containing all zeros (Functions · The Julia Language). Since the docs don’t tell you what the function returns, click on the source link and you can clearly see that it returns an instance of the OneSampleHotellingT2Test struct.

test of the hypothesis that the vector of mean column differences between X and Y is equal to μ₀ .

Ok, I think I get that, so what do I enter there? An array of N number of zeros?

You pass the vector that you want to test the mean column differences against. That depends on your problem. If you want to test whether the column means are equal, i.e., their differences are zero, don’t pass anything, so that the default value is used.

OK, I got that, so what would I enter, so that it knows I want default?

The default value of an optional argument is the value that the argument takes if it is not supplied when the function is called. So to get the default, simply leave μ₀ out of the call. Defaults worked the same way for tail in the p-value calculations I showed you before.
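
To show how such a default behaves, here is a minimal sketch. `mean_diff_offset` is a hypothetical helper written for this illustration, not part of HypothesisTests.jl; it only copies the shape of the documented signature, with an optional third argument that defaults to a zero vector:

```julia
using Statistics

# Hypothetical helper (NOT from HypothesisTests.jl): returns the mean column
# differences between X and Y, shifted by μ₀. If μ₀ is omitted, it defaults
# to a vector of zeros with one entry per column of X.
function mean_diff_offset(X::AbstractMatrix, Y::AbstractMatrix,
                          μ₀::AbstractVector = zeros(size(X, 2)))
    d = vec(mean(X .- Y, dims=1))
    return d .- μ₀
end

X = [1.0 2.0; 3.0 4.0; 5.0 6.0]
Y = [0.0 1.0; 2.0 3.0; 4.0 5.0]

r_default  = mean_diff_offset(X, Y)               # μ₀ left out: zeros are used
r_explicit = mean_diff_offset(X, Y, zeros(2))     # same result, spelled out
r_shifted  = mean_diff_offset(X, Y, [1.0, 1.0])   # a non-default μ₀
```

By analogy, and going only by the signature quoted from the docs, calling `OneSampleHotellingT2Test(X, Y)` with two matrices and nothing else should test the mean column differences against zero.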