What is T in function pca(data::Array{T,2}) where T?

Here is the function:

using LinearAlgebra, Statistics   # svd, diagm, diag; mean

function pca(data::Array{T,2}) where T
    X = data .- mean(data, dims=2)
    Y = X' ./ sqrt(T(size(X,2)-1))
    U,S,PC = svd(Y)
    S = diagm(0=>S)
    V = S .* S
    
    # order the components by variance, largest first
    indexList = sortperm(diag(V); rev=true)

    PCs = map(x->PC[:,x], indexList)
    return PCs, diag(V)[indexList]
end

I have data.
I need to do everything step by step, without the function.
What is 'T' in the line Y = X' ./ sqrt(T(size(X,2)-1)) ?

Paul

T here is the element type of the array. Formally, Array is Array{T,N} where N where T: T is the type of the elements and N is the number of dimensions.
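
To see how T gets filled in, here is a minimal sketch. The helper name describe is made up for illustration and is not part of the original code; the Int64 result assumes a 64-bit machine.

```julia
# Hypothetical helper: T and N are bound from the type of the argument.
describe(a::Array{T,N}) where {T,N} = (T, N)

describe(rand(3, 4))          # (Float64, 2)
describe(Float32[1, 2, 3])    # (Float32, 1)
describe([1 2; 3 4])          # (Int64, 2) on a 64-bit machine
```

Inside such a method T is an ordinary variable holding a type, which is why T(size(X,2)-1) works in pca but not at the REPL, where no T has been defined.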

OK, step by step.
My data:

julia> data=readdlm("v4dane.txt")
2888×4 Array{Float64,2}:
   0.0   0.0   0.0    0.0
   0.0   0.0   0.0   75.0
   0.0   0.0   0.0  100.0
   0.0   0.0   0.0    0.0
....

julia> X = data .- mean(data, dims=2)
2888×4 Array{Float64,2}:
   0.0     0.0     0.0     0.0
 -18.75  -18.75  -18.75   56.25
 -25.0   -25.0   -25.0    75.0
   0.0     0.0     0.0     0.0
   0.0     0.0     0.0     0.0

julia>  Y = X' ./ sqrt(T(size(X,2)-1))
ERROR: UndefVarError: T not defined




To expand on Oscar's answer, the reason for the T in this case is to use the same type as your input: if you give it a matrix of Float64s, T(...) converts what size returns (an Int) to a Float64.

In your example, add T = Float64.
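
Put together, the body of pca can be run line by line roughly like this. This is only a sketch: the random matrix is a stand-in so the snippet runs on its own, and variances is a name introduced here for the second return value.

```julia
using LinearAlgebra, Statistics

data = rand(10, 4)                 # stand-in; use readdlm("v4dane.txt") for the real data
T = eltype(data)                   # Float64 here; this is what `where T` binds inside pca

X = data .- mean(data, dims=2)     # same centering as in the function body
Y = X' ./ sqrt(T(size(X, 2) - 1))
U, S, PC = svd(Y)
S = diagm(0 => S)
V = S .* S

indexList = sortperm(diag(V); rev=true)   # largest variance first
PCs = map(x -> PC[:, x], indexList)
variances = diag(V)[indexList]
```

Note that mean(data, dims=2) subtracts the mean of each row, so the function as written treats rows as variables and columns as observations. It may be worth checking that this matches the layout of a 2888×4 file.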

Separate point - you’re quoting code as a text quote instead of as code. Compare:

function foo()
println(“This doesn’t look like code!”)
end

to

function bar()
    println("Oh, much better!")
end

Which looks like this when writing:

> function foo()
>     println("This doesn't look like code!")
> end

to

```
function bar()
    println("Oh, much better!")
end
```

It should…

julia> m = rand(5,5);

julia> T = eltype(m)
Float64

julia> sqrt(T(size(m, 2) -1))
2.0

But wouldn’t it work just as well without the type conversion, e.g. X ./ sqrt(size(X,2)-1)?

I would imagine that size returns integers and sqrt returns at least floats, so that dividing X by the result of sqrt would have type T (or higher) anyway. Using T(size(X)) seems out of place to me, since the type may be promoted by sqrt anyway. Or if not out of place, then at least a bit obscure.

I wonder:

  1. Is it necessary to cast to a particular type at all? Could one leave out the type, e.g. X' ./ sqrt(size(X,2)-1), with no issues?
  2. If one is to cast to T, would it make more sense after the root, e.g. T(sqrt(size(X,2)-1))?
  3. Are there more direct or idiomatic (or transparent) ways to ensure the type, e.g. T.(X' ./ sqrt(size(X,2)-1))?

I think it would, and you are right, the explicit conversion may not be necessary. The code should be type stable and work fine without that T.
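
A quick way to check the variants from the list above is to compare element types. The sketch below assumes a Float32 input, chosen only because that is where the variants differ; for Float64 input all four give Float64.

```julia
X = rand(Float32, 3, 5)            # illustrative Float32 input
T = eltype(X)
n = size(X, 2)

eltype(X' ./ sqrt(n - 1))          # Float64: sqrt of an Int is a Float64, which widens the result
eltype(X' ./ sqrt(T(n - 1)))       # Float32: conversion before the root
eltype(X' ./ T(sqrt(n - 1)))       # Float32: conversion after the root
eltype(T.(X' ./ sqrt(n - 1)))      # Float32: computed in Float64, converted at the end
```

So leaving out T is type stable, but for element types narrower than Float64 it changes the element type of the result.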


Yeah, I was thinking that myself. Similar formulations are often used to generate Arrays of the correct type (e.g. zeros(T, n)), but in this case sqrt seems to always return a Float, regardless of T.

Technically not true. Given complex inputs it will return complex outputs. Not sure about quaternions, my guess is a MethodError.
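
For reference, a few return types of sqrt in Base, as a small sketch (quaternions are not in Base, so they are left out here):

```julia
typeof(sqrt(4))                # Float64: integer input gives a Float64
typeof(sqrt(4.0f0))            # Float32: the float type is preserved
typeof(sqrt(complex(4.0)))     # ComplexF64: complex in, complex out
sqrt(complex(-1.0))            # 0.0 + 1.0im
# sqrt(-1.0) throws a DomainError, since the input is real

zeros(Float32, 3)              # 3-element Vector{Float32}, the zeros(T, n) pattern
```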
