Scaling and centering matrices

Which package and Julia functions can be used to scale and center matrices?

I’m not sure if there is a package, but both of these would be one-line functions if you wanted to write them yourself.


A simple analogue of R’s scale(A) function (with the default arguments) would be:

using LinearAlgebra, Statistics
scale(A) = mapslices(normalize!, A .- mean(A,dims=1), dims=1)

using Statistics
scale(A) = (A .- mean(A,dims=1)) ./ std(A,dims=1)

This should also work :). You may need to switch between dims=1 and dims=2 depending on whether the observations are stored in rows or in columns.
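To make the dims choice concrete, here is a minimal sketch that takes the dimension as a keyword. The name scale_along is hypothetical (not from any package), and the matrix is made-up example data:

```julia
using Statistics

# Hypothetical helper: center and scale along `dims`.
# dims=1 standardizes each column; dims=2 standardizes each row.
scale_along(A; dims=1) = (A .- mean(A, dims=dims)) ./ std(A, dims=dims)

A = [1.0 2.0 4.0;
     3.0 6.0 5.0;
     5.0 7.0 9.0]

cols = scale_along(A; dims=1)   # every column now has mean 0 and std 1
rows = scale_along(A; dims=2)   # every row now has mean 0 and std 1
```

With observations in rows and variables in columns (the layout R's scale assumes), dims=1 is the one you want.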

ChemometricsTools.jl offers this as “CenterScale()”. The way it works is kind of nice, because you can do the following:

using ChemometricsTools
scaler = CenterScale(A)
Ascaled = scaler(A)
#now you can use this same mean/stddev to center & scale new data for inference
Bscaled = scaler(B)

Something like that anyways - I’d have to check the docs.
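The same fit-once, apply-many idea can be sketched with only the Statistics standard library. ColumnScaler below is a hypothetical type written for illustration, not the ChemometricsTools.jl API, and A and B are made-up data:

```julia
using Statistics

# Hypothetical type (not the ChemometricsTools.jl API): stores the column
# means and standard deviations so they can be reused on new data.
struct ColumnScaler
    mu::Matrix{Float64}     # 1×p row of column means
    sigma::Matrix{Float64}  # 1×p row of column standard deviations
end

# "Fit" on training data laid out as observations × variables.
ColumnScaler(A::AbstractMatrix) = ColumnScaler(mean(A, dims=1), std(A, dims=1))

# Calling the scaler applies the stored statistics to any matrix
# with the same number of columns.
(s::ColumnScaler)(X::AbstractMatrix) = (X .- s.mu) ./ s.sigma

A = [1.0 10.0; 2.0 20.0; 3.0 30.0]   # training data
B = [4.0 40.0; 5.0 50.0]             # new data

scaler  = ColumnScaler(A)
Ascaled = scaler(A)   # centered and scaled with A's own statistics
Bscaled = scaler(B)   # scaled with A's statistics, not B's
```

The point of keeping the statistics in an object is that B gets exactly the same transformation as A, which is what you want when the new data is used for inference.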


What is going on with the mapslices version? It is the only case where the result does not match R's scale(A):

using RCall
julia> rcopy(R"R.version.string")
"R version 3.6.1 (2019-07-05)"

RCall.reval("set.seed(12345)")
A=rcopy(RCall.reval("matrix(rnorm(20), nrow = 4)"))
@rput A
tR=rcopy(R"scale(A)")
using LinearAlgebra, Statistics

julia> scale1(X) = mapslices(normalize!, X .- mean(X,dims=1), dims=1)
scale1 (generic function with 1 method)

julia> tJ=scale1(A)
4×5 Array{Float64,2}:
 -0.606239    0.395508  -0.731593   0.824123    0.720878
 -0.203018   -0.448957   0.599096  -0.0240218  -0.692767
  0.767816    0.592667  -0.154067  -0.39105    -0.0115036
  0.0414407  -0.539219   0.286565  -0.409051   -0.0166076

julia> tR
4×5 Array{Float64,2}:
 -1.05004     0.685041  -1.26716    1.42742     1.2486
 -0.351637   -0.777616   1.03766   -0.0416069  -1.19991
  1.3299      1.02653   -0.266853  -0.677319   -0.0199249
  0.0717774  -0.933954   0.496345  -0.708498   -0.0287652

julia> floor.(tR,digits=6) != floor.(tJ,digits=6)
true

julia> scale2(X) = (X .- mean(X,dims=1)) ./ std(X,dims=1)
scale2 (generic function with 1 method)

julia> tJ=scale2(A)
4×5 Array{Float64,2}:
 -1.05004     0.685041  -1.26716    1.42742     1.2486
 -0.351637   -0.777616   1.03766   -0.0416069  -1.19991
  1.3299      1.02653   -0.266853  -0.677319   -0.0199249
  0.0717774  -0.933954   0.496345  -0.708498   -0.0287652

julia> tR
4×5 Array{Float64,2}:
 -1.05004     0.685041  -1.26716    1.42742     1.2486
 -0.351637   -0.777616   1.03766   -0.0416069  -1.19991
  1.3299      1.02653   -0.266853  -0.677319   -0.0199249
  0.0717774  -0.933954   0.496345  -0.708498   -0.0287652

julia> floor.(tR,digits=6) != floor.(tJ,digits=6)
false

julia> using LinearAlgebra,  StatsBase

julia> scale3(X)=standardize(ZScoreTransform,X,dims=1)
scale3 (generic function with 1 method)

julia> tJ=scale3(A)
4×5 Array{Float64,2}:
 -1.05004     0.685041  -1.26716    1.42742     1.2486
 -0.351637   -0.777616   1.03766   -0.0416069  -1.19991
  1.3299      1.02653   -0.266853  -0.677319   -0.0199249
  0.0717774  -0.933954   0.496345  -0.708498   -0.0287652

julia> tR
4×5 Array{Float64,2}:
 -1.05004     0.685041  -1.26716    1.42742     1.2486
 -0.351637   -0.777616   1.03766   -0.0416069  -1.19991
  1.3299      1.02653   -0.266853  -0.677319   -0.0199249
  0.0717774  -0.933954   0.496345  -0.708498   -0.0287652

julia> floor.(tR,digits=6) != floor.(tJ,digits=6)
false

julia> using ChemometricsTools

julia> scaler = CenterScale(A)
CenterScale{Array{Float64,2},Array{Float64,2}}([0.28809252478401026 -0.07494330466750099 … -0.5848475671701096 0.6389883622663647], [1.1498790318008532 0.8687946525882664 … 1.4295913312370347 0.933934864794779], true)

julia> tJ = scaler(A)
4×5 Array{Float64,2}:
 -1.05004     0.685041  -1.26716    1.42742     1.2486
 -0.351637   -0.777616   1.03766   -0.0416069  -1.19991
  1.3299      1.02653   -0.266853  -0.677319   -0.0199249
  0.0717774  -0.933954   0.496345  -0.708498   -0.0287652

julia> tR
4×5 Array{Float64,2}:
 -1.05004     0.685041  -1.26716    1.42742     1.2486
 -0.351637   -0.777616   1.03766   -0.0416069  -1.19991
  1.3299      1.02653   -0.266853  -0.677319   -0.0199249
  0.0717774  -0.933954   0.496345  -0.708498   -0.0287652

julia> floor.(tR,digits=6) != floor.(tJ,digits=6)
false

My mistake, I misunderstood the normalization — R normalizes the columns to have standard-deviation (root-mean-square, with the n-1 Bessel correction) equal to 1, whereas I was normalizing them to have root-sum-square (norm) equal to 1, so my suggestion was off by a factor of sqrt(3) in this case. A corrected version should be:

scale(A) = mapslices(normalize!, A .- mean(A,dims=1), dims=1) * sqrt(size(A,1)-1)
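As a quick sanity check of that factor: for a centered column x of length n, norm(x) equals sqrt(n-1) * std(x), so the two approaches should agree once the unit-norm columns are multiplied by sqrt(n-1). A minimal sketch, where scale_std, scale_norm and the matrix A are illustrative names and data rather than anything from a package:

```julia
using LinearAlgebra, Statistics

# Version based on the sample standard deviation (n - 1 in the denominator).
scale_std(A) = (A .- mean(A, dims=1)) ./ std(A, dims=1)

# Unit-norm columns, rescaled by sqrt(n - 1) as in the corrected one-liner.
# The non-mutating `normalize` is used here instead of `normalize!`.
scale_norm(A) = mapslices(normalize, A .- mean(A, dims=1), dims=1) * sqrt(size(A, 1) - 1)

A = [1.0 2.0; 4.0 3.0; 2.0 8.0; 7.0 5.0]   # arbitrary 4×2 example data

# Compare with a tolerance (isapprox) rather than by truncating digits.
same = scale_std(A) ≈ scale_norm(A)
```

Using ≈ for the comparison is a bit more robust than comparing floor.(x, digits=6), since two values that differ only by floating-point noise can still land on opposite sides of a truncation boundary.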

Note that there is no “standard” scaling; it really depends on the purpose and the context. See, for example:

http://www.stat.columbia.edu/~gelman/research/published/standardizing7.pdf

On a related note, any reasonable scaling will serve for numerical purposes.
