Standardize dataset with StatsBase

Hi,
I am trying to apply the standardization scheme in which the test set is standardized using the statistics of the training set.
I am trying to do this with the ZScoreTransform of StatsBase, but I run into the following problem.

using Distributions
using StatsBase
# Layout: (features, datapoints)
train = rand(Normal(1,10),(100,1000))
test  = rand(Normal(1,5),(100,100))

## Inspect the per-feature mean and std before standardization
println("before standardization")
mean(train,dims=2)
std(train,dims=2)
mean(test,dims=2)
std(test,dims=2)
## Fit the ZScoreTransform on the training set
train_std = StatsBase.fit(ZScoreTransform, train , dims=2)
StatsBase.transform!(train_std,train)

## Each feature of the training set is now standardized
mean(train,dims=2)
std(train,dims=2)

# Then I want to standardize the test set with the same transform.
StatsBase.transform!(train_std,test)
# But I get a dimension mismatch error, even though the mean and scale have the correct size:
size(train_std.mean)
size(train_std.scale)


Why does this happen, and how do I solve it?

I ran the code and there was no problem. Maybe update to the latest version?
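
If updating is not an option, one possible workaround is to standardize by hand with the training statistics. The following is only a minimal sketch, assuming plain `Statistics` and synthetic data generated with `randn` rather than `Distributions`. The names `mu`, `sigma`, `train_z` and `test_z` are made up for illustration and do not come from StatsBase.

```julia
using Statistics

# Synthetic data laid out as (features, datapoints), as in the question.
# Assumption: randn stands in for Distributions.Normal to avoid the extra dependency.
train = 1 .+ 10 .* randn(100, 1000)
test  = 1 .+ 5  .* randn(100, 100)

# Per-feature statistics, computed from the training set only.
mu    = mean(train, dims=2)   # size (100, 1)
sigma = std(train, dims=2)    # size (100, 1)

# Broadcasting applies the training statistics to every column of each set.
train_z = (train .- mu) ./ sigma
test_z  = (test  .- mu) ./ sigma
```

After this, each row of `train_z` should have mean close to 0 and std close to 1, while `test_z` is scaled with the training statistics and so will generally not be exactly standardized. To see which StatsBase version is installed, `using Pkg; Pkg.status("StatsBase")` should print it.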
