Hi,
I am trying to standardize a test set using the mean and standard deviation computed on the training set.
I am doing this with the ZScoreTransform from StatsBase, but I run into the following problem.
#
using Distributions
using StatsBase
# Data layout: (features, datapoints)
train = rand(Normal(1,10),(100,1000))
test = rand(Normal(1,5),(100,100))
## Check the per-feature mean and std of both sets before standardization
println("before standardization")
mean(train,dims=2)
std(train,dims=2)
mean(test,dims=2)
std(test,dims=2)
## Fit the ZScoreTransform on the training set
train_std = StatsBase.fit(ZScoreTransform, train, dims=2)
StatsBase.transform!(train_std,train)
## Each feature of the training set is now standardized (mean ≈ 0, std ≈ 1)
mean(train,dims=2)
std(train,dims=2)
# Then I want to standardize the test set with the same fitted transform.
StatsBase.transform!(train_std,test)
# But I get a dimension mismatch error, even though the mean and scale have the correct size:
size(train_std.mean)
size(train_std.scale)
Why does this happen, and how can I fix it?
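
For reference, the result I am after should match the manual version below. This is only a minimal sketch that bypasses ZScoreTransform entirely and uses broadcasting with `mean` and `std` from Statistics. The names `mu`, `sigma`, `train_z` and `test_z` are mine, and the data is a stand-in with the same (features, datapoints) layout as above.

```julia
using Statistics

# Stand-in data with the same (features, datapoints) layout as above.
train = randn(100, 1000) .* 10 .+ 1
test  = randn(100, 100) .* 5 .+ 1

# Per-feature statistics computed on the training set only (each is 100×1).
mu    = mean(train, dims=2)
sigma = std(train, dims=2)

# Broadcasting stretches the 100×1 statistics across every column,
# so both sets are scaled with the training mean and std.
train_z = (train .- mu) ./ sigma
test_z  = (test .- mu) ./ sigma

println(size(train_z), " ", size(test_z))
```

Here `train_z` has per-feature mean close to 0 and std close to 1, while `test_z` generally does not, since it is scaled with the training statistics rather than its own.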