Greetings,
I have been trying to do some basic normalization with `ZScoreTransform` from the StatsBase package, and found that it doesn't handle NaN values well: a single NaN in a column turns every value in that column into NaN.
For example:
```julia
using Random, StatsBase

Random.seed!(1234)
input = rand(4,5)
input[2,2] = NaN
input[3,4] = NaN
dt = StatsBase.fit(ZScoreTransform, input; dims=1)
input_normalized = StatsBase.transform(dt, input)
```
The input is:
```
0.579862  0.520355  0.789764  0.711389  0.131026
0.411294  NaN       0.696041  0.103929  0.946453
0.972136  0.839622  0.566704  NaN       0.574323
0.0149088 0.967143  0.536369  0.870539  0.67765
```
The output is:
```
 0.214999  NaN   1.21235   NaN  -1.33013
-0.209818  NaN   0.415227  NaN   1.073
 1.20359   NaN  -0.684786  NaN  -0.0236933
-1.20877   NaN  -0.942793  NaN   0.280818
```
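Both columns that contain a NaN (columns 2 and 4) come out entirely NaN. That is what you'd expect if each column's mean and standard deviation are computed over the whole column, because in Julia a single NaN makes `mean` and `std` return NaN. A quick check with the Statistics standard library, using column 2 of the input above:

```julia
using Statistics

col = [0.520355, NaN, 0.839622, 0.967143]   # column 2 of the input above

mean(col)                    # NaN: one NaN propagates through the sum
std(col)                     # NaN as well
mean(filter(!isnan, col))    # ≈ 0.7757, the mean of the three finite values
```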
Ideally, the NaN values in the input should be ignored when computing each column's mean and standard deviation, so that the non-NaN entries (e.g. the one at index [3,2]) still get valid values in the output. Of course it's possible to hand-write a normalization function, but it would be good if the package offered an option for this, so the method can be applied in a general way.
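For reference, a hand-written workaround could look roughly like the sketch below. It is only an assumption of what such an option might do, not a StatsBase API: `nan_zscore` is a hypothetical helper that computes each column's (or row's) mean and standard deviation from its non-NaN entries and leaves the NaN entries as NaN.

```julia
using Random, Statistics

# Hypothetical helper (not part of StatsBase): z-score along `dims`,
# computing each slice's mean and std from its non-NaN entries only.
# NaN entries stay NaN in the result; every other entry gets a finite value
# as long as its slice has at least two distinct finite values.
function nan_zscore(X::AbstractMatrix{<:AbstractFloat}; dims::Int=1)
    nanmean(v) = mean(filter(!isnan, v))
    nanstd(v) = std(filter(!isnan, v))   # sample (corrected) std, Statistics.std's default
    # mapslices with a scalar-valued function returns an array of size 1 along `dims`
    μ = mapslices(nanmean, X; dims=dims)
    σ = mapslices(nanstd, X; dims=dims)
    # broadcasting expands μ and σ back across `dims`
    return (X .- μ) ./ σ
end

Random.seed!(1234)
input = rand(4, 5)
input[2, 2] = NaN
input[3, 4] = NaN
out = nan_zscore(input; dims=1)
out[3, 2]   # finite now, while out[2, 2] and out[3, 4] remain NaN
```

The corrected standard deviation is used because the StatsBase output above is consistent with it (e.g. column 1 divides by n - 1), so columns without NaN should give the same values as `ZScoreTransform`.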
Thank you
Jack