Calculating a standardization function from scratch

From the docs of `std`:

> The algorithm returns an estimator of the generative distribution's standard deviation under the assumption that each entry of `itr` is an IID drawn from that generative distribution. For arrays, this computation is equivalent to calculating `sqrt(sum((itr .- mean(itr)).^2) / (length(itr) - 1))`. If `corrected` is `true`, then the sum is scaled with `n-1`, whereas the sum is scaled with `n` if `corrected` is `false`, with `n` the number of elements in `itr`.
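
For concreteness, here is a quick check of that documented equivalence (the data is just a made-up example):

```julia
using Statistics

x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# The default (corrected) std matches the manual n - 1 formula from the docs.
manual = sqrt(sum((x .- mean(x)).^2) / (length(x) - 1))
std(x) ≈ manual   # true
```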

`std` by default computes a "corrected" standard deviation, i.e. it divides by `n - 1`. This is what you want if the mean of your data is estimated by taking the average of the data itself. If you know the mean a priori and subtract it off manually, or supply it via the `mean` keyword argument, then you should pass `corrected=false` to `std`.
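
A minimal sketch of the two cases (the value of `μ` here is an assumed, a-priori-known mean for illustration, not something estimated from `x`):

```julia
using Statistics

x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Case 1: the mean is estimated from the data, so keep the default correction.
s_estimated = std(x)                        # same as std(x; corrected=true)

# Case 2: the mean is known a priori (assumed value for illustration),
# so pass it via the `mean` keyword and drop the correction.
μ = 5.0
s_known = std(x; mean=μ, corrected=false)
```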

So I think your implementation is doing the right thing and sklearn is not: `StandardScaler` normalizes by `n` even though the mean it subtracts is estimated from the data.
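
If I read sklearn correctly, that divide-by-`n` behavior corresponds to `corrected=false` in Julia terms, which is how the two results end up disagreeing:

```julia
using Statistics

x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# What StandardScaler effectively uses (divide by n):
std(x; corrected=false)   # 2.0

# The corrected sample estimator (divide by n - 1) is slightly larger:
std(x)                    # ≈ 2.1381
```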
