In my hobby project I arrived at the following problem: I'm given a family of random variables `Wₖ` for which I need to estimate the CDF:

(The plot above is a histogram approximation of the pdf from `10^7` samples.)
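An estimate like this can be reproduced by plain Monte Carlo: draw `k` standard normal values, compute `W`, repeat, and read the CDF off the sorted draws. Below is a minimal, stdlib-only Python sketch under one loud assumption: the coefficients `a_i` are approximated by normalized Blom scores (the Shapiro-Francia simplification), not the exact Shapiro-Wilk coefficients. For `k = 3` the two coincide, since both reduce to `(-1/√2, 0, 1/√2)`. All function names are hypothetical, and `10^7` draws would need a vectorized implementation rather than this loop.

```python
import bisect
import math
import random
from statistics import NormalDist


def approx_coefficients(k):
    """Hypothetical helper: approximate Shapiro-Wilk coefficients for size k.

    ASSUMPTION: normalized Blom scores Phi^{-1}((i - 3/8) / (k + 1/4)), i.e. the
    Shapiro-Francia simplification; the exact SW coefficients also need the
    covariance matrix of normal order statistics. For k = 3 both give
    (-1/sqrt(2), 0, 1/sqrt(2)).
    """
    nd = NormalDist()
    m = [nd.inv_cdf((i - 0.375) / (k + 0.25)) for i in range(1, k + 1)]
    norm = math.sqrt(sum(v * v for v in m))
    return [v / norm for v in m]


def w_statistic(x, a):
    """W = (sum_i a_i x_(i))^2 / sum_i (x_i - mean)^2 for one sample x."""
    xs = sorted(x)
    mean = sum(xs) / len(xs)
    ss = sum((v - mean) ** 2 for v in xs)
    return sum(ai * xi for ai, xi in zip(a, xs)) ** 2 / ss


def sample_w(k, n_samples, seed=0):
    """Sorted Monte Carlo draws of W_k for standard normal samples of size k."""
    rng = random.Random(seed)
    a = approx_coefficients(k)
    return sorted(
        w_statistic([rng.gauss(0.0, 1.0) for _ in range(k)], a)
        for _ in range(n_samples)
    )


def empirical_cdf(sorted_ws, w):
    """Fraction of draws that are <= w."""
    return bisect.bisect_right(sorted_ws, w) / len(sorted_ws)


if __name__ == "__main__":
    ws = sample_w(k=4, n_samples=100_000, seed=1)
    for w in (0.70, 0.80, 0.90, 0.95):
        print(f"P(W_4 <= {w:.2f}) ~ {empirical_cdf(ws, w):.4f}")
```

Because the coefficients are exact for `k = 3`, the `k = 3` output can be checked against the closed form `P(W_3 ≤ w) = (6/π)·arcsin(√w) − 2` on `[3/4, 1]`.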

From theoretical considerations I know that

It is clear, though, that for `k = 4, 5, 6, ...` the pdfs are piecewise functions. For `k = 4` the pdf consists of `4` pieces: linear on `(0.63, 0.73)`, then exponential, then another exponential(?), and finally linear close to `w = 1`.
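The left end of the `k = 4` support can be cross-checked against the known lower bound from the original Shapiro-Wilk paper: the smallest attainable `W` is `k·a_1²/(k − 1)`, reached when `k − 1` observations coincide and one is separated from them. The sketch below evaluates it with the `k = 4` coefficients as tabulated by Shapiro and Wilk (`0.6872`, `0.1677`; worth re-checking against the table), and also evaluates `W` at the two-cluster configuration `(0, 0, 1, 1)`. That this second value lands near `0.73` is only an observation; whether it explains the end of the first linear piece is an assumption.

```python
# Shapiro-Wilk coefficients for k = 4 as tabulated by Shapiro & Wilk (rounded
# to 4 decimals), ordered to match the sorted sample x_(1) <= ... <= x_(4).
A4 = [-0.6872, -0.1677, 0.1677, 0.6872]


def w_statistic(x, a):
    """W = (sum_i a_i x_(i))^2 / sum_i (x_i - mean)^2."""
    xs = sorted(x)
    mean = sum(xs) / len(xs)
    ss = sum((v - mean) ** 2 for v in xs)
    return sum(ai * xi for ai, xi in zip(a, xs)) ** 2 / ss


def w_lower_bound(a):
    """Smallest attainable W for k = len(a): k * a_1^2 / (k - 1).

    a_1^2 == a_k^2 by antisymmetry, so the last coefficient is used.
    """
    k = len(a)
    return k * a[-1] ** 2 / (k - 1)


print(round(w_lower_bound(A4), 4))              # 0.6297
print(round(w_statistic([0, 0, 0, 1], A4), 4))  # 0.6297: k - 1 tied points
print(round(w_statistic([0, 0, 1, 1], A4), 4))  # 0.7309: two equal clusters
```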

Similarly, for `k = 5` there are four pieces, but as `k` grows the density becomes smoother (the pieces remain visible up to `k = 7` or `8`, especially if you plot histograms of `sqrt(W)`).
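A quick way to look for the pieces without a plotting library is a text histogram of `sqrt(W)`. This sketch again assumes Blom-score coefficients as an approximation to the exact Shapiro-Wilk ones, so breakpoint positions may differ slightly from a simulation with exact coefficients; the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def blom_coefficients(k):
    # ASSUMPTION: normalized Blom scores stand in for the exact SW coefficients.
    nd = NormalDist()
    m = [nd.inv_cdf((i - 0.375) / (k + 0.25)) for i in range(1, k + 1)]
    norm = math.sqrt(sum(v * v for v in m))
    return [v / norm for v in m]


def sample_sqrt_w(k, n_samples, seed=0):
    """Monte Carlo draws of sqrt(W_k) for standard normal samples of size k."""
    rng = random.Random(seed)
    a = blom_coefficients(k)
    out = []
    for _ in range(n_samples):
        xs = sorted(rng.gauss(0.0, 1.0) for _ in range(k))
        mean = sum(xs) / k
        ss = sum((v - mean) ** 2 for v in xs)
        # sqrt(W) = |sum a_i x_(i)| / sqrt(SS)
        out.append(abs(sum(ai * xi for ai, xi in zip(a, xs))) / math.sqrt(ss))
    return out


def text_histogram(values, bins=40, width=60):
    """Print equal-width bins between min and max; return (left edges, counts)."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # v == hi would fall one past the end, so clamp it into the last bin.
        counts[min(int((v - lo) / step), bins - 1)] += 1
    peak = max(counts)
    edges = [lo + i * step for i in range(bins)]
    for left, c in zip(edges, counts):
        print(f"{left:.4f} {'#' * round(width * c / peak)}")
    return edges, counts


if __name__ == "__main__":
    text_histogram(sample_sqrt_w(k=5, n_samples=100_000, seed=3))
```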

Any hints from someone with a grasp of statistics would be much appreciated (I don't do statistics at all).

### Where does it come from?

`W` is the Shapiro-Wilk W statistic (the one from the Shapiro-Wilk normality test), whose distribution I'm trying to compute/estimate more… rigorously.

See https://github.com/JuliaStats/HypothesisTests.jl/pull/124.
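For reference, the standard definition of the statistic, assuming `k` is the sample size and `x_(1) ≤ … ≤ x_(k)` is the ordered sample:

$$
W_k = \frac{\left(\sum_{i=1}^{k} a_i\, x_{(i)}\right)^2}{\sum_{i=1}^{k} \left(x_i - \bar{x}\right)^2},
\qquad
a = \frac{V^{-1} m}{\sqrt{m^\top V^{-1} V^{-1} m}},
$$

where `m` and `V` are the mean vector and covariance matrix of the order statistics of `k` i.i.d. standard normal variables. Since `W` is invariant under shifting and rescaling the data, its distribution under normality depends only on `k`.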

### What I have tried

For `k >= 9` I found that there is a normalizing power transform:

This seems OK, but as `k` grows, the MSE of this approximation (computed by fitting the `0.005:0.005:0.995` quantiles) levels off at around `0.004`, with noticeably thicker tails and a deficit around the mean.
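The transform itself is not reproduced here, so the sketch below assumes a family of the form `(1 - W)**lam` and picks `lam` by grid search. It also spells out one plausible reading of "MSE from fitting the `0.005:0.005:0.995` quantiles": standardize the transformed draws, take their empirical quantiles at `p = 0.005, 0.010, ..., 0.995`, and average the squared differences from the standard normal quantiles. Coefficients are again approximated by Blom scores, and all names are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, quantiles, stdev


def blom_coefficients(k):
    # ASSUMPTION: normalized Blom scores stand in for the exact SW coefficients.
    nd = NormalDist()
    m = [nd.inv_cdf((i - 0.375) / (k + 0.25)) for i in range(1, k + 1)]
    norm = math.sqrt(sum(v * v for v in m))
    return [v / norm for v in m]


def sample_w(k, n_samples, seed=0):
    """Monte Carlo draws of W_k for standard normal samples of size k."""
    rng = random.Random(seed)
    a = blom_coefficients(k)
    out = []
    for _ in range(n_samples):
        xs = sorted(rng.gauss(0.0, 1.0) for _ in range(k))
        mean = fmean(xs)
        ss = sum((v - mean) ** 2 for v in xs)
        out.append(sum(ai * xi for ai, xi in zip(a, xs)) ** 2 / ss)
    return out


# Standard normal quantiles at p = 0.005, 0.010, ..., 0.995 (199 points).
NORMAL_Q = [NormalDist().inv_cdf(i / 200) for i in range(1, 200)]


def quantile_mse(values):
    """Mean squared gap between standardized empirical and normal quantiles."""
    mu, sd = fmean(values), stdev(values)
    emp = quantiles([(v - mu) / sd for v in values], n=200)  # 199 cut points
    return fmean((e - q) ** 2 for e, q in zip(emp, NORMAL_Q))


def best_power(ws, lams):
    """Grid search over lam for the ASSUMED transform (1 - W)**lam."""
    # max(..., 0.0) guards against 1 - W rounding to a tiny negative number.
    scores = {lam: quantile_mse([max(1.0 - w, 0.0) ** lam for w in ws]) for lam in lams}
    best = min(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    ws = sample_w(k=12, n_samples=20_000, seed=4)
    lam, scores = best_power(ws, [0.05, 0.1, 0.15, 0.2, 0.3, 0.5, 1.0])
    print(f"best lam = {lam}, MSE = {scores[lam]:.5f}")
```

Standardizing before comparing makes the score independent of the transform's location and scale, so different `lam` values are compared on equal footing; whether this matches the `0.004` figure above depends on how that number was computed.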

Starting from `k >= 28`, a log transform does a better job (it is visibly skewed for `k <= 20`).
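One way to see where a log transform starts to pay off is to compare sample skewness before and after transforming. This sketch assumes the transform is `log(1 - W)` (the exact form is not shown above), approximates the coefficients by Blom scores, and uses the plain moment estimator of skewness; the names are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean


def blom_coefficients(k):
    # ASSUMPTION: normalized Blom scores stand in for the exact SW coefficients.
    nd = NormalDist()
    m = [nd.inv_cdf((i - 0.375) / (k + 0.25)) for i in range(1, k + 1)]
    norm = math.sqrt(sum(v * v for v in m))
    return [v / norm for v in m]


def sample_w(k, n_samples, seed=0):
    """Monte Carlo draws of W_k for standard normal samples of size k."""
    rng = random.Random(seed)
    a = blom_coefficients(k)
    out = []
    for _ in range(n_samples):
        xs = sorted(rng.gauss(0.0, 1.0) for _ in range(k))
        mean = fmean(xs)
        ss = sum((v - mean) ** 2 for v in xs)
        out.append(sum(ai * xi for ai, xi in zip(a, xs)) ** 2 / ss)
    return out


def skewness(values):
    """Moment estimator m3 / m2**1.5 (no small-sample correction)."""
    mu = fmean(values)
    m2 = fmean((v - mu) ** 2 for v in values)
    m3 = fmean((v - mu) ** 3 for v in values)
    return m3 / m2 ** 1.5


def compare_log_transform(k, n_samples=10_000, seed=0):
    """Skewness of W_k versus the ASSUMED transform log(1 - W_k)."""
    ws = sample_w(k, n_samples, seed)
    logs = [math.log(1.0 - w) for w in ws]
    return skewness(ws), skewness(logs)


if __name__ == "__main__":
    for k in (10, 20, 30):
        raw, logged = compare_log_transform(k)
        print(f"k={k:2d}  skew(W)={raw:+.3f}  skew(log(1-W))={logged:+.3f}")
```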