Can anyone tell me what is this distribution?

abulak · January 21, 2021, 2:14pm

In my hobby project I arrived at the following problem: I’m given a family of random variables Wₖ whose cdf I need to estimate.
[figure: W_histograms]
(The figure above is a histogram approximation of the pdfs from 10^7 samples.)

From theoretical considerations I know that

cdf(W₃)(w) = \frac{6}{\pi} \left(\sin^{-1}{\sqrt{w}} - \sin^{-1}{\sqrt{\frac{3}{4}}} \right)
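A quick way to sanity-check the k=3 formula is to simulate it. The sketch below is illustrative only and assumes that, for samples of size 3, the Shapiro-Wilk coefficients are (-1/√2, 0, 1/√2), so that W₃ = (x₍₃₎ - x₍₁₎)² / (2·Σ(xᵢ - x̄)²). The function names are made up for this example.

```python
import math
import random

def w3_statistic(x):
    """Shapiro-Wilk W for a sample of size 3 (coefficients -1/sqrt(2), 0, 1/sqrt(2))."""
    lo, mid, hi = sorted(x)
    mean = (lo + mid + hi) / 3.0
    ss = (lo - mean) ** 2 + (mid - mean) ** 2 + (hi - mean) ** 2
    return (hi - lo) ** 2 / (2.0 * ss)

def w3_cdf(w):
    """Closed-form cdf quoted above, clamped to the support [3/4, 1]."""
    w = min(max(w, 0.75), 1.0)
    return 6.0 / math.pi * (math.asin(math.sqrt(w)) - math.asin(math.sqrt(0.75)))

def max_cdf_gap(n_samples=20_000, seed=1):
    """Largest gap between the empirical cdf of simulated W_3 and w3_cdf."""
    rng = random.Random(seed)
    ws = sorted(
        w3_statistic([rng.gauss(0.0, 1.0) for _ in range(3)])
        for _ in range(n_samples)
    )
    gap = 0.0
    for i, w in enumerate(ws):
        f = w3_cdf(w)
        # Compare against the empirical cdf just before and just after the jump at w.
        gap = max(gap, abs((i + 1) / n_samples - f), abs(i / n_samples - f))
    return gap

if __name__ == "__main__":
    print(max_cdf_gap())
```

If the formula is right, the gap should shrink roughly like 1/sqrt(n_samples) as more samples are drawn.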

It is clear, though, that for k = 4, 5, 6, ... the pdfs are piecewise functions. For k=4 there appear to be 4 pieces: linear on (0.63, 0.73), then exponential, another exponential(?), and linear close to w=1.
Similarly, for k=5 there are four pieces, but as k grows the density becomes smoother (the pieces are visible up to k=7 or 8, especially if you plot histograms of sqrt(W)).

Any hints from anybody who has a grasp of statistics would be much appreciated (I don’t do statistics at all).

Where does it come from?

W is the Shapiro-Wilk W statistic, the one from the SW normality test, which I’m trying to compute/estimate more… rigorously.
See https://github.com/JuliaStats/HypothesisTests.jl/pull/124

What I have tried

For k >= 9 I found that there is a normalizing power transform:

W_k \mapsto (1 - W_k)^t \sim \mathcal{N}(\mu_k, \sigma_k)

This seems OK, but as k grows the MSE of this approximation (computed by fitting the 0.005:0.005:0.995 quantiles) stabilizes at around 0.004, with noticeably thicker tails and a deficit around the mean.
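The post does not spell out how the quantile fit is done, so the following is only one possible reading: take the empirical quantiles of (1 - W)^t on the 0.005:0.005:0.995 grid, regress them on standard normal quantiles by least squares to get μ and σ, and report the mean squared residual. All function names are hypothetical, σ is taken to be a standard deviation, and the demo data are synthetic (constructed so that the transform is exactly normal), not real W_k samples.

```python
import random
from statistics import NormalDist

def quantile_grid(step=0.005):
    """Probabilities 0.005, 0.010, ..., 0.995 (the 0.005:0.005:0.995 grid)."""
    n = round(1.0 / step) - 1
    return [step * (i + 1) for i in range(n)]

def empirical_quantile(sorted_values, p):
    """Linear-interpolation quantile of pre-sorted data."""
    pos = p * (len(sorted_values) - 1)
    i = int(pos)
    frac = pos - i
    if i + 1 >= len(sorted_values):
        return sorted_values[-1]
    return sorted_values[i] * (1 - frac) + sorted_values[i + 1] * frac

def fit_normal_by_quantiles(values, step=0.005):
    """Least-squares fit of q_emp(p) ~ mu + sigma * z(p); returns (mu, sigma, mse)."""
    probs = quantile_grid(step)
    data = sorted(values)
    z = [NormalDist().inv_cdf(p) for p in probs]
    q = [empirical_quantile(data, p) for p in probs]
    z_mean = sum(z) / len(z)
    q_mean = sum(q) / len(q)
    sigma = (sum((zi - z_mean) * (qi - q_mean) for zi, qi in zip(z, q))
             / sum((zi - z_mean) ** 2 for zi in z))
    mu = q_mean - sigma * z_mean
    mse = sum((qi - (mu + sigma * zi)) ** 2 for zi, qi in zip(z, q)) / len(z)
    return mu, sigma, mse

def power_transform(w_samples, t):
    """Map each W to (1 - W)^t."""
    return [(1.0 - w) ** t for w in w_samples]

def synthetic_w(n, mu, sigma, t, seed=0):
    """Synthetic stand-in for W samples: chosen so that (1 - W)^t ~ Normal(mu, sigma)."""
    rng = random.Random(seed)
    return [1.0 - rng.gauss(mu, sigma) ** (1.0 / t) for _ in range(n)]

if __name__ == "__main__":
    ws = synthetic_w(20_000, mu=0.5, sigma=0.05, t=0.5, seed=7)
    print(fit_normal_by_quantiles(power_transform(ws, 0.5)))
```

With real W_k samples in place of `synthetic_w`, scanning over t and keeping the value with the smallest MSE would be one way to pick the exponent. The MSE scale depends on the fit definition, so it need not be comparable to the 0.004 quoted above.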

Starting from k >= 28, a log transform

W_k \mapsto \log(1 - W_k) \sim \mathcal{N}(\mu_k, \sigma_k)

does a better job (the transformed variable is visibly skewed for k <= 20).
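Once μ_k and σ_k are fitted, the log transform gives an approximate cdf for W_k directly: log(1 - W) is decreasing in W, so P(W ≤ w) = 1 - Φ((log(1 - w) - μ_k)/σ_k). A minimal sketch, assuming σ_k is a standard deviation; the function name and the parameter values in the demo are invented for illustration and are not fitted values.

```python
import math
from statistics import NormalDist

def w_cdf_from_log_fit(w, mu, sigma):
    """P(W <= w) assuming log(1 - W) ~ Normal(mu, sigma), sigma = standard deviation."""
    if w >= 1.0:
        return 1.0
    # log(1 - W) is decreasing in W, so W <= w  <=>  log(1 - W) >= log(1 - w).
    return 1.0 - NormalDist(mu, sigma).cdf(math.log(1.0 - w))

if __name__ == "__main__":
    mu, sigma = -3.0, 0.5  # hypothetical parameters, for illustration only
    for w in (0.90, 0.95, 0.98):
        print(w, w_cdf_from_log_fit(w, mu, sigma))
```

In the normality test small values of W count against normality, so this cdf value plays the role of the p-value for an observed w.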
