Skewness/kurtosis


#1

Hi there. What am I missing? Skewness and kurtosis are different…

julia> using OnlineStats, StatsBase

julia> w = randn(10000);

julia> ws = Series(Mean(), Variance(), Moments());

julia> mean(w), var(w), skewness(w), kurtosis(w)
(0.004919317337882392, 0.9956975641666076, 0.005001290236278385, 0.022235410469709116)

julia> fit!(ws, w)
▦ Series{0}
│ EqualWeight | nobs=10000
├── Mean(0.00491932)
├── Variance(0.995698)
└── Moments([0.00491932, 0.995622, 0.0196614, 2.99593])

#2

My guess is that kurtosis calculates the excess kurtosis (from 3).


#3

I compared the results with R:

> x <- fread("/tmp/rand10k.txt")$V1
> skewness(x, type=1)
[1] 0.00500129
> kurtosis(x, type=1)
[1] 0.02223541

Using Wikipedia’s definition of standardized moments, I obtained the same results:

> y <- x - mean(x)
> sum(y^3)/10000 / (sum(y^2)/10000)^1.5
[1] 0.00500129
> sum(y^4)/10000 / (sum(y^2)/10000)^2 - 3
[1] 0.02223541

So indeed 0.02223541 is the excess kurtosis. OnlineStat’s kurtosis is approx. 3 apart so your observation is likely correct.

I still don’t understand what algorithm OnlineStats uses, however. Same question for skewness as there’s no excess in skewness… I may just dive into the source code when I have time…


#4

StatsBase documentation states that skewness computes the skewness à la Joanes and Gill (1998) type 1. This is similar to R’s

e1071::skewness(x, na.rm = FALSE, type = 1)

(Meyer et al. 2017) which by default uses type = 3.
Similarly, for kurtosis the difference is that e1071 uses type = 3 by default and StatsBase computes type = 1.

I am unfamiliar with OnlineStats, but you can probably read the documentation to find out.


#5

The following comment in the code gives a hint that it does not represent the central moment. It seems to be keeping tracking of moment about the origin (as opposed to moment about the mean.)

I found that OnlineStats provides skewness and kurtosis functions that take Moments as an argument, hence:

julia> ws.stats
(Mean(0.00491932), Variance(0.995698), Moments([0.00491932, 0.995622, 0.0196614, 2.99593]))

julia> skewness(ws.stats[3])
0.004999061081258352

julia> kurtosis(ws.stats[3])
0.02163099361000098

and the kurtosis matches the type-3 value from R.

> skewness(x, type=3)
[1] 0.00500054
> kurtosis(x, type=3)
[1] 0.02163099

Skewness is still a little off. Following Wikipedia’s conversion formula:

\mu _{3}=\mu '_{3}-3\mu \mu '_{2}+2\mu ^{3}

I can now match skewness exactly:

julia> function myskewness(o::Moments)
           v = value(o)
           (v[3] - 3.0 * v[1] * v[2] + 2 * v[1] ^ 3) / var(o) ^ 1.5
       end
myskewness (generic function with 1 method)

julia> myskewness(ws.stats[3])
0.005000540061498055

So it looks like it’s just an issue with with OnlineStats…