LogNormal calculation of standard deviation gets two different answers

Soldalma · July 2, 2023, 1:59pm

I noticed that calculating the standard deviation of a LogNormal distribution directly or from a very large sample yields quite different results. This is not true for the mean.

mulog = -12.0
sdlog = 3.0

dist = Distributions.LogNormal(mulog, sdlog)
y = rand(dist, 10000000);

mean(dist) # 0.0005530843701478336
mean(y) # 0.0005622805244076315

std(dist) # 0.04978399616689943
std(y) # 0.028833362676711747

I do not think this is due to rounding errors, since the sample is huge. Am I wrong?

Freya_the_Goddess · July 2, 2023, 2:25pm

Which packages are used ? I try to replicate the code and get error.

Soldalma · July 2, 2023, 2:36pm

Sorry for the omission. The package Distributions is needed in the following line:

dist = Distributions.LogNormal(mulog, sdlog)

Dan · July 2, 2023, 3:27pm

Maybe it’s the revenge of the fat-tailed distribution:

julia> using UnicodePlots

julia> histogram(y)
                ┌                                        ┐ 
   [ 0.0,  2.0) ┤█████████████████████████████  9 999 878  
   [ 2.0,  4.0) ┤▏ 93                                      
   [ 4.0,  6.0) ┤▏ 11                                      
   [ 6.0,  8.0) ┤▏ 6                                       
   [ 8.0, 10.0) ┤▏ 4                                       
   [10.0, 12.0) ┤▏ 3                                       
   [12.0, 14.0) ┤▏ 2                                       
   [14.0, 16.0) ┤  0                                       
   [16.0, 18.0) ┤  0                                       
   [18.0, 20.0) ┤▏ 1                                       
   [20.0, 22.0) ┤  0                                       
   [22.0, 24.0) ┤  0                                       
   [24.0, 26.0) ┤  0                                       
   [26.0, 28.0) ┤  0                                       
   [28.0, 30.0) ┤▏ 1                                       
   [30.0, 32.0) ┤  0                                       
   [32.0, 34.0) ┤  0                                       
   [34.0, 36.0) ┤  0                                       
   [36.0, 38.0) ┤  0                                       
   [38.0, 40.0) ┤▏ 1                                       
                └                                        ┘ 
                                 Frequency

See the above very skewed histogram.
Compare with:

julia> mulog=1.0
1.0

julia> sdlog=1.0
1.0

julia> dist = Distributions.LogNormal(mulog, sdlog)
LogNormal{Float64}(μ=1.0, σ=1.0)

julia> y = rand(dist, 10000000);

julia> std(y)
5.872315430987349

julia> std(dist)
5.874743663340262

which has a less skewed histogram.

Christopher_Fisher · July 2, 2023, 4:59pm

Based on my inspection, the implementation the Log Normal distribution appears to be correct. I believe Dan is correct about the heavy tails introducing uncertainty into the estimates of the mean and standard deviation. One example of a heavy-tailed distribution without a defined variance is the Cauchy distribution.

Dan · July 2, 2023, 6:34pm

The sample variance (and standard deviation) is a statistic and as such it has a convergence rate. In tame distributions, this rate depends on the 4th moment.
Doing the relevant searches for the exact expressions, I found the following:

# variance of sample variance
# m4 = 4th moment
# s4 = standard deviation ^ 4
# n = number of samples
vars2(m4,s4,n) = m4/n - s4*(n-3)/n/(n-1)

# LogNormal 4th moment given mu and sd
lognormal_m4(mulog, sdlog) = exp(4*mulog + (1/2)*16*sdlog^2)
# LogNormal standard deviation ^ 4 given mu and sd
lognormal_s4(mulog, sdlog) = var(Distributions.LogNormal(mulog, sdlog))^2

# Calculation relevant to post
vars2(lognormal_m4(-12.0, 3.0), lognormal_s4(-12.0, 3.0),1000000) #  = 26489.122129843465

The result 26489.1(…) is clearly too large to get an accurate sample standard deviation.
For mu=1.0 sd=1.0, it gives 0.16 with same n, enough to get accuracy for the population sd of 5.87(…).

This was a bit of a hasty calculation/google, but the results make sense.

Topic		Replies	Views
LogNormal-Distribution - how to set mu and sigma Statistics question , statistics	17	13961	December 30, 2021
Package Distributions gives different resullt for standard deviation General Usage	3	659	June 7, 2017
Need advice when data almost (but not quite) fit log normal dist Statistics distributions	7	924	January 13, 2021
The reason why the parameters for Distributions.Normal is mean and standard deviation Internals & Design question , package	6	841	March 20, 2019
I want to implement a normal log in julia and i'm new in and don't know how to do it New to Julia question	12	2165	April 28, 2020

LogNormal calculation of standard deviation gets two different answers

Related topics