Premise:
I am running experiments that involve Monte Carlo sampling on clusters, and I am collecting the mean and variance using OnlineStats.
They take a long time and I get timeout errors on our clusters, so I want to save the data and restart in another job.
Since the number of samples is huge, I want to store only the mean, the variance, and the sample size (not the whole data) to restart.
I know that value(mystat) gives the mean and variance and nobs(mystat) gives the sample size, which I can save to a .txt file and read back in another run.
But given the mean, variance, and sample size, I don't know how to recreate a Series like mystat with the same information, so that I can merge! it with the new results in another run of my experiment.
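For reference, the arithmetic a merge of two such summaries has to perform can be sketched without OnlineStats, using only Base and the Statistics stdlib. This is not OnlineStats' own code or API: `Summary`, `summarize`, `combine`, `save_summary`, and `load_summary` are hypothetical names, the text-file format (one number per line) is an assumption, and `combine` is the standard pairwise (Chan et al.) update for count, mean, and biased variance.

```julia
using Statistics  # mean, var

# Hypothetical checkpoint: count, mean, and *biased* variance (divides by n).
struct Summary
    n::Int
    μ::Float64
    σ2::Float64
end

# Summarize one batch of samples.
summarize(x::AbstractVector{<:Real}) =
    Summary(length(x), mean(x), var(x; corrected = false))

# Pairwise merge of two summaries (Chan et al. update).
function combine(a::Summary, b::Summary)
    n = a.n + b.n
    n == 0 && return Summary(0, 0.0, 0.0)
    δ = b.μ - a.μ
    μ = a.μ + δ * b.n / n
    # Weighted within-batch variance plus a between-batch correction term.
    σ2 = (a.n * a.σ2 + b.n * b.σ2) / n + δ^2 * a.n * b.n / n^2
    return Summary(n, μ, σ2)
end

# Save the three numbers, one per line (Float64 printing round-trips exactly).
function save_summary(path::AbstractString, s::Summary)
    open(path, "w") do io
        println(io, s.n)
        println(io, s.μ)
        println(io, s.σ2)
    end
end

# Read them back in a later job.
function load_summary(path::AbstractString)
    l = readlines(path)
    return Summary(parse(Int, l[1]), parse(Float64, l[2]), parse(Float64, l[3]))
end

# Usage: job 1 checkpoints, job 2 restores and continues.
x1, x2 = rand(1_000), rand(2_000)
path = tempname()
save_summary(path, summarize(x1))
total = combine(load_summary(path), summarize(x2))
# total.μ ≈ mean(vcat(x1, x2)); total.σ2 ≈ var(vcat(x1, x2); corrected = false)
# Unbiased sample variance, if needed: total.σ2 * total.n / (total.n - 1)
```

Storing the biased variance keeps the merge formula simple; the unbiased value is one rescaling away.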
For example, with mystat = Series(Mean(), Variance()); fit!(mystat, rand(10))
I get
Series
├─ Mean: n=10 | value=0.446827
└─ Variance: n=10 | value=0.0938919
I then rebuilt it with mypreviousstat = Series(mymean, myvar), where mymean and myvar were built from the saved mean, variance, and sample size.
The mean is correct, but the variance is not:
Series
├─ Mean: n=10 | value=0.446827
└─ Variance: n=10 | value=0.104324
I may be misunderstanding some definitions of sample variance etc., but the field σ2 looks like the variance, and I assumed the function uses the same definition of sample variance. What did I miss?
Update: that worked. So it seems the field σ2 in the struct holds the biased estimator of the variance (dividing by the sample size n), while value(mystat)[2] gives the unbiased sample variance (dividing by n - 1)?
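That reading is consistent with the printed numbers: with n = 10, 0.0938919 × 10/9 ≈ 0.104324, which is what you would see if the unbiased value were stored where the biased one is expected and then rescaled by n/(n - 1) on output. A minimal sketch of the conversion in both directions, using only the Statistics stdlib; `to_biased` and `to_unbiased` are hypothetical helper names, not OnlineStats functions.

```julia
using Statistics  # var

# Unbiased sample variance (divide by n - 1) -> biased (divide by n).
to_biased(s2::Real, n::Integer) = s2 * (n - 1) / n

# Biased (divide by n) -> unbiased sample variance (divide by n - 1).
to_unbiased(σ2::Real, n::Integer) = σ2 * n / (n - 1)

x = rand(10)
n = length(x)
s2 = var(x)                      # corrected = true (default): divides by n - 1
σ2 = var(x; corrected = false)   # divides by n
# to_biased(s2, n) ≈ σ2 and to_unbiased(σ2, n) ≈ s2

# The values from the post: 0.0938919 interpreted as a biased σ2 with n = 10.
to_unbiased(0.0938919, 10)       # ≈ 0.1043243, displayed as 0.104324
```

So, assuming σ2 is indeed the biased estimator, a saved value(mystat)[2] would need to go through `to_biased` before being used to rebuild the Variance.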