Is there a package dealing with these kinds of computations, or should I make my own functions for my purposes?
I use Measurements.jl.
Thanks for the reply.
I was looking at the package and wondering if it would fit what I want. At the moment it doesn't look like it, but I could be wrong. Can you elaborate, and possibly give examples?
At the moment I'm looking, for example, for something that would simply let me plug values into functions to get confidence intervals for (arithmetic) means, proportions, standard deviations, differences, sums, etc., in the context of statistical estimation theory.
It would be nice if it'd let me plug in the values you would normally encounter when calculating by hand, such as the arithmetic mean, n, N, p, etc.
Can you expand a bit on the application you have in mind? That way it'll be easier to help you.
For example, if you have quantities with uncertainties and want to perform calculations with them, you're looking for error propagation, in which case the mentioned Measurements.jl or perhaps MonteCarloMeasurements.jl would be helpful. On the other hand, if you have a time series of data, perhaps even correlated, and want to estimate the standard error, you might want to take a look at BinningAnalysis.jl or similar.
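To illustrate the first case, here is a minimal error-propagation sketch with Measurements.jl; the numbers are made-up illustrative values:

```julia
using Measurements

# Quantities with Gaussian uncertainties (illustrative values)
a = 1.0 ± 0.1
b = 2.0 ± 0.2

# Ordinary arithmetic propagates the uncertainties automatically
c = a + b
d = a * b
```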
We posted simultaneously. Please see my previous post.
Are your values statistically independent or correlated?
Assuming that I understand your use case correctly, you could simply use the functions provided by the Statistics standard library and StatsBase.jl. In the latter case, you could look at BinningAnalysis.jl, linked above, which uses logarithmic binning to estimate the standard error of the mean of your values.
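For the independent-samples case, a quick sketch of what those two packages provide (`sem` comes from StatsBase):

```julia
using Statistics, StatsBase

x = randn(1_000)   # some sample

m  = mean(x)       # sample mean (Statistics)
s  = std(x)        # sample standard deviation (Statistics)
se = sem(x)        # standard error of the mean (StatsBase)
```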
Thanks for the input.
My use case is simple; it is as I explained earlier. Imagine you have formulas for estimating population parameters with confidence intervals, based on sample statistics (such as sample means or proportions), and you are calculating exactly according to these formulas. Is there a package that already has these formulas, or should I rather write them myself?
For example, when you want to estimate a rate from a binomial sample, the posterior on the rate is a Beta distribution, and you can compute the CI on that, but you need to know that it's a Beta and how the data goes into it with the prior.
I don't know of a package that does that easily; usually I do it myself using Distributions.jl and Wikipedia, which is indeed not ideal.
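For the binomial-rate example above, a sketch of that do-it-yourself route with Distributions.jl, assuming a uniform Beta(1, 1) prior:

```julia
using Distributions

successes, trials = 10, 100

# Beta(1, 1) prior + binomial likelihood -> Beta posterior (conjugacy)
post = Beta(1 + successes, 1 + trials - successes)

# Equal-tailed 90% credible interval from the posterior quantiles
lo, hi = quantile(post, 0.05), quantile(post, 0.95)
```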
In case you consider population parameters like the mean or variance, there is a number of confint methods distributed over Statistics, StatsBase, and in particular HypothesisTests:
# 12 methods for generic function "confint":
[1] confint(x::BinomialTest; level, tail, method) in HypothesisTests at /Users/smoritz/.julia/packages/HypothesisTests/wSEbN/src/binomial.jl:104
[2] confint(x::SignTest; level, tail) in HypothesisTests at /Users/smoritz/.julia/packages/HypothesisTests/wSEbN/src/binomial.jl:218
[3] confint(x::FisherExactTest; level, tail, method) in HypothesisTests at /Users/smoritz/.julia/packages/HypothesisTests/wSEbN/src/fisher.jl:181
[4] confint(x::PowerDivergenceTest; level, tail, method, correct, bootstrap_iters, GC) in HypothesisTests at /Users/smoritz/.julia/packages/HypothesisTests/wSEbN/src/power_divergence.jl:73
[5] confint(obj::StatisticalModel) in StatsBase at /Users/smoritz/.julia/packages/StatsBase/DyWPR/src/statmodels.jl:32
[6] confint(x::HypothesisTests.TTest; level, tail) in HypothesisTests at /Users/smoritz/.julia/packages/HypothesisTests/wSEbN/src/t.jl:37
[7] confint(x::HypothesisTests.ZTest; level, tail) in HypothesisTests at /Users/smoritz/.julia/packages/HypothesisTests/wSEbN/src/z.jl:37
[8] confint(x::ExactSignedRankTest; level, tail) in HypothesisTests at /Users/smoritz/.julia/packages/HypothesisTests/wSEbN/src/wilcoxon.jl:164
[9] confint(x::ApproximateSignedRankTest; level, tail) in HypothesisTests at /Users/smoritz/.julia/packages/HypothesisTests/wSEbN/src/wilcoxon.jl:239
[10] confint(test::CorrelationTest{T}) where T in HypothesisTests at /Users/smoritz/.julia/packages/HypothesisTests/wSEbN/src/correlation.jl:61
[11] confint(test::CorrelationTest{T}, level::Float64) where T in HypothesisTests at /Users/smoritz/.julia/packages/HypothesisTests/wSEbN/src/correlation.jl:61
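For instance, two of the methods above in use (the exact numbers will depend on your data and package version, so no output is shown):

```julia
using HypothesisTests

# CI for a proportion: 10 successes out of 100 trials
confint(BinomialTest(10, 100))

# CI for a mean, from raw data, via a one-sample t-test
confint(OneSampleTTest([1.2, 0.9, 1.1, 1.0, 1.3]))
```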
You can probably make https://github.com/MikeInnes/Poirot.jl do what you want?
I may be misunderstanding, but it sounds like you are under the impression that there is a single answer to the question of building confidence intervals. Unfortunately, in statistics this is not the case. The appropriate method in any application hinges critically on the true data-generating process (e.g. is your data IID, weakly dependent, non-stationary, etc.) in combination with the statistic of interest. We need to know these things to direct you to an appropriate package. If you are asking whether there is some package that universally builds appropriate confidence intervals for all data types, well, I'm not aware of any such package in any programming language.
You may not be aware of this, but there is no single set of formulas that fits each application, except for some special cases under very, very specific assumptions.
There exist closed-form formulas for some models and methodologies; for some others you can obtain them numerically rather easily; and of course for other models you have to use Monte Carlo methods.
It would be easier to help if you specified
- the statistical model you are interested in,
- the methodology you are using (Bayesian? frequentist?),
- the kind of confidence interval you want (frequentist CI, Bayesian HPD, etc.).
Thinking about it, what's missing is a way to get a posterior on parameters when fitting a distribution; currently Distributions.jl's fit only returns the MLE. That way, if you want to estimate a frequency, you could do something like this:
julia> dfit = my_fit(Binomial, 100, [10])
Fitted{Binomial}(...)
julia> mle(dfit)
Binomial{Float64}(n=100, p=0.1)
julia> p = posterior(dfit, :p)
Beta{Float64}(α=10.0, β=90.0)
julia> confidence_interval(p, 0.9)
(0.05583217884206651, 0.15327514365732653)
In some relevant cases there are closed-form formulas for the posteriors, but posterior could also return a sampled or approximated distribution when that's not the case.
There is also ConjugatePriors.jl (https://github.com/JuliaStats/ConjugatePriors.jl), a Julia package to support conjugate prior distributions.
For proportions you can see https://github.com/PharmCat/ClinicalTrialUtilities.jl, or take code from there (ci.jl).
I wonder if you can suggest a book on confidence intervals of different models and methodologies…
I am not aware of an introductory textbook that compares various approaches; each one usually deals with its own. But if you are really interested, I would recommend
@article{berger1988likelihood,
  title={The likelihood principle},
  author={Berger, James O and Wolpert, Robert L and Bayarri, MJ and DeGroot, MH and Hill, Bruce M and Lane, David A and LeCam, Lucien},
  journal={Lecture Notes--Monograph Series},
  volume={6},
  year={1988},
  publisher={JSTOR}
}
which is great fun. Working through the book, you will learn a lot of useful facts about the principles of statistics, which you can weave into lunchtime conversations with colleagues up to the point that they will be inclined to dump a plate of lasagna on your head.
But the gist is really simple: frequentist (Neyman) CIs usually don't mean what people assume they mean, and a Bayesian HPD is a nice posterior visualization tool. I would go for posterior predictive checks instead for serious modeling, e.g.
@article{gelman1996posterior,
  title={Posterior predictive assessment of model fitness via realized discrepancies},
  author={Gelman, Andrew and Meng, Xiao-Li and Stern, Hal},
  journal={Statistica Sinica},
  pages={733--760},
  year={1996},
  publisher={JSTOR}
}
Incidentally, Andrew Gelman has a lot of neat articles on p-values.
Thanks for your replies.
My use case is based on a table of z critical values and their respective probabilities (between 0 and 1), where I use common values.
I want to make my own functions (or use existing ones) to calculate confidence intervals for proportions, means, and the like, going from sample to population. Part of this is plugging a z critical value into the function. Calculations are made (I have no time to write out these formulas here now; maybe later). I then output the calculated information nicely, including stating what the confidence percentage is, and so on.
And I want to also simply plug in the desired confidence level. So this value must be converted to the z critical value, which may then be used in computation.
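If you end up writing these yourself, here is a minimal sketch of such a function for the mean (mean_ci is a hypothetical name; it converts the desired confidence level to the z critical value via the normal quantile):

```julia
using Distributions

# Hypothetical helper: large-sample z-based CI for a population mean,
# from the sample mean x̄, sample standard deviation s, and sample size n
function mean_ci(x̄, s, n; level = 0.95)
    z_c = quantile(Normal(), 1 - (1 - level) / 2)   # z critical value
    margin = z_c * s / sqrt(n)
    (x̄ - margin, x̄ + margin)
end
```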
Suppose you have a table such as this (I hope it renders well; I don't use this so often):
% Conf. Lev.   .9973   .99    .98    .6827
z_c             3.00   2.58   2.33   1.00
etc.
If you'd input 3.00 for z_c, what is the formula that yields .9973? If you input .99, what is the formula that yields 2.58?
In this case, if I make those functions myself (and perhaps share them with others if there's interest), I am particularly interested in these two formulas.
I think you want the quantile function from Distributions.jl:
julia> z = Normal()
Normal{Float64}(μ=0.0, σ=1.0)
julia> -quantile.(z, 0.5 .* (1 .- [.9973, .99, .98, .6827]))
4-element Array{Float64,1}:
2.999976992703395
2.5758293035489053
2.326347874040846
1.000021713322999
(Note that to get the quantile from the two-sided p value, which you seem to be doing here, you have to adjust the p value to be half the distance from 1, hence the 0.5 .* (1 .- p) bit.)
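Going the other way, from a z critical value back to the confidence level, is just the normal CDF: P(-z_c < Z < z_c) = 2Φ(z_c) - 1. A sketch:

```julia
using Distributions

# Confidence level from a z critical value: P(-z_c < Z < z_c) = 2Φ(z_c) - 1
conf_level(z_c) = 2 * cdf(Normal(), z_c) - 1

conf_level(3.00)   # ≈ 0.9973
conf_level(2.58)   # ≈ 0.99
```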
Thanks for the book recommendation, I just started reading it ("The likelihood principle"). I wish I had read it a few years earlier.