Turing: Non-informative improper prior

Hi

I am looking for an example of how to assign a non-informative improper prior using Turing.jl.

So far, all the examples that I have seen use proper priors.

Thanks and regards,
Sourish

I don’t have much experience with improper priors, but I would assume you can define a new distribution type for your prior. Here is an example from the documentation.
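
Roughly, the idea would be something like the sketch below: a custom distribution type with a constant log density over the whole real line. This is just my assumption of how it could look (the name MyFlat is made up), not the exact code from the docs.

using Distributions, Random

# Hypothetical flat "prior": constant log density over the entire real line.
struct MyFlat <: ContinuousUnivariateDistribution end

Distributions.logpdf(::MyFlat, x::Real) = 0.0   # constant everywhere, i.e. improper
Distributions.minimum(::MyFlat) = -Inf
Distributions.maximum(::MyFlat) = Inf

# rand is only needed so a sampler can draw an initial value; a standard
# normal draw is one arbitrary choice.
Distributions.rand(rng::Random.AbstractRNG, ::MyFlat) = randn(rng)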

There is no such thing as a non-informative improper prior. You can talk about using an improper prior, but it is equivalent to saying that the parameter is either positive or negative infinity… Think of it as the limit of a Uniform(-x, x): what is the probability that a randomly chosen value will fall within the range of the 64-bit floats as x goes to infinity? The limit is zero; all the probability mass will be outside that range. This is far from uninformative.

My advice is to always use proper priors. They have a clear, interpretable meaning and can be made as wide as you like.
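
To put a rough number on that limit argument (BigFloat is used only because x has to exceed what a Float64 can represent; the specific width is arbitrary):

M = big(floatmax(Float64))   # ≈ 1.8e308, the edge of the 64-bit float range
x = M * big(10)^100          # a Uniform(-x, x) far wider than that range
M / x                        # P(|value| ≤ M) ≈ 1e-100, and it keeps shrinking as x grows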

“There is no such thing as a non-informative improper prior. You can talk about using an improper prior, but it is equivalent to saying that the parameter is either positive or negative infinity… My advice is to always use proper priors. They have a clear, interpretable meaning and can be made as wide as you like.”

I found your advice very amusing. There is a great deal of theory available on “non-informative and improper priors” in Bayesian statistics. You may consider looking at page 62 of the book by Gelman, Carlin, Stern, and Rubin, Bayesian Data Analysis (2nd edition, Chapman & Hall, 2003). It discusses how an improper prior can lead to a proper posterior, with which one can do valid Bayesian statistical inference.

For the model that I am trying to fit, I already have my own R code (about 18 years old) implementing a Gibbs sampler, which I wrote as a graduate student. I want to translate it from R to Julia, and I want to try it with Turing.jl; hence this question.

If Turing.jl can’t handle improper priors, fine - I will write my own Gibbs sampler. But my question is: can it handle them or not?

Of course there’s such a thing as a model with an “improper prior”; it’s just that it is NOT noninformative. It’s a misnomer to call it noninformative, and Gelman regrets including so many of those examples in his book.

Andrew Gelman on improper priors: Hidden dangers of noninformative priors | Statistical Modeling, Causal Inference, and Social Science

and on the alternative term “non-generative models”: Don’t say “improper prior.” Say “non-generative model.” | Statistical Modeling, Causal Inference, and Social Science

And on further reasons not to use flat priors:

https://statmodeling.stat.columbia.edu/2021/01/31/bayesian-inference-completely-solves-the-multiple-comparisons-problem-2/

But more to the point: as a practical matter, if you want to get moving forward and express the idea that your parameter could lie over a super wide range, just do

a ~ Normal(0.0, floatmax(Float64)/10)

This will make a negligible difference in outcome compared to the alternative:

a ~ Uniform(-floatmax(Float64), floatmax(Float64))

which is what people are actually computing with when they say they are using an “improper prior” (unless you’re computing with BigFloats). Also note that with this prior you are saying there is a 90% chance that the magnitude of the value is at least something like 10^307.
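
For the record, the arithmetic behind that 90% figure is just the ratio under the Uniform above:

M = floatmax(Float64)   # ≈ 1.8e308
M / 10                  # ≈ 1.8e307
1 - (M / 10) / M        # P(|a| > M/10) under Uniform(-M, M), i.e. 0.9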

Thanks … I will try it … but since I already have an algorithm that works well, perhaps I will just write my own Gibbs sampler in Julia and check how much of a performance gain I can get.

Perhaps, but Turing is generally quite good, and it can use samplers like NUTS, which will usually dramatically outperform Gibbs. So I would try one of those examples I suggested and see how you do.

That was precisely the reason I was hoping to use Turing - because we have NUTS. But it looks like Turing always requires proper priors.

I will try your suggestions and compare.

The Uniform example is really the closest thing any floating-point computer can get to an “improper uniform prior”. I mean, there is no way to represent numbers outside the range of the floats you are using, so whatever you imagine you are doing theoretically, in practice you are truncated to the range of the floats.

Just because I have an “improper prior” does not mean I am simulating from an “improper uniform prior” - representing or considering numbers outside the range of floats is not even my concern. My posterior is still proper - all I have to figure out is how to simulate samples from the proper posterior distributions. I am trying to work out whether I can implement NUTS in my case - hence I am trying to figure out how to do it in Turing. If Turing can’t handle an “improper prior”, I will figure out myself whether a specific NUTS can be developed. Thanks for your comment.

Then a ~ Uniform(-floatmax(Float64), floatmax(Float64)) is exactly what you want. In practice it adds a constant to the log probability, which, as you probably know, is irrelevant to the simulation.

Hi

Thanks. I found it useful. I will try.

Thanks
Sourish

Thanks

FWIW, Turing provides Flat and FlatPos improper uniform distributions. Using ~ Uniform(-floatmax(Float64), floatmax(Float64)) seems to cause initialization errors for me, while ~ Turing.Flat() does not.
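
In case it helps anyone later, a minimal sketch of what that looks like inside a model (the normal likelihood and the fake data are mine, purely for illustration):

using Turing

@model function flat_prior_demo(y)
    a ~ Turing.Flat()          # improper uniform prior on the whole real line
    σ ~ Turing.FlatPos(0.0)    # improper uniform prior on (0, ∞)
    for i in eachindex(y)
        y[i] ~ Normal(a, σ)
    end
end

chain = sample(flat_prior_demo(randn(100) .+ 1.5), NUTS(), 1_000)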
