Bayesian Model Selection

ANasc · February 21, 2021, 4:34pm

Hi All,

This question might reveal my limited knowledge of Bayesian estimation, but I am having difficulty figuring out how to perform basic Bayesian model selection in Turing. The documentation for MCMCChains.jl has a short section on DIC, which I’m not very familiar with. How does one calculate Bayes factors, for example? I can’t even figure out how to get model Evidence from the output of sample(). (I’m using Metropolis-Hastings algorithm). Any tips or directions of where to look would be greatly appreciated.

Thanks

rikh · February 22, 2021, 9:15am

I don’t know much about the metrics for model selection, but do know that various statisticians argue against Bayes factors (Tendeiro & Kiers, 2019; Gelman et al., 2020).

As far as I understand, McElreath (2020) and Gelman et al. (2020) advise to do model comparison based on predictive performance measures such as cross-validation and AIC. McElreath (2020) argues that these metrics are quite similar.

But, I think that it also depends on where you want to publish. My co-authors aren’t comfortable with cross-validation, so I will stick to a multiverse analysis (Steegen et al., 2016) with a multilevel model to reduce overfitting and clear graphs to show what the model does.

Christopher_Fisher · February 22, 2021, 9:51am

As far as I can tell, there are two options currently, First, there is a Julia interface to ArviZ, which is written in Python. Second, there is a package associated with the book Statistical Rethinking that implements WAIC and LOO. The downside to the second option is that it does not currently have diagnostics for the model selection criteria.

ANasc · February 23, 2021, 5:48am

Thank you both for the links, these are very useful. I will brush up on which approaches are most appropriate for my case and see if I can implement something.

I’ll also look more closely at DIC in MCMCChains. I can’t seem to get it to work. It gives me an error that I haven’t used the correct inputs for logpdf(), and I’m not sure exactly what needs to be in there (for both the distribution and x). I’m assuming it’s asking for the likelihood function (mine is BinomialLogit). Here is the function:

lpfun = function f(chain::Chains) # function to compute the logpdf values
    niter, nparams, nchains = size(chain)
    lp = zeros(niter + nchains) # resulting logpdf values
    for i = 1:nparams
        lp += map(p -> logpdf( ... , x), Array(chain[:,i,:]))
    end
    return lp
end
DIC, pD = dic(chn, lpfun)```

Christopher_Fisher · February 24, 2021, 6:42pm

I think the problem is that you are supposed to add your distribution and data to the code. For example,

lpfun = function f(chain::Chains) # function to compute the logpdf values
    niter, nparams, nchains = size(chain)
    lp = zeros(niter + nchains) # resulting logpdf values
    for i = 1:nparams
        lp += map(p -> logpdf(BinomialLogit(p...), your_data), Array(chain[:,i,:]))
    end
    return lp
end
DIC, pD = dic(chn, lpfun)

However, I think this API might be difficult for users. It might be better to do something like this:

DIC, pD = dic(chn, distribution_object, the_data)

Under my proposal, the function dic would create the lpfun function and perform the calculations as normal. This way the user does not need to modify functions.

@cpfiffer, would this be a better way to compute DIC or did I misunderstand the documentation?

cpfiffer · February 24, 2021, 6:53pm

Yeah, I think that’s probably a better way to do it. It seems like a pretty trivial thing to add that extension into MCMCChains – maybe a good first issue for somebody.

Christopher_Fisher · February 24, 2021, 7:54pm

I tried to change the API, but I encountered several issues with the lpfun function. It almost seems like it was never fully tested and implemented. I opened an issue.

ANasc · February 25, 2021, 3:37pm

@Christopher_Fisher Thanks for working on this, I really appreciate it.

Christopher_Fisher · February 25, 2021, 7:03pm

`No problem. The new plan is to add DIC to PSS tomorrow or Saturday.

Christopher_Fisher · February 27, 2021, 6:55pm

@ANasc, just a quick status update: DIC was added to a new repo called ModelComparisons, along with BIC and AIC. This repo also has WAIC and PSILOO. It will take a few days for registration to be approved. I will also submit a change to MCMCChains so that you can calculate DIC directly from a Chain object. If you need to use DIC immediately, you can switch to my forks for MCMCChains and ModelComparisons until new releases are official. I plan on adding functionality to hook MCMCChains into WAIC and PSILOO in the next week or so.

ANasc · February 28, 2021, 5:16am

Amazing! Thanks so much. I look forward to trying it out. I think this will be a very useful addition for a lot of people.

Storopoli · March 3, 2021, 7:43am

Great Job @Christopher_Fisher! Excited to see PSIS-LOO coming to life in MCMCChains.jl!

EvoArt · April 16, 2021, 9:33pm

Wow. Very excited about this! Thanks slot.

Christopher_Fisher · April 22, 2021, 1:26pm

We encountered a few issues during the process. One is a licensing issue for PSILOO. My plan is to try to implement a new version as a work around. I am also trying to figure out the best place to have this functionality. I will follow up with an update once I know more.

Storopoli · April 22, 2021, 2:03pm

Isn’t it under Creative Commons or something? Can’t we give deserved and mandatory attribution (citation) and use it?

Christopher_Fisher · April 22, 2021, 2:47pm

The PSILOO code is under a GNU license. Other developers are reluctant to use it because under some interpretations, the GNU licence would apply to dependencies and supercede the MIT license.

brett_knoss · May 1, 2021, 11:01am

I think Turin.jl can be used gor hypothesis testing, but I’m completely lost by the documentation.

Storopoli · May 2, 2021, 8:12am

You can try with the Savage-Dickey method: Bayesian hypothesis testing for psychologists: A tutorial on the Savage–Dickey method - ScienceDirect

brett_knoss · May 2, 2021, 8:58pm

Looks, good, is there a package version, or do I need to put it together. I take it I can use MCMC, and it looks like this uses Mann Whitney U Test?

Topic		Replies	Views
WAIC, or LOO etc in Turing? Probabilistic Programming	11	4499	October 1, 2020
Bayesian model comparison / selection Statistics question , turing , bayesian-inference	3	1240	May 3, 2022
Bayes Testing Statistics question , package	27	1867	May 12, 2021
New Bayesian Statistics tutorials with Turing Probabilistic Programming turing	25	4592	July 15, 2022
Parameter Fitting in Turing Probabilistic Programming turing	15	1434	October 12, 2022

Bayesian Model Selection

Related topics