Bayesian Model Selection

Hi All,

This question might reveal my limited knowledge of Bayesian estimation, but I am having difficulty figuring out how to perform basic Bayesian model selection in Turing. The documentation for MCMCChains.jl has a short section on DIC, which I’m not very familiar with. How does one calculate Bayes factors, for example? I can’t even figure out how to get model Evidence from the output of sample(). (I’m using Metropolis-Hastings algorithm). Any tips or directions of where to look would be greatly appreciated.

Thanks

1 Like

I don’t know much about the metrics for model selection, but do know that various statisticians argue against Bayes factors (Tendeiro & Kiers, 2019; Gelman et al., 2020).

As far as I understand, McElreath (2020) and Gelman et al. (2020) advise to do model comparison based on predictive performance measures such as cross-validation and AIC. McElreath (2020) argues that these metrics are quite similar.

But, I think that it also depends on where you want to publish. My co-authors aren’t comfortable with cross-validation, so I will stick to a multiverse analysis (Steegen et al., 2016) with a multilevel model to reduce overfitting and clear graphs to show what the model does.

3 Likes

As far as I can tell, there are two options currently, First, there is a Julia interface to ArviZ, which is written in Python. Second, there is a package associated with the book Statistical Rethinking that implements WAIC and LOO. The downside to the second option is that it does not currently have diagnostics for the model selection criteria.

1 Like

Thank you both for the links, these are very useful. I will brush up on which approaches are most appropriate for my case and see if I can implement something.

I’ll also look more closely at DIC in MCMCChains. I can’t seem to get it to work. It gives me an error that I haven’t used the correct inputs for logpdf(), and I’m not sure exactly what needs to be in there (for both the distribution and x). I’m assuming it’s asking for the likelihood function (mine is BinomialLogit). Here is the function:

lpfun ​=​ ​function​ ​f​(chain​::​Chains​) ​#​ function to compute the logpdf values​
    niter, nparams, nchains ​=​ ​size​(chain)
    lp ​=​ ​zeros​(niter ​+​ nchains) ​#​ resulting logpdf values​
    ​for​ i ​=​ ​1​:​nparams
        lp ​+=​ ​map​(p ​->​ ​logpdf​( ​...​ , x), ​Array​(chain[:,i,:]))
    ​end​
    ​return​ lp
​end​
DIC, pD ​=​ ​dic​(chn, lpfun)```
1 Like

I think the problem is that you are supposed to add your distribution and data to the code. For example,

lpfun ​=​ ​function​ ​f​(chain​::​Chains​) ​#​ function to compute the logpdf values​
    niter, nparams, nchains ​=​ ​size​(chain)
    lp ​=​ ​zeros​(niter ​+​ nchains) ​#​ resulting logpdf values​
    ​for​ i ​=​ ​1​:​nparams
        lp ​+=​ ​map​(p ​->​ ​logpdf​(BinomialLogit(p...), your_data), ​Array​(chain[:,i,:]))
    ​end​
    ​return​ lp
​end​
DIC, pD ​=​ ​dic​(chn, lpfun)

However, I think this API might be difficult for users. It might be better to do something like this:

DIC, pD ​=​ ​dic​(chn, distribution_object, the_data)

Under my proposal, the function dic would create the lpfun function and perform the calculations as normal. This way the user does not need to modify functions.

@cpfiffer, would this be a better way to compute DIC or did I misunderstand the documentation?

2 Likes

Yeah, I think that’s probably a better way to do it. It seems like a pretty trivial thing to add that extension into MCMCChains – maybe a good first issue for somebody.

3 Likes

I tried to change the API, but I encountered several issues with the lpfun function. It almost seems like it was never fully tested and implemented. I opened an issue.

1 Like

@Christopher_Fisher Thanks for working on this, I really appreciate it.

`No problem. The new plan is to add DIC to PSS tomorrow or Saturday.

1 Like

@ANasc, just a quick status update: DIC was added to a new repo called ModelComparisons, along with BIC and AIC. This repo also has WAIC and PSILOO. It will take a few days for registration to be approved. I will also submit a change to MCMCChains so that you can calculate DIC directly from a Chain object. If you need to use DIC immediately, you can switch to my forks for MCMCChains and ModelComparisons until new releases are official. I plan on adding functionality to hook MCMCChains into WAIC and PSILOO in the next week or so.

7 Likes

Amazing! Thanks so much. I look forward to trying it out. I think this will be a very useful addition for a lot of people.

1 Like

Great Job @Christopher_Fisher! Excited to see PSIS-LOO coming to life in MCMCChains.jl!

1 Like

Wow. Very excited about this! Thanks slot.

1 Like

We encountered a few issues during the process. One is a licensing issue for PSILOO. My plan is to try to implement a new version as a work around. I am also trying to figure out the best place to have this functionality. I will follow up with an update once I know more.

Isn’t it under Creative Commons or something? Can’t we give deserved and mandatory attribution (citation) and use it?

The PSILOO code is under a GNU license. Other developers are reluctant to use it because under some interpretations, the GNU licence would apply to dependencies and supercede the MIT license.

I think Turin.jl can be used gor hypothesis testing, but I’m completely lost by the documentation.

You can try with the Savage-Dickey method: Bayesian hypothesis testing for psychologists: A tutorial on the Savage–Dickey method - ScienceDirect

Looks, good, is there a package version, or do I need to put it together. I take it I can use MCMC, and it looks like this uses Mann Whitney U Test?