Can anyone suggest a good forum for asking questions/discussing Bayesian inference? I’ve tried a couple of different stats/ML forums in the past, and I’ve never gotten a single response. What I’m looking for is a place to discuss model structure, prior specification, etc. Then, if I have questions about Julia implementation, I would obviously come here.
Did you try UQworld? It is mainly about uncertainty quantification, but it has some sections dedicated to Bayesian inference.
I had never heard of this - thanks for the info!
The Stan discourse forum is really good.
I don’t see questions about basic maths or methods as off-topic here (or even as Offtopic material), even if the question doesn’t have a lick of code. You could also just ask in Statistics or Machine Learning, but the audience there is of course smaller, and you may not get as much traction as in more dedicated spaces.
So, what prompted me to ask this here is that this morning, I posted the question below to the Cross Validated Stack Exchange forum, which is described as, “a question and answer site for people interested in statistics, machine learning, data analysis, data mining, and data visualization.”
Just as I expected, it’s been up for 9 hours now and has only had 11 views (several of which I’m assuming were actually just me looking at it) and 0 responses.
I’m working on an analysis of some HR data that is attempting to answer the following question:
Do applicants of different races have substantially different probabilities of being selected?
For now, I’m focusing on a single round of hiring that resulted in the selection of 53 candidates from an applicant pool of 211. For each applicant, I have their race, current performance rating, and years of experience. All applicants were internal applicants to a large organization; they all held the same job; and, they all were applying for the same job (for which there were 53 openings).
I’ve formulated the following Bayesian model, and I’m seeking feedback as I’m very inexperienced with this kind of modeling:
\begin{align}
S_{i} &\sim \text{Bernoulli}(p_{i}) \\
\text{logit}(p_{i}) &= \alpha_{\text{RID}[i]} + \beta_{\text{PID}[i]} + \gamma y_{i} \\
\alpha_{j} &\sim \text{Normal}(\text{logit}(0.25),\ 1.5) \\
\beta_{\text{outstanding}} &\sim \text{Normal}(\text{logit}(0.9),\ 30) \\
\beta_{\text{exceeds}} &\sim \text{Normal}(\text{logit}(0.85),\ 30) \\
\beta_{\text{meets}} &\sim \text{Normal}(\text{logit}(0.505),\ 30) \\
\beta_{\text{min. sat.}} &\sim \text{Normal}(\text{logit}(0.2),\ 30) \\
\beta_{\text{nr}} &\sim \text{Normal}(\text{logit}(0.2),\ 30) \\
\gamma &\sim \text{Normal}(0,\ 1.5)
\end{align}
With this, my intent is to start with the prior belief that each applicant has a 25% chance of being selected, regardless of race. Then, those applicants with “outstanding” and “exceeds” performance ratings have a higher likelihood of being selected than applicants with “meets,” “minimally satisfactory,” and “not ratable” ratings. Finally, the γ parameter is intended to capture the effect of years of experience.
I selected the priors for the β parameters by simply playing around with the numbers until they resulted in selection probabilities that I think are reasonable, based on what I know about the organization and previous rounds of hiring. However, these were the most difficult priors for me to determine, so I’m using a huge standard deviation so as to allow the data to dominate the end result. The years of experience variable has been normalized, so the prior for the γ parameter reflects the idea that the selection probability is lower for candidates with below-average years of experience, and higher for candidates with above-average experience.
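For concreteness, here is a sketch of how a model with this structure might be written in Turing.jl. The data names (`S`, `race`, `perf`, `years`) and the use of integer category codes are illustrative assumptions on my part, not the actual implementation:

```julia
# Sketch of the model above in Turing.jl (assumed variable names:
# `race` and `perf` are integer category codes, `years` is normalized
# years of experience, `S` is the 0/1 selection outcome).
using Turing
using StatsFuns: logit, logistic

@model function selection_model(S, race, perf, years)
    n_race = maximum(race)
    # One intercept per race, centered on a 25% selection probability:
    α ~ filldist(Normal(logit(0.25), 1.5), n_race)
    # One coefficient per performance rating (outstanding, exceeds,
    # meets, min. sat., nr), with the wide sd = 30 priors from above:
    β ~ arraydist(Normal.(logit.([0.9, 0.85, 0.505, 0.2, 0.2]), 30.0))
    # Slope on normalized years of experience:
    γ ~ Normal(0, 1.5)
    for i in eachindex(S)
        p = logistic(α[race[i]] + β[perf[i]] + γ * years[i])
        S[i] ~ Bernoulli(p)
    end
end
```

The `filldist`/`arraydist` helpers are just a compact way to declare the vector-valued priors; writing each `β` out separately would work equally well.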
I just want to make sure that there are no major flaws in the structure of this model for answering the question at hand, which is simply to understand if the selection probabilities by race were significantly different.
If this type of question is fair game here (I have implemented the model in Turing.jl), I’ll start a new thread for it!
I’ve asked some Bayesian questions before within the Julia community, not really knowing where else to turn. Luckily, we’ve got some very helpful folks in the community. The #turing channel has been pretty good about responding to questions of that sort, I think. #probprog and #statistics are other places one might ask, depending on the bent of the question.
*edit: Discourse automatically linked to a Discourse location, but I meant slack channels.
I think it would be nice to have more discussions like this in the forums; they make for a good signal and learning resource for newcomers to Bayesian stats / probabilistic programming in Julia.
Regarding your priors for the coefficients, I would say they are too wide: on the logit scale, everything above a value of 4 (logistic(4) ≈ 0.982) is more or less guaranteed. So usually a standard deviation of around 2 for the prior is wide enough to be only weakly informative for logistic models (assuming you have standardized/binary variables). With an sd of 30, most of the prior probability mass is on unreasonably extreme values.
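To make the scale concrete, here is a quick stdlib-only Julia check of those numbers:

```julia
# The logit scale saturates quickly: a coefficient of 4 already maps
# to ~0.98 probability, so a Normal(·, 30) prior puts most of its
# mass in the saturated region.
using Statistics

logistic(x) = 1 / (1 + exp(-x))

println(logistic(4))   # ≈ 0.982: effectively "guaranteed"
println(logistic(2))   # ≈ 0.881: already a strong effect

# Fraction of Normal(0, 30) prior mass beyond ±4 on the logit scale:
draws = 30 .* randn(100_000)
println(mean(abs.(draws) .> 4))   # ≈ 0.89: most mass is extreme
```

In other words, roughly nine-tenths of an sd = 30 prior sits on effects that pin the selection probability to (nearly) 0 or 1.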
In general, for figuring out priors, the recommended practice is to use prior predictive checks: you draw a bunch of samples from the prior and then predict your outcome from them, so you can inspect the relationships your model allows for / expects before it has seen any data. I think Turing lets you do that easily.
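As a minimal sketch of such a prior predictive check in Turing.jl (assuming a model function like `selection_model(S, race, perf, years)`; all names here are illustrative):

```julia
# Prior predictive check sketch in Turing.jl. Assumes `selection_model`
# is a @model function and S, race, perf, years are the data vectors;
# these names are hypothetical, adapt to your own model.
using Turing

# 1. Sample parameters from the prior only (no conditioning on data):
prior_chain = sample(selection_model(S, race, perf, years), Prior(), 2_000)

# 2. Simulate outcomes implied by the prior: pass `missing` for S so
#    Turing generates it, then predict from the prior draws:
missing_S = Vector{Missing}(missing, length(S))
prior_pred = predict(selection_model(missing_S, race, perf, years), prior_chain)

# 3. Inspect what the priors imply, e.g. the distribution of the total
#    number of selected applicants across prior draws:
n_selected = vec(sum(Array(prior_pred), dims = 2))
```

If the implied number of selections (or per-group selection rates) looks absurd under the prior alone, that is a sign the priors need tightening.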
Thanks for the feedback - I appreciate it! Indeed, Turing has a Prior() sampler (which I had forgotten about!).
I agree that more discussions like this would be great and could bring more users to Turing and similar packages. My question has been on Cross Validated for 24 hours now and still only has 14 views, 0 responses!