I’m analyzing some HR data to answer the following question:
Do applicants of different races have substantially different probabilities of being selected?
For now, I’m focusing on a single round of hiring in which 53 candidates were selected from an applicant pool of 211. For each applicant, I have their race, current performance rating, and years of experience. All applicants were internal to a large organization, all held the same job, and all were applying for the same job (for which there were 53 openings).
I’ve formulated the following Bayesian model, and I’m seeking feedback as I’m inexperienced with this kind of modeling:
\begin{align}
S_{i} &\sim \text{Bernoulli}(p_{i}) \\
\text{logit}(p_{i}) &= \alpha_{\text{RID}[i]} + \beta_{\text{PID}[i]} + \gamma y_{i}\\
\alpha_{j} &\sim \text{Normal}(\text{logit}(0.25), 1.5) \\
\beta_{\text{outstanding}} &\sim \text{Normal}(\text{logit}(0.9), 30) \\
\beta_{\text{exceeds}} &\sim \text{Normal}(\text{logit}(0.85), 30) \\
\beta_{\text{meets}} &\sim \text{Normal}(\text{logit}(0.505), 30) \\
\beta_{\text{min. sat.}} &\sim \text{Normal}(\text{logit}(0.2), 30) \\
\beta_{\text{nr}} &\sim \text{Normal}(\text{logit}(0.2), 30) \\
\gamma &\sim \text{Normal}(0, 1.5)
\end{align}
With this, my intent is to start with the prior belief that each applicant has a 25% chance of being selected, regardless of race. On top of that, applicants with “outstanding” and “exceeds” performance ratings should have a higher probability of being selected than applicants with “meets,” “minimally satisfactory,” and “not ratable” ratings. Finally, the γ parameter is intended to capture the effect of years of experience.
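Because the α and β terms are added on the logit scale, their prior centers combine rather than apply separately. The following is a minimal sketch of that arithmetic at the prior means, for an applicant with average experience (taking y = 0 for a standardized variable). The helper names `logit`, `inv_logit`, and `implied` are illustrative and not part of the model above.

```python
import math

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def inv_logit(x):
    """Inverse of logit: maps a log-odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# Observed base rate in this hiring round (53 selected out of 211).
base_rate = 53 / 211

# Prior means taken directly from the model specification.
alpha_mean = logit(0.25)
beta_means = {
    "outstanding": logit(0.9),
    "exceeds": logit(0.85),
    "meets": logit(0.505),
    "min. sat.": logit(0.2),
    "nr": logit(0.2),
}

# Probability implied by adding both prior means on the logit scale,
# with years of experience at its average (y = 0, so gamma drops out).
implied = {rating: inv_logit(alpha_mean + b) for rating, b in beta_means.items()}

for rating, p in implied.items():
    print(f"{rating:12s} implied prior-mean probability: {p:.3f}")
```

Adding logits multiplies odds, so the value for “outstanding” works out to odds of (1/3) × 9 = 3, i.e. a probability of 0.75, which is neither the 25% baseline nor the 90% used to center that β prior.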
I selected the priors for the β parameters by adjusting the numbers until they produced selection probabilities that seemed reasonable, based on what I know about the organization and previous rounds of hiring. However, these were the most difficult priors for me to determine, so I’m using a very large standard deviation to let the data dominate the end result. The years of experience variable has been normalized, so the prior for the γ parameter reflects the idea that the selection probability is lower for candidates with below-average years of experience and higher for candidates with above-average experience.
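One way to see what a prior standard deviation of 30 on the logit scale implies is a prior predictive simulation: draw parameters from the priors, push them through the inverse logit, and look at the resulting probabilities. The sketch below does this with NumPy for the “outstanding” rating at average experience (y = 0 is an assumption about the normalized variable). The function `extreme_share`, the 0.01/0.99 cutoffs, and the comparison value of 1.5 are illustrative choices, not part of the original model.

```python
import numpy as np

def logit(p):
    return np.log(p / (1.0 - p))

def inv_logit(x):
    return 1.0 / (1.0 + np.exp(-x))

def extreme_share(beta_sd, n_draws=100_000, seed=0):
    """Share of prior draws whose implied selection probability is
    below 0.01 or above 0.99, for the 'outstanding' rating."""
    rng = np.random.default_rng(seed)
    alpha = rng.normal(logit(0.25), 1.5, size=n_draws)    # race intercept prior
    beta = rng.normal(logit(0.9), beta_sd, size=n_draws)  # rating effect prior
    gamma = rng.normal(0.0, 1.5, size=n_draws)            # experience slope prior
    y = 0.0  # assumed: average experience after normalization
    p = inv_logit(alpha + beta + gamma * y)
    return float(np.mean((p < 0.01) | (p > 0.99)))

share_wide = extreme_share(beta_sd=30.0)   # the prior as specified
share_narrow = extreme_share(beta_sd=1.5)  # hypothetical tighter alternative

print(f"share of extreme probabilities, sd=30:  {share_wide:.2f}")
print(f"share of extreme probabilities, sd=1.5: {share_narrow:.2f}")
```

With the wide prior, most draws land at probabilities very close to 0 or 1, so a large standard deviation on the logit scale is not flat on the probability scale. Whether that matters in practice depends on how much data each rating category has.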
I just want to make sure that there are no major flaws in the structure of this model for answering the question at hand, which is simply whether the selection probabilities differed substantially by race.
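For the question itself, one common way to summarize a fitted model of this form is to compare races on the probability scale using posterior draws of the α parameters. The sketch below assumes the draws are available as NumPy arrays. The function `race_contrast`, the three race groups, the 0.05 threshold, and the 89% interval are all hypothetical, and the arrays are random placeholders standing in for real posterior samples, not results from this data.

```python
import numpy as np

def inv_logit(x):
    return 1.0 / (1.0 + np.exp(-x))

def race_contrast(alpha_draws, beta_draws, j, k):
    """Draws of p_j - p_k for one performance rating at average
    experience (y = 0), given draws of alpha (n_draws x n_races)
    and of that rating's beta (n_draws,)."""
    p_j = inv_logit(alpha_draws[:, j] + beta_draws)
    p_k = inv_logit(alpha_draws[:, k] + beta_draws)
    return p_j - p_k

# PLACEHOLDER draws only: these stand in for samples from a fitted model
# and say nothing about the actual data.
rng = np.random.default_rng(1)
alpha_draws = rng.normal(-1.1, 0.4, size=(4000, 3))  # hypothetical: 3 race groups
beta_draws = rng.normal(0.0, 0.5, size=4000)         # hypothetical rating effect

diff = race_contrast(alpha_draws, beta_draws, j=0, k=1)
lo, hi = np.percentile(diff, [5.5, 94.5])  # 89% interval, an arbitrary choice
prob_gt = float(np.mean(np.abs(diff) > 0.05))  # share of draws with |difference| > 0.05

print(f"difference in selection probability: [{lo:.3f}, {hi:.3f}]")
print(f"P(|difference| > 0.05) = {prob_gt:.2f}")
```

Working on the probability scale keeps the summary tied to “substantially different” in practical terms, rather than to whether a coefficient excludes zero.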