How to structure a Bayesian model for hiring data based on race, performance, and years of experience

I’m working on an analysis of some HR data that is attempting to answer the following question:

Do applicants of different races have substantially different probabilities of being selected?

For now, I’m focusing on a single round of hiring that resulted in the selection of 53 candidates from an applicant pool of 211. For each applicant, I have their race, current performance rating, and years of experience. All applicants were internal to a large organization, all held the same job, and all were applying for the same job (for which there were 53 openings).

I’ve formulated the following Bayesian model, and I’m seeking feedback as I’m inexperienced with this kind of modeling:

\begin{align}
S_{i} &\sim \text{Bernoulli}(p_{i}) \\
\text{logit}(p_{i}) &= \alpha_{\text{RID}[i]} + \beta_{\text{PID}[i]} + \gamma y_{i} \\
\alpha_{j} &\sim \text{Normal}(\text{logit}(0.25), 1.5) \\
\beta_{\text{outstanding}} &\sim \text{Normal}(\text{logit}(0.9), 30) \\
\beta_{\text{exceeds}} &\sim \text{Normal}(\text{logit}(0.85), 30) \\
\beta_{\text{meets}} &\sim \text{Normal}(\text{logit}(0.505), 30) \\
\beta_{\text{min. sat.}} &\sim \text{Normal}(\text{logit}(0.2), 30) \\
\beta_{\text{nr}} &\sim \text{Normal}(\text{logit}(0.2), 30) \\
\gamma &\sim \text{Normal}(0, 1.5)
\end{align}

With this, my intent is to start with the prior belief that each applicant has a 25% chance of being selected, regardless of race. Then, applicants with “outstanding” and “exceeds” performance ratings have a higher probability of being selected than applicants with “meets,” “minimally satisfactory,” and “not ratable” ratings. Finally, the γ parameter is intended to capture the effect of years of experience.

I selected the priors for the β parameters by simply playing around with the numbers until they resulted in selection probabilities that I think are reasonable, based on what I know about the organization and previous rounds of hiring. However, these were the most difficult priors for me to determine, so I’m using a huge standard deviation so as to allow the data to dominate the end result. The years of experience variable has been normalized, so the prior for the γ parameter reflects the idea that the selection probability is lower for candidates with below-average years of experience, and higher for candidates with above-average experience.

I just want to make sure that there are no major flaws in the structure of this model for answering the question at hand, which is simply to understand if the selection probabilities by race were significantly different.

It seems to me like this is a good start.

I don’t quite get the alpha parameter notation, but it seems like you are describing a shared intercept (across all applicants) along with an effect due to race (notated with the j index). It might be clearer to have an explicit intercept parameter, which can have its own prior, and then zero-centered priors for the other parameters, i.e. those for features that differ across applicants.
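To make that concrete, here is one way the model could be written with an explicit intercept. The symbols $\mu$ (shared intercept) and $\sigma$ (a common prior scale) are my own notation, not something from your post, and I’m leaving the value of $\sigma$ open:

\begin{align}
S_{i} &\sim \text{Bernoulli}(p_{i}) \\
\text{logit}(p_{i}) &= \mu + \alpha_{\text{RID}[i]} + \beta_{\text{PID}[i]} + \gamma y_{i} \\
\mu &\sim \text{Normal}(\text{logit}(0.25), 1.5) \\
\alpha_{j} &\sim \text{Normal}(0, \sigma) \\
\beta_{k} &\sim \text{Normal}(0, \sigma) \\
\gamma &\sim \text{Normal}(0, \sigma)
\end{align}

Note that with an intercept plus an effect for every level, the individual effects are only pinned down through their priors. Fixing one reference level per factor to zero is a common alternative.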

It sounds like you’re already doing some prior predictive checks, which is a good way to check whether your priors are unreasonable.
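As a rough illustration of such a check, here is a minimal sketch in plain Python (standard library only, not Turing.jl) that draws from the priors and looks at the implied selection probability for an applicant with average experience (y = 0). It compares your “outstanding” prior, with standard deviation 30, against a zero-centered prior with standard deviation 1.5. The second prior, the helper names, and the 0.01 cutoff are all my own assumptions for the sake of the example:

```python
import math
import random

def logit(p):
    return math.log(p / (1.0 - p))

def inv_logit(x):
    # Logistic function, written so that large |x| cannot overflow.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def prior_selection_probs(beta_mean, beta_sd, n_draws=20000, seed=1):
    """Draw alpha and beta from their priors; return the implied p at y = 0."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        alpha = rng.gauss(logit(0.25), 1.5)   # race-level prior from the post
        beta = rng.gauss(beta_mean, beta_sd)  # performance-level prior
        draws.append(inv_logit(alpha + beta))
    return draws

def share_extreme(draws, eps=0.01):
    """Fraction of prior draws with p below eps or above 1 - eps."""
    return sum(1 for p in draws if p < eps or p > 1.0 - eps) / len(draws)

# Prior from the post for the "outstanding" rating: Normal(logit(0.9), 30).
wide = prior_selection_probs(logit(0.9), 30.0)
# Hypothetical alternative: zero-centered with a moderate width.
narrow = prior_selection_probs(0.0, 1.5)

print(f"SD 30:  {share_extreme(wide):.2f} of draws have p < 0.01 or p > 0.99")
print(f"SD 1.5: {share_extreme(narrow):.2f} of draws have p < 0.01 or p > 0.99")
```

A standard deviation of 30 on the logit scale is not really “uninformative”: most of the prior draws end up within 0.01 of either 0 or 1, so the prior says that selection is almost certain or almost impossible. The narrower prior spreads its mass over intermediate probabilities instead.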

You may want to do some simulation studies to determine how wide you want your priors to be. For example, with the same feature data (race, performance rating, years of experience), choose some fixed values of the parameters alpha, beta, and gamma, then use the forward model you’ve described to sample S. You can then sample from the posterior distribution conditioned on the simulated selection data. Now you can examine the bias between your inferred model parameters and the “known” values you used to simulate the selection data.

If you repeat this process many times for a representative sample of parameter values (and a fixed range of prior widths), you can get a sense of how the bias in the estimated parameters changes with the prior width.
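Here is a stripped-down sketch of that kind of simulation study, again in plain Python rather than Turing.jl. To keep it short I’ve made several simplifying assumptions that are not in your model: there are only two groups (a reference group and a comparison group), there are no performance or experience terms, the posterior is approximated on a grid instead of sampled with MCMC, and the “true” values and group sizes (150 and 61) are made up. The intercept keeps a Normal(logit(0.25), 1.5) prior, and only the width of the zero-centered prior on the group effect b is varied:

```python
import math
import random

def logit(p):
    return math.log(p / (1.0 - p))

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))  # only called with moderate x here

def softplus(x):
    # log(1 + exp(x)), written to avoid overflow.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def log_binom(k, n, eta):
    # Binomial log-likelihood at log-odds eta, dropping the constant term.
    return k * eta - n * softplus(eta)

A_GRID = [-3.0 + 0.1 * i for i in range(41)]  # intercept values, -3 to 1
B_GRID = [-3.0 + 0.1 * i for i in range(61)]  # group-effect values, -3 to 3

def posterior_mean_b(k0, n0, k1, n1, prior_sd):
    """Grid approximation to the posterior mean of the group effect b."""
    a_mean = logit(0.25)
    log_post = []
    for a in A_GRID:
        # Intercept prior plus the reference group's likelihood.
        base = -((a - a_mean) ** 2) / (2 * 1.5 ** 2) + log_binom(k0, n0, a)
        for b in B_GRID:
            lp = base - b * b / (2 * prior_sd ** 2) + log_binom(k1, n1, a + b)
            log_post.append((lp, b))
    top = max(lp for lp, _ in log_post)
    weights = [(math.exp(lp - top), b) for lp, b in log_post]
    total = sum(w for w, _ in weights)
    return sum(w * b for w, b in weights) / total

def mean_bias(prior_sd, a_true, b_true, n0, n1, n_reps=30, seed=7):
    """Average (posterior mean - true value) of b over simulated data sets."""
    rng = random.Random(seed)  # same seed, so every width sees the same data
    errors = []
    for _ in range(n_reps):
        k0 = sum(rng.random() < inv_logit(a_true) for _ in range(n0))
        k1 = sum(rng.random() < inv_logit(a_true + b_true) for _ in range(n1))
        errors.append(posterior_mean_b(k0, n0, k1, n1, prior_sd) - b_true)
    return sum(errors) / n_reps

# Hypothetical truth and group sizes (150 + 61 = 211 applicants).
for width in (0.25, 1.5, 30.0):
    bias = mean_bias(width, a_true=logit(0.25), b_true=0.8, n0=150, n1=61)
    print(f"prior SD {width:>5}: mean bias in b = {bias:+.3f}")
```

With a true effect of 0.8 and a tight zero-centered prior, the estimate is pulled strongly toward zero, and the pull weakens as the prior widens. In a real study you would also vary the true parameter values, and look at interval coverage as well as bias.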

Personally, I think I would set the mean hyperparameters of the priors to zero, unless you have a lot of prior observations that inform those values. And if that were the case, the best thing to do would be to model both the prior and current observations together in one go, but that might involve a more complicated model.

And (again in the absence of substantial prior observations) I think it would be better to have all of the prior distributions share the same mean and variance, rather than having some of them combine a very wide standard deviation (30) with means that are biased away from zero.


Thanks so much for the feedback - I really appreciate it. I had forgotten that Turing.jl allows you to sample from the model’s prior with the Prior() sampler, which is really, really helpful.