Aside from that, it might help if you could identify three components: (1) the data, (2) deterministic components of the model, and (3) the parameters of your model.
Sorry, my terminology might be a bit lax as I'm still learning about Bayesian modeling. Let me give it a shot:
weights = [100, 50, 50, 150, 200]
trials  = [1, 0, 0, 1, 0]
Each potential insurance policy has a known benefit (i.e., the weights above). If a claim occurs, the amount of benefit paid is fixed.
The random outcome is whether or not a policyholder has a claim.
- I know a priori that the probability of a claim varies by size (i.e., by the weight).
One model that comes to mind when you say risk size and claim probability are correlated is logistic regression. It would let you characterize the relationship between weights and trials with an equation that adjusts the Bernoulli parameter.
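As a minimal sketch of that idea, using the toy data from this thread: the claim indicator is modeled as Bernoulli with logit(p) linear in log(benefit size). The gradient-ascent fit below is just to keep the example dependency-free; in practice you would use a library routine (the variable names and the choice of log(weight) as the predictor are my assumptions, not something from your OP).

```python
import numpy as np

# Toy data from the thread: benefit sizes and claim indicators.
weights = np.array([100.0, 50.0, 50.0, 150.0, 200.0])
claims = np.array([1.0, 0.0, 0.0, 1.0, 0.0])

# Logistic regression of claim probability on log(benefit size):
#   logit(p_i) = a + b * (log w_i - mean log w)
# fitted by plain gradient ascent on the Bernoulli log-likelihood.
x = np.log(weights)
x = x - x.mean()  # centering keeps the gradient steps well conditioned
a, b = 0.0, 0.0
lr = 0.1
for _ in range(5000):
    p = 1.0 / (1.0 + np.exp(-(a + b * x)))
    resid = claims - p          # gradient of the log-likelihood
    a += lr * resid.sum()
    b += lr * (resid * x).sum()

p_hat = 1.0 / (1.0 + np.exp(-(a + b * x)))
print(p_hat)  # fitted per-policy claim probabilities
```

At the maximum-likelihood fit the intercept's score equation forces the average fitted probability to match the observed claim rate, which is a quick sanity check on the optimizer.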
Based on my understanding of logistic (or probit) models, that would be useful for predicting whether a given risk has a claim, or for discerning the correlations between data features and the outcome.
However, my end use case is not to predict whether an individual claim occurs, but to use a probability (i.e., `q_weighted` in my OP) to model the aggregate outcome of a similar population.
I admit that the ideal setup would be to predict individually whether each claim occurs based on the data features relevant to that policy. However, the modeling software I use does not accept that type of input. Therefore, to calibrate the dollar amount of loss, I am looking to derive a weighted estimated parameter (the point estimate, `q_weighted` above, is easy to get). But I want to understand the distribution of that estimate rather than just the point estimate, and I am not sure how to accomplish that.
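One way to get a distribution around a weighted point estimate like this is the Bayesian bootstrap (Rubin, 1981): draw Dirichlet(1, ..., 1) resampling weights over the observed policies and recompute the weighted statistic on each draw. This is a sketch under my own assumption that `q_weighted` means the benefit-weighted claim frequency on the toy data; it is one option, not necessarily what your OP intended.

```python
import numpy as np

rng = np.random.default_rng(0)

weights = np.array([100.0, 50.0, 50.0, 150.0, 200.0])
claims = np.array([1.0, 0.0, 0.0, 1.0, 0.0])

# Point estimate: benefit-weighted claim frequency.
q_weighted = (weights * claims).sum() / weights.sum()

# Bayesian bootstrap: Dirichlet(1,...,1) weights over the observed
# policies, combined with the benefit weights, recomputed per draw.
n_draws = 10000
dirichlet = rng.dirichlet(np.ones(len(weights)), size=n_draws)
w = dirichlet * weights
q_draws = (w * claims).sum(axis=1) / w.sum(axis=1)

# Central 95% interval of the approximate posterior for q_weighted.
lo, hi = np.percentile(q_draws, [2.5, 97.5])
print(q_weighted, lo, hi)
```

With only five policies the interval will be very wide, which is itself informative: it shows how little the point estimate alone tells you about the aggregate outcome.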
Apologies if I'm being dense; I've been trying to apply some of what I've learned about Bayesian modeling, but I've had a hard time adapting it to this particular problem.