Hello, I am new to probabilistic programming, so I want to ask about some (I suppose) basic things.
First, my goal is to design a model that, given a set of features, will predict a binary outcome variable. However, I would additionally like to know the model's associated uncertainty.
In principle, as far as I understand it, this should be easy:
1) define some model like in the tutorial
2) supply data and perform variational inference like in the tutorial
3) now we have a probability distribution whose learned parameters describe the data
4) supply validation data
5) get the probability that a given data point (represented by a row in the dataframe) is in class 1 (we are in a binary setting), so the higher the p, the more certain the model is that class 1 is the correct answer — a sketch of this workflow is below
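A minimal Turing.jl sketch of these five steps, assuming hypothetical names: `x_train` (an N×2 feature matrix), `y_train` (0/1 labels), and a validation row `x_new`. It also assumes a recent Turing version that exports `vi` and `ADVI`, and that the flattened variational posterior orders parameters as declared in the model:

```julia
using Turing, StatsFuns, LinearAlgebra, Statistics

# 1) A Bernoulli-logistic model with weakly informative priors.
@model function binclass(x, y)
    intercept ~ Normal(0, 1)
    w ~ MvNormal(zeros(size(x, 2)), I)
    for i in 1:size(x, 1)
        y[i] ~ Bernoulli(logistic(intercept + dot(w, x[i, :])))
    end
end

# 2) Fit with variational inference (ADVI) instead of MCMC chains.
q = vi(binclass(x_train, y_train), ADVI(10, 1000))

# 3) Draw parameter samples from the learned approximation.
samples = rand(q, 2_000)   # columns are draws of (intercept, w...)

# 4)-5) Posterior predictive probability that x_new is in class 1:
# average the Bernoulli parameter over the posterior draws.
p1 = mean(logistic(s[1] + dot(s[2:end], x_new)) for s in eachcol(samples))
```

Here `p1` is the model's probability for class 1, and the spread of `logistic(...)` across the draws gives one view of the model's uncertainty about that probability.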
For example, I had experimented with the tutorial on Bayesian neural networks [1], which classifies points into y1 or y2.
I managed to perform parameter estimation via variational inference instead of chains to get the probability distribution. Still, I do not know how to use this distribution to get the probability of a given point being in y1 or y2; the tutorial only shows a maximum a posteriori example, without probability estimation (see the sketch below).
[1] Bayesian Neural Networks – Turing.jl
Thanks for the help!
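For reference, the same averaging idea applies to the BNN tutorial: instead of plugging in a single MAP parameter vector, average the network output over posterior draws. A sketch, assuming the tutorial's `nn_forward(x, θ)` helper that rebuilds the network from a flat parameter vector, and a `samples` matrix of posterior draws (from MCMC chains or from `rand(q, n)` after variational inference):

```julia
# Posterior predictive probability that x_new is in class y1:
# average the network's sigmoid output over the posterior draws of θ.
p_y1 = mean(first(nn_forward(x_new, samples[:, k])) for k in 1:size(samples, 2))
```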
If you haven’t already, I would take a look at the following tutorial:
https://turinglang.org/docs/tutorials/02-logistic-regression/
Also, I would highly recommend Richard McElreath’s Statistical Rethinking and the following site that ports the code to Julia:
From the preface of the 2nd edition of the book:
“Why R?. This book uses R for the same reason that it uses English: Lots of people know it already. R is convenient for doing computational statistics. But many other languages are just fine. I recommend Python (especially PyMC) and Julia as well. The first edition ended up with code translations for various languages and styles. Hopefully the second edition will as well.”
Thank you! I had read through the logistic regression tutorial earlier, but I suppose I did not understand it as well as I should have.
In the model we have the lines
v = logistic(intercept + student * x[i, 1] + balance * x[i, 2] + income * x[i, 3])
y[i] ~ Bernoulli(v)
and then in prediction we take v and predict based on whether v is greater than 0.07.
And here I have a problem: if v is the parameter of the Bernoulli, shouldn't the threshold in principle be 0.5? The probability of class 1 is greater than that of class 0 exactly when the parameter is greater than 0.5. Am I mistaken, or is the logistic regression model just badly calibrated?
For example, if I set the threshold in this example to 0.5, I get
Percentage defaults correct 0.2563291139240506
Percentage non-defaults correct 0.9976045296167247
and with a threshold of 0.07:
Percentage defaults correct 0.8386075949367089
Percentage non-defaults correct 0.8920949477351916
For visualization, I created a bar graph that plots the value predicted by the model; blue bars are class 1 and orange bars are class 0. So would this prove that it is miscalibration? (I am not saying the code is wrong; no model is perfect. I am just trying to understand the root of the problem.)
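A small sketch of how those per-class percentages can be compared across thresholds, assuming hypothetical vectors `v_test` (the model's predicted probabilities on the test set) and `y_test` (the true 0/1 default labels); `perclass_accuracy` is a made-up helper name:

```julia
using Statistics

# Fraction of true defaults caught (sensitivity) and fraction of true
# non-defaults correctly left alone (specificity) at a given threshold.
function perclass_accuracy(v, y, threshold)
    pred = v .>= threshold
    return mean(pred[y .== 1]), mean(.!pred[y .== 0])
end

for t in (0.5, 0.07)
    d, nd = perclass_accuracy(v_test, y_test, t)
    println("threshold=$t: defaults correct=$d, non-defaults correct=$nd")
end
```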
I haven’t looked at the dataset they use for that tutorial, but I assume that the lower threshold is necessary because of class imbalance, i.e., there are many more loans that did not default than those that did. Without balancing the data and with a threshold of 0.5, the model will most likely always predict non-default. They’ve chosen to deal with that by using a low threshold so that the predictions are more sensitive to detecting defaults.
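A toy simulation (not the tutorial's data) of that point: with a rare positive class, even a perfectly calibrated model assigns almost every point a probability far below 0.5, so a 0.5 threshold predicts essentially no defaults; the threshold, not the calibration, is what needs adjusting:

```julia
using Distributions, StatsFuns, Statistics, Random

Random.seed!(1)
n = 10_000
x = randn(n)
p_true = logistic.(-4.0 .+ 1.5 .* x)   # true class-1 probabilities
y = rand.(Bernoulli.(p_true))          # rare positives: heavy class imbalance

println(mean(y))                # small base rate, only a few percent
println(mean(p_true .>= 0.5))   # almost no points exceed the 0.5 threshold
```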