Hello, I am new to probabilistic programming, so I want to ask about some (I suppose) basic things.
First, my goal is to design a model that, given a set of features, will predict a binary outcome variable. However, I would additionally like to know the model's associated uncertainty.
In principle, as far as I understand it, this should be easy:
1) define some model like in the tutorial
2) supply data and perform variational inference like in the tutorial
3) now we have a probability distribution whose learned parameters describe the data
4) supply validation data
5) get the probability that a given data point (represented by a row in the dataframe) is in class 1 (we are in a binary setting), so the higher the p, the more certain the model is that class 1 is the correct answer — a sketch of this workflow is below
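A minimal Turing.jl sketch of these five steps, assuming hypothetical names: `x_train` (an N×2 feature matrix), `y_train` (0/1 labels), and a validation row `x_new`. It also assumes a recent Turing version that exports `vi` and `ADVI`, and that the flattened variational posterior orders parameters as declared in the model:

```julia
using Turing, StatsFuns, LinearAlgebra, Statistics

# 1) A Bernoulli-logistic model with weakly informative priors.
@model function binclass(x, y)
    intercept ~ Normal(0, 1)
    w ~ MvNormal(zeros(size(x, 2)), I)
    for i in 1:size(x, 1)
        y[i] ~ Bernoulli(logistic(intercept + dot(w, x[i, :])))
    end
end

# 2) Fit with variational inference (ADVI) instead of MCMC chains.
q = vi(binclass(x_train, y_train), ADVI(10, 1000))

# 3) Draw parameter samples from the learned approximation.
samples = rand(q, 2_000)   # columns are draws of (intercept, w...)

# 4)-5) Posterior predictive probability that x_new is in class 1:
# average the Bernoulli parameter over the posterior draws.
p1 = mean(logistic(s[1] + dot(s[2:end], x_new)) for s in eachcol(samples))
```

Here `p1` is the model's probability for class 1, and the spread of `logistic(...)` across the draws gives one view of the model's uncertainty about that probability.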
For example, I had experimented with the tutorial on Bayesian neural networks [1], which classifies points into y1 or y2.
I managed to perform parameter estimation via variational inference instead of chains to get the probability distribution. Still, I do not know how to use this distribution to get the probability of a given point being in y1 or y2; the tutorial only shows a maximum a posteriori example, without probability estimation (see the sketch below).
[1] Bayesian Neural Networks – Turing.jl
Thanks for the help!
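For reference, the same averaging idea applies to the BNN tutorial: instead of plugging in a single MAP parameter vector, average the network output over posterior draws. A sketch, assuming the tutorial's `nn_forward(x, θ)` helper that rebuilds the network from a flat parameter vector, and a `samples` matrix of posterior draws (from MCMC chains or from `rand(q, n)` after variational inference):

```julia
# Posterior predictive probability that x_new is in class y1:
# average the network's sigmoid output over the posterior draws of θ.
p_y1 = mean(first(nn_forward(x_new, samples[:, k])) for k in 1:size(samples, 2))
```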
If you haven’t already, I would take a look at the following tutorial:
https://turinglang.org/docs/tutorials/02-logistic-regression/
Also, I would highly recommend Richard McElreath’s Statistical Rethinking and the following site that ports the code to Julia:
From the preface of the 2nd edition of the book:
“Why R?. This book uses R for the same reason that it uses English: Lots of people know it already. R is convenient for doing computational statistics. But many other languages are just fine. I recommend Python (especially PyMC) and Julia as well. The first edition ended up with code translations for various languages and styles. Hopefully the second edition will as well.”
Thank you! I had read through the logistic regression tutorial earlier, but I suppose I did not understand it as well as I should have.
In the model we have the lines
v = logistic(intercept + student * x[i, 1] + balance * x[i, 2] + income * x[i, 3])
y[i] ~ Bernoulli(v)
and then in prediction we take v and predict based on whether v is greater than 0.07.
And here I have a problem: if v is the parameter of the Bernoulli, shouldn't the threshold in principle be 0.5? The probability of class 1 is greater than that of class 0 exactly when the parameter is greater than 0.5. Am I mistaken, or is the logistic regression model just badly calibrated?
For example, if I set the threshold in this example to 0.5, I get
Percentage defaults correct 0.2563291139240506
Percentage non-defaults correct 0.9976045296167247
and with a threshold of 0.07:
Percentage defaults correct 0.8386075949367089
Percentage non-defaults correct 0.8920949477351916
For visualization, I created a bar graph that plots the value predicted by the model; blue bars are class 1 and orange bars are class 0. So would this prove that it is miscalibration? (I am not saying the code is wrong; no model is perfect. I am just trying to understand the root of the problem.)
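A small sketch of how those per-class percentages can be compared across thresholds, assuming hypothetical vectors `v_test` (the model's predicted probabilities on the test set) and `y_test` (the true 0/1 default labels); `perclass_accuracy` is a made-up helper name:

```julia
using Statistics

# Fraction of true defaults caught (sensitivity) and fraction of true
# non-defaults correctly left alone (specificity) at a given threshold.
function perclass_accuracy(v, y, threshold)
    pred = v .>= threshold
    return mean(pred[y .== 1]), mean(.!pred[y .== 0])
end

for t in (0.5, 0.07)
    d, nd = perclass_accuracy(v_test, y_test, t)
    println("threshold=$t: defaults correct=$d, non-defaults correct=$nd")
end
```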
I haven’t looked at the dataset they use for that tutorial, but I assume that the lower threshold is necessary because of class imbalance, i.e., there are many more loans that did not default than those that did. Without balancing the data and with a threshold of 0.5, the model will most likely always predict non-default. They’ve chosen to deal with that by using a low threshold so that the predictions are more sensitive to detecting defaults.
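A toy simulation (not the tutorial's data) of that point: with a rare positive class, even a perfectly calibrated model assigns almost every point a probability far below 0.5, so a 0.5 threshold predicts essentially no defaults; the threshold, not the calibration, is what needs adjusting:

```julia
using Distributions, StatsFuns, Statistics, Random

Random.seed!(1)
n = 10_000
x = randn(n)
p_true = logistic.(-4.0 .+ 1.5 .* x)   # true class-1 probabilities
y = rand.(Bernoulli.(p_true))          # rare positives: heavy class imbalance

println(mean(y))                # small base rate, only a few percent
println(mean(p_true .>= 0.5))   # almost no points exceed the 0.5 threshold
```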