Noobish question regarding yhat probabilities

Hi,
I’ve just started learning MLJ and I’m recreating a RandomForest-based binary classification that I previously did in Python/scikit-learn. The MLJ version works fine, with one major exception. When I put new data through the trained model, all classification results (based on the yhat probabilities) are reversed, i.e. group 0 instead of 1 and 1 instead of 0. After subtracting the yhat probabilities from 1, I get the proper classification. Could someone please enlighten me on how to interpret the yhat probabilities?

yhat = MLJ.predict(model_rf_final, x)
p = yhat[idx].prob_given_ref
# this gives the correct classification:
p_group0 = 1 - p[1]
p_group1 = 1 - p[2]
# this gives the reversed classification:
p_group0 = p[1]
p_group1 = p[2]

e.g. for case 1, which belongs to group 0:

julia> yhat[1]                                                                                                                                                                            
         UnivariateFinite{OrderedFactor{2}}     
     ┌                                        ┐ 
   0 ┤■■■■■■■■■ 0.21                            
   1 ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 0.79   
     └                                        ┘ 

Thanks, Adam

Thanks for reporting. It would be helpful if you included a complete minimum working example. In particular, which RandomForest classifier are you using? I believe three different packages provide a model of that name.

Ah, I see you are using prob_given_ref, which is a private field. This may be the problem: the refs are internal representations of the classes, not necessarily the class labels you trained with (which don’t have to be integers). To access the probabilities in MLJ you should use the pdf method (actually from Distributions.jl), as shown in the examples in “Getting Started”.
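
Something along these lines should work instead (a minimal sketch, reusing your model_rf_final and x, and assuming the target was trained with the integer labels 0 and 1):

yhat = MLJ.predict(model_rf_final, x)  # vector of UnivariateFinite distributions
pdf.(yhat, 0)                          # P(label == 0) for every observation
pdf.(yhat, 1)                          # P(label == 1) for every observation
mode.(yhat)                            # point predictions (most probable label)

Because pdf is keyed by the actual label rather than the internal ref, you never need to know how the refs map to the labels.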

In scikit-learn, classes are always integers, with no tracking of the complete pool. For more on working with categorical data in MLJ, see here.
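
For instance, to see the distinction between labels and refs (a small sketch; int is MLJ's accessor for the internal ref):

using MLJ, CategoricalArrays
y = categorical([0, 1, 1, 0], ordered=true)
levels(y)      # the class labels: [0, 1]
MLJ.int(y[1])  # internal ref (an unsigned integer); the label 0 maps to ref 1 here

So the label 0 lives at ref 1, which is exactly the kind of off-by-one that makes indexing by ref hazardous.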

Hi,
Thank you, I did as instructed and learned about using pdf.
The problem, however, turned out to be in another part of the program, where the classes were incorrectly labeled in the training data :man_facepalming:
