Noobish question regarding yhat probabilities

Hi,
I’ve just started learning MLJ and I’m recreating a RandomForest-based binary classification that I previously did in Python/scikit-learn. The MLJ version works fine, with one major exception. When I put new data through the trained model, all classification results (based on the yhat probabilities) are reversed, i.e. group 0 instead of 1 and 1 instead of 0. After subtracting the yhat probabilities from 1, I get the proper classification. Could someone please enlighten me on how to interpret the yhat probabilities?

yhat = MLJ.predict(model_rf_final, x)
p = yhat[idx].prob_given_ref
# this gives the correct classification:
p_group0 = 1 - p[1]
p_group1 = 1 - p[2]
# this gives the reversed classification:
p_group0 = p[1]
p_group1 = p[2]

e.g. for case 1, which belongs to group 0:

julia> yhat[1]                                                                                                                                                                            
         UnivariateFinite{OrderedFactor{2}}     
     ┌                                        ┐ 
   0 ┤■■■■■■■■■ 0.21                            
   1 ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 0.79   
     └                                        ┘ 

Thanks, Adam

Thanks for reporting. It would be helpful if you included a complete minimum working example. In particular, which RandomForest classifier are you using? I believe three different packages provide a model of that name.

Ah, I see you are using prob_given_ref, which is a private field. This may be the problem: the refs are internal representations of the classes, not necessarily the class labels you trained with (which don’t have to be integers). To access the probabilities in MLJ you should use the pdf method (actually from Distributions.jl), as shown in the examples in “Getting Started”.
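
Something along these lines should work instead (a minimal sketch, reusing your model_rf_final and x, and assuming the target was trained with the integer labels 0 and 1):

yhat = MLJ.predict(model_rf_final, x)  # vector of UnivariateFinite distributions
pdf.(yhat, 0)                          # P(label == 0) for every observation
pdf.(yhat, 1)                          # P(label == 1) for every observation
mode.(yhat)                            # point predictions (most probable label)

Because pdf is keyed by the actual label rather than the internal ref, you never need to know how the refs map to the labels.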

In scikit-learn, classes are always integers, with no tracking of the complete pool. For more on working with categorical data in MLJ, see here.
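
For instance, to see the distinction between labels and refs (a small sketch; int is MLJ's accessor for the internal ref):

using MLJ, CategoricalArrays
y = categorical([0, 1, 1, 0], ordered=true)
levels(y)      # the class labels: [0, 1]
MLJ.int(y[1])  # internal ref (an unsigned integer); the label 0 maps to ref 1 here

So the label 0 lives at ref 1, which is exactly the kind of off-by-one that makes indexing by ref hazardous.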

Hi,
Thank you, I did as instructed and learned about using pdf.
The problem, however, turned out to be in another part of the program, where the classes were incorrectly labeled in the training data :man_facepalming:
