MLJ: Evaluating a probabilistic metric and a deterministic metric at the same time

The evaluate and evaluate! methods in MLJ can accept a vector of metric functions. However, it doesn’t appear that you can evaluate metrics based on probabilistic predictions (e.g. AUC) and metrics based on deterministic predictions (e.g. accuracy) at the same time. Here’s an MWE:

using DataFrames
using RDatasets
using MLJ
using MLJLinearModels

iris = dataset("datasets", "iris")
df = filter(r -> r.Species != "virginica", iris)
y = droplevels!(copy(df.Species))
X = select(df, Not(:Species))

model = LogisticClassifier(penalty=:none)
logistic_machine = machine(model, X, y)
holdout = Holdout(shuffle=true, rng=1)

logistic_auc = evaluate!(
    logistic_machine,
    resampling = holdout,
    measure = auc
)

logistic_accuracy = evaluate!(
    logistic_machine,
    resampling = holdout,
    operation = predict_mode,
    measure = accuracy
)

Does anyone know if it’s possible to evaluate auc and accuracy at the same time without having to run evaluate! twice?

In MLJ, the accuracy measure is only defined for deterministic classifiers. You could define your own custom accuracy measure that works on probabilistic classifiers using the code below.

custom_accuracy(yhat, y) = accuracy(mode.(yhat), y)
MLJ.reports_each_observation(::typeof(custom_accuracy)) = false
MLJ.supports_weights(::typeof(custom_accuracy)) = true
MLJ.orientation(::typeof(custom_accuracy)) = :score 
MLJ.is_feature_dependent(::typeof(custom_accuracy)) = false
MLJ.prediction_type(::typeof(custom_accuracy)) = :probabilistic

Then you could do:

logistic_auc_accuracy = evaluate!(
    logistic_machine,
    resampling = holdout,
    measure = [auc, custom_accuracy]
)

Awesome, thanks @samuel_okon! That’s a good solution. Though since MLJ measures have a prediction_type trait, it seems like it might be possible to extend evaluate to accept measures for different prediction types and have evaluate automatically run each of the necessary types of prediction. Maybe I’ll make a PR for that. :slight_smile:

That sounds nice. But the problem is that these measures were defined for either Deterministic or Probabilistic classifiers, not both. Also, applying a probabilistic measure to deterministic outputs won’t be well defined, since a vector of UnivariateFinite is needed.

Yeah, I was thinking that if measure = [auc, accuracy], then maybe internally evaluate could run both predict and predict_mode to get the two separate types of prediction. Then it could use the proper type of prediction for each metric.
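The idea could be sketched without touching evaluate internals, using the same `prediction_type` trait that the custom measure above defines. Here’s a rough, hypothetical helper (the name `operation_for` is my invention, not part of MLJ) showing how a measure could be mapped to the matching prediction operation:

```julia
using MLJ

# Hypothetical helper (not part of MLJ): select the prediction operation
# for a measure based on its prediction_type trait.
function operation_for(measure)
    t = MLJ.prediction_type(measure)
    t == :probabilistic ? predict :
    t == :deterministic ? predict_mode :
    error("measure has unsupported prediction type: $t")
end
```

In principle, evaluate could then compute each distinct operation once per fold (e.g. `unique(operation_for.(measures))`) and feed every measure the predictions of the matching type.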
