Using measures in MLJ to evaluate a binary classifier

Hello everybody,
I am trying to use the measures in MLJ to evaluate binary classifications.

I basically have integer vectors of 0 and 1 for ground truth (gt from now on) and predictions (pred from now on).
When I call confusion_matrix on the integer vectors, I get an error similar to the one described in this post.

The problem is almost solved when I provide vectors of Strings and use categorical to convert them.
Following is a toy example:

using MLJ

predStr = ["fast", "fast", "slow"];
gtStr = ["slow", "fast", "slow"];

cmtx = confusion_matrix(categorical(gtStr), categorical(predStr))
┌ Warning: The classes are un-ordered,
│ using: negative='fast' and positive='slow'.
│ To suppress this warning, consider coercing to OrderedFactor.
└ @ MLJBase ~/.julia/packages/MLJBase/7hkEm/src/measures/confusion_matrix.jl:96
              ┌───────────────────────────┐
              │       Ground Truth        │
┌─────────────┼─────────────┬─────────────┤
│  Predicted  │    fast     │    slow     │
├─────────────┼─────────────┼─────────────┤
│    fast     │      1      │      0      │
├─────────────┼─────────────┼─────────────┤
│    slow     │      1      │      1      │
└─────────────┴─────────────┴─────────────┘

For the specific case above, I would like to know how to order the classes and suppress the warning.

More generally, I would like to know whether there is a recommended way to provide input to the measures in the MLJ package.
So far, using the categorical function seems to work, but I would love to hear your opinion.

Thanks a lot for being such a great community!

You can make your categorical arrays ordered:

using MLJ

predStr = categorical(
    ["fast", "fast", "slow"];
    ordered = true,
    levels = ["slow", "fast"]
)

gtStr = categorical(
    ["slow", "fast", "slow"];
    ordered = true,
    levels = ["slow", "fast"]
)
julia> confusion_matrix(predStr, gtStr)
              ┌───────────────────────────┐
              │       Ground Truth        │
┌─────────────┼─────────────┬─────────────┤
│  Predicted  │    slow     │    fast     │
├─────────────┼─────────────┼─────────────┤
│    slow     │      1      │      0      │
├─────────────┼─────────────┼─────────────┤
│    fast     │      1      │      1      │
└─────────────┴─────────────┴─────────────┘

Note that according to the docstring for ConfusionMatrix, the first argument should be the predicted values and the second argument should be the true values.

Let’s compare the scientific types of an ordered and an unordered categorical array:

julia> scitype(predStr)
AbstractVector{OrderedFactor{2}}

julia> scitype(categorical(["a", "b"]))
AbstractVector{Multiclass{2}}
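
If you already have plain vectors, the warning's suggestion of coercing to OrderedFactor can be followed with coerce. This is only a minimal sketch; the variable names pred_raw and gt_raw are made up for illustration. As far as I know, the level order after coerce is the default sorted order (so "fast" comes before "slow" here), which is why the categorical call with explicit levels shown earlier is the better choice when you need a specific order.

```julia
using MLJ

# Plain string vectors, as in the question (names are illustrative)
pred_raw = ["fast", "fast", "slow"]
gt_raw = ["slow", "fast", "slow"]

# coerce converts each vector to an ordered categorical vector
pred = coerce(pred_raw, OrderedFactor)
gt = coerce(gt_raw, OrderedFactor)

# The scitype is now an OrderedFactor rather than Multiclass
st = scitype(pred)

# Predictions first, ground truth second
cm = confusion_matrix(pred, gt)
```
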

Also note that you can make categorical arrays with integers:

pred = categorical([1, 1, 0]; ordered=true)
gt = categorical([0, 1, 0]; ordered=true)
julia> confusion_matrix(pred, gt)
              ┌───────────────────────────┐
              │       Ground Truth        │
┌─────────────┼─────────────┬─────────────┤
│  Predicted  │      0      │      1      │
├─────────────┼─────────────┼─────────────┤
│      0      │      1      │      0      │
├─────────────┼─────────────┼─────────────┤
│      1      │      1      │      1      │
└─────────────┴─────────────┴─────────────┘
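
One thing to watch with 0/1 vectors: if the predictions happen to contain only one of the two classes, the two categorical vectors would end up with different levels. A minimal sketch of one way around that, assuming you know the full set of classes up front, is to pass levels explicitly to both calls. The data below is invented for illustration, and accuracy is used only as an example of another measure that takes the same (predicted, true) argument order.

```julia
using MLJ

# Illustrative data: the predictions contain only the class 1
pred_raw = [1, 1, 1]
gt_raw = [0, 1, 0]

# Explicit levels give both vectors the same ordered pool, 0 before 1
pred = categorical(pred_raw; ordered = true, levels = [0, 1])
gt = categorical(gt_raw; ordered = true, levels = [0, 1])

# Predictions first, ground truth second
cm = confusion_matrix(pred, gt)
acc = accuracy(pred, gt)
```
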

Thanks a lot, @CameronBieganek.
I should have read the documentation on scitype more carefully.

Thanks again!
