MLJ confusion_matrix() - MethodError

I am trying to make a RF predictor of a categorical variable (3 levels), and then look at how well the model works. I mostly followed the steps in the Getting Started section of the docs - here’s a MWE:

using DataFrames
using MLJ
using DecisionTree
using Random

df = DataFrame(a = categorical(rand(['a', 'b', 'c'], 100)), t = repeat(["test", "train"], inner=50), x1=rand(100), x2=rand(100))
train = findall(x-> x == "train", df.t)
test = findall(x-> x == "test", df.t)

y, X = unpack(df, ==(:a), x-> x in [:x1, :x2])

tree_model = MLJ.@load DecisionTreeClassifier verbosity=1
tree = machine(tree_model, X, y)
fit!(tree, rows=train)
yhat = MLJ.predict(tree, X[test,:])

cross_entropy(yhat, y[test]) |> mean # this works

MLJ.confusion_matrix(yhat, y[test]) # this doesn't

That last line gives me

ERROR: MethodError: no method matching confusion_matrix(::MLJBase.UnivariateFiniteVector{Multiclass{3}, Char, UInt32, Float64}, ::CategoricalVector{Char, UInt32, Char, CategoricalValue{Char, UInt32}, Union{}})
Closest candidates are:
  confusion_matrix(::AbstractVector{var"#s887"} where var"#s887"<:CategoricalValue, ::AbstractVector{var"#s886"} where var"#s886"<:CategoricalValue; rev, perm, warn) at /home/kevin/.julia/packages/MLJBase/uKzAz/src/measures/confusion_matrix.jl:64

I assumed that this would work given the function signature, though the text

Computes the confusion matrix given a predicted ŷ with categorical elements and the actual y

and the error lead me to think I may have to convert the UnivariateFinite{Multiclass{3}, Char, UInt32, Float64} elements of yhat to an actual categorical array, but it’s not immediately clear how to do that. The probabilities in yhat all seem to be 1.0 or 0.0, so it seems like this should be straightforward.

I naively tried categorical(yhat), and that actually runs, but the categorical array of UnivariateFinite has a different order, so I’m assuming that’s not actually correct:

julia> levels(categorical(yhat))
3-element Vector{UnivariateFinite{Multiclass{3}, Char, UInt32, Float64}}:
 UnivariateFinite{Multiclass{3}}(a=>0.0, b=>1.0, c=>0.0)
 UnivariateFinite{Multiclass{3}}(a=>1.0, b=>0.0, c=>0.0)
 UnivariateFinite{Multiclass{3}}(a=>0.0, b=>0.0, c=>1.0)

julia> levels(y[test])
3-element Vector{Char}:
 'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)
 'b': ASCII/Unicode U+0062 (category Ll: Letter, lowercase)
 'c': ASCII/Unicode U+0063 (category Ll: Letter, lowercase)

So - is there a straightforward way to convert the array of UnivariateFinite to a categorical array?

MLJ.confusion_matrix(mode.(yhat), y[test]) should do the trick. yhat is a UnivariateFiniteArray, which is an AbstractVector of UnivariateFinite elements (each one is basically a distribution over the classes), so we have to take the mode of each element to get a categorical prediction.
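To make the idea concrete, here is a minimal plain-Julia sketch of what "taking the mode" of each prediction amounts to. It does not use MLJ's types: each distribution is a plain Dict standing in for a UnivariateFinite, and most_probable and hypothetical_confusion are made-up helper names for illustration only. The rows-are-predictions layout is a choice made for this sketch.

```julia
# Class labels, in the order used for the rows and columns of the matrix.
classes = ['a', 'b', 'c']

# Probabilistic predictions: one distribution per observation.
# (Plain Dicts here; an assumption standing in for MLJ's UnivariateFinite.)
probs = [
    Dict('a' => 0.0, 'b' => 1.0, 'c' => 0.0),
    Dict('a' => 1.0, 'b' => 0.0, 'c' => 0.0),
    Dict('a' => 0.2, 'b' => 0.1, 'c' => 0.7),
    Dict('a' => 0.0, 'b' => 1.0, 'c' => 0.0),
]

# Ground-truth labels for the same observations.
truth = ['b', 'a', 'a', 'c']

# The mode of a distribution is its most probable class.
most_probable(d, classes) = classes[argmax([d[c] for c in classes])]

# Count (predicted, actual) pairs; rows are predictions, columns are truth.
function hypothetical_confusion(pred, truth, classes)
    index = Dict(c => i for (i, c) in enumerate(classes))
    counts = zeros(Int, length(classes), length(classes))
    for (p, t) in zip(pred, truth)
        counts[index[p], index[t]] += 1
    end
    return counts
end

# Reduce each distribution to a single label, then tabulate.
pred = [most_probable(d, classes) for d in probs]
cm = hypothetical_confusion(pred, truth, classes)
```

In MLJ itself the same reduction is what mode.(yhat) does. Calling predict_mode(tree, X[test,:]) instead of predict should give point predictions directly, though that has not been checked against the MWE above.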


Ah, great!

FYI - formatting is a bit weird in your answer - guessing you have a stray ` somewhere :slight_smile:

Yeah. Fixed that

The backtick placement still seems to be slightly off… :joy:


Don’t mind my clumsiness

