I am trying to build a random forest (RF) predictor of a categorical variable with 3 levels, and then evaluate how well the model works. I mostly followed the steps in the Getting Started section of the MLJ docs; here's an MWE:
```julia
using DataFrames
using MLJ
using DecisionTree
using Random
# rows 1-50 are labelled "test", rows 51-100 "train"
df = DataFrame(a = categorical(rand(['a', 'b', 'c'], 100)), t = repeat(["test", "train"], inner=50), x1=rand(100), x2=rand(100))
train = findall(x-> x == "train", df.t)
test = findall(x-> x == "test", df.t)
# target y is column :a; features X are columns :x1 and :x2
y, X = unpack(df, ==(:a), x-> x in [:x1, :x2])
tree_model = MLJ.@load DecisionTreeClassifier verbosity=1
tree = machine(tree_model, X, y)
MLJ.fit!(tree, rows=train)
# probabilistic predictions: one UnivariateFinite distribution per test row
yhat = MLJ.predict(tree, X[test,:])
cross_entropy(yhat, y[test]) |> mean # this works
MLJ.confusion_matrix(yhat, y[test]) # this doesn't
```
That last line gives me:
```
ERROR: MethodError: no method matching confusion_matrix(::MLJBase.UnivariateFiniteVector{Multiclass{3}, Char, UInt32, Float64}, ::CategoricalVector{Char, UInt32, Char, CategoricalValue{Char, UInt32}, Union{}})
Closest candidates are:
  confusion_matrix(::AbstractVector{var"#s887"} where var"#s887"<:CategoricalValue, ::AbstractVector{var"#s886"} where var"#s886"<:CategoricalValue; rev, perm, warn) at /home/kevin/.julia/packages/MLJBase/uKzAz/src/measures/confusion_matrix.jl:64
```
I assumed that this would work given the function signature, though the docstring text

> Computes the confusion matrix given a predicted `ŷ` with categorical elements and the actual `y`

and the error led me to think I may have to convert the `UnivariateFinite{Multiclass{3}, Char, UInt32, Float64}` elements of `yhat` to an actual categorical array, but it's not immediately clear how to do that. The probabilities in `yhat` all seem to be 1.0 or 0.0, so it seems like this should be straightforward.
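Since every row of probabilities is one-hot, turning a prediction into a class label amounts to picking the class with the largest probability. A minimal Base-Julia sketch of that idea, using a hypothetical `most_probable` helper and a hand-written probability matrix (neither comes from MLJ; this only illustrates the concept, not MLJ's implementation):

```julia
# Hypothetical helper for illustration only; not part of MLJ.
# Given the class labels and an n×k probability matrix (one row per
# observation, one column per class), return the most probable label per row.
function most_probable(labels::AbstractVector, P::AbstractMatrix)
    size(P, 2) == length(labels) ||
        throw(ArgumentError("need one probability column per label"))
    # argmax over each row gives the column index of the largest probability
    # (the first one, in case of a tie)
    return [labels[argmax(row)] for row in eachrow(P)]
end

labels = ['a', 'b', 'c']
P = [0.0 1.0 0.0;
     1.0 0.0 0.0;
     0.0 0.0 1.0]
most_probable(labels, P)  # returns ['b', 'a', 'c']
```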
I naively tried `categorical(yhat)`, and that does run, but the levels of the resulting categorical array are the `UnivariateFinite` distributions themselves, in a different order from `levels(y[test])`, so I'm assuming that's not correct:
```
julia> levels(categorical(yhat))
3-element Vector{UnivariateFinite{Multiclass{3}, Char, UInt32, Float64}}:
 UnivariateFinite{Multiclass{3}}(a=>0.0, b=>1.0, c=>0.0)
 UnivariateFinite{Multiclass{3}}(a=>1.0, b=>0.0, c=>0.0)
 UnivariateFinite{Multiclass{3}}(a=>0.0, b=>0.0, c=>1.0)

julia> levels(y[test])
3-element Vector{Char}:
 'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)
 'b': ASCII/Unicode U+0062 (category Ll: Letter, lowercase)
 'c': ASCII/Unicode U+0063 (category Ll: Letter, lowercase)
```
So, is there a straightforward way to convert the array of `UnivariateFinite` to a categorical array?
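For what it's worth, a minimal sketch of one possible conversion, assuming MLJ's `mode` (broadcast over the predicted distributions) returns the most probable class of each `UnivariateFinite` as a `CategoricalValue` that shares the target's pool. To keep it self-contained, a small hand-written one-hot matrix `P` stands in for `yhat`; the names `P` and `yhat_class` are illustrative only:

```julia
using MLJ  # assumed to re-export categorical, classes, UnivariateFinite, mode, levels

# Ground truth as a categorical vector, standing in for y[test].
y = categorical(['a', 'b', 'c', 'a'])

# Made-up one-hot probabilities standing in for yhat: one row per observation,
# one column per class, in the order given by classes(y[1]).
P = [1.0 0.0 0.0;
     0.0 1.0 0.0;
     0.0 0.0 1.0;
     0.0 1.0 0.0]
yhat = UnivariateFinite(classes(y[1]), P)  # a vector of UnivariateFinite distributions

# Broadcasting `mode` picks the most probable class of each distribution,
# giving CategoricalValues rather than distributions.
yhat_class = mode.(yhat)

levels(yhat_class[1])                # same levels as y: 'a', 'b', 'c'
MLJ.confusion_matrix(yhat_class, y)  # matches the CategoricalValue method
```

In the MWE this would presumably become `MLJ.confusion_matrix(mode.(yhat), y[test])`. MLJ also provides `predict_mode(tree, X[test, :])`, which should return the point predictions directly instead of distributions.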

