I am trying to build an RF predictor of a categorical variable (3 levels), and then look at how well the model works. I mostly followed the steps in the Getting Started section of the docs. Here's an MWE:
using DataFrames
using MLJ
using DecisionTree
using Random
df = DataFrame(a = categorical(rand(['a', 'b', 'c'], 100)), t = repeat(["test", "train"], inner=50), x1=rand(100), x2=rand(100))
train = findall(x-> x == "train", df.t)
test = findall(x-> x == "test", df.t)
y, X = unpack(df, ==(:a), x-> x in [:x1, :x2])
tree_model = MLJ.@load DecisionTreeClassifier verbosity=1
tree = machine(tree_model, X, y)
MLJ.fit!(tree, rows=train)
yhat = MLJ.predict(tree, X[test,:])
cross_entropy(yhat, y[test]) |> mean # this works
MLJ.confusion_matrix(yhat, y[test]) # this doesn't
That last line gives me:
ERROR: MethodError: no method matching confusion_matrix(::MLJBase.UnivariateFiniteVector{Multiclass{3}, Char, UInt32, Float64}, ::CategoricalVector{Char, UInt32, Char, CategoricalValue{Char, UInt32}, Union{}})
Closest candidates are:
confusion_matrix(::AbstractVector{var"#s887"} where var"#s887"<:CategoricalValue, ::AbstractVector{var"#s886"} where var"#s886"<:CategoricalValue; rev, perm, warn) at /home/kevin/.julia/packages/MLJBase/uKzAz/src/measures/confusion_matrix.jl:64
I assumed that this would work given the function signature, though the docstring text "Computes the confusion matrix given a predicted ŷ with categorical elements and the actual y" and the error lead me to think I may have to convert the UnivariateFinite{Multiclass{3}, Char, UInt32, Float64} elements of yhat to an actual categorical array, but it's not immediately clear how to do that. The probabilities in yhat all seem to be 1.0 or 0.0, so it seems like this should be straightforward.
I naively tried categorical(yhat), and that actually runs, but the resulting categorical array of UnivariateFinite has a different level order, so I'm assuming that's not actually correct:
julia> levels(categorical(yhat))
3-element Vector{UnivariateFinite{Multiclass{3}, Char, UInt32, Float64}}:
UnivariateFinite{Multiclass{3}}(a=>0.0, b=>1.0, c=>0.0)
UnivariateFinite{Multiclass{3}}(a=>1.0, b=>0.0, c=>0.0)
UnivariateFinite{Multiclass{3}}(a=>0.0, b=>0.0, c=>1.0)
julia> levels(y[test])
3-element Vector{Char}:
'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)
'b': ASCII/Unicode U+0062 (category Ll: Letter, lowercase)
'c': ASCII/Unicode U+0063 (category Ll: Letter, lowercase)
So, is there a straightforward way to convert the array of UnivariateFinite to a categorical array?
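For reference, here is a minimal sketch of the kind of conversion I'm after, assuming that broadcasting `mode` over the predictions returns the most probable class for each one as a categorical value. The `probs` matrix and the four-element `y` are made-up stand-ins for the MWE's predictions and labels, not real output, and I'm assuming the `pool=y` keyword ties the distributions to the levels of `y`.

```julia
using MLJ

# Hypothetical ground truth, standing in for y[test] in the MWE.
y = categorical(['a', 'b', 'c', 'c'])

# Hypothetical class probabilities, one row per observation and one
# column per class ('a', 'b', 'c'), standing in for the model output.
probs = [1.0 0.0 0.0;
         0.0 1.0 0.0;
         0.0 0.0 1.0;
         0.0 1.0 0.0]

# A vector of UnivariateFinite distributions sharing the pool of `y`
# (assumption: `pool=y` reuses the categorical pool of `y`).
yhat = UnivariateFinite(['a', 'b', 'c'], probs, pool=y)

# Collapse each distribution to its most probable class.
yhat_mode = mode.(yhat)

# Both arguments now have categorical elements.
cm = confusion_matrix(yhat_mode, y)
```

In the MWE itself, the equivalent would presumably be `MLJ.confusion_matrix(mode.(yhat), y[test])`, or calling `MLJ.predict_mode(tree, X[test, :])` instead of `MLJ.predict` to get point predictions directly.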