This has definitely bitten me before: roc_curve isn't well documented and the error message is pretty unhelpful. It does work if you stay within the MLJ ecosystem. I believe it expects the predictions as a vector of MLJ's own UnivariateFinite distributions, i.e. one probability distribution over the classes per observation (although @ablaom will be able to confirm whether this is true).
Here’s a full MWE:
julia> using MLJ, MLJDecisionTreeInterface
julia> X = rand(100, 3); y = rand(Bool, 100);
julia> Tree = @load RandomForestClassifier pkg=DecisionTree
[ Info: For silent loading, specify `verbosity=0`.
import MLJDecisionTreeInterface ✔
RandomForestClassifier
julia> tree = Tree()
RandomForestClassifier(
  max_depth = -1,
  min_samples_leaf = 1,
  min_samples_split = 2,
  min_purity_increase = 0.0,
  n_subfeatures = -1,
  n_trees = 10,
  sampling_fraction = 0.7,
  feature_importance = :impurity,
  rng = Random._GLOBAL_RNG())
julia> mach = machine(tree, (x1 = X[:, 1], x2 = X[:, 2]), categorical(y))
untrained Machine; caches model-specific representations of data
  model: RandomForestClassifier(max_depth = -1, …)
  args:
    1: Source @514 ⏎ Table{AbstractVector{Continuous}}
    2: Source @659 ⏎ AbstractVector{Multiclass{2}}
julia> fit!(mach)
[ Info: Training machine(RandomForestClassifier(max_depth = -1, …), …).
trained Machine; caches model-specific representations of data
  model: RandomForestClassifier(max_depth = -1, …)
  args:
    1: Source @514 ⏎ Table{AbstractVector{Continuous}}
    2: Source @659 ⏎ AbstractVector{Multiclass{2}}
julia> ŷ = predict(mach);
julia> typeof(ŷ)
UnivariateFiniteVector{Multiclass{2}, Bool, UInt32, Float64}
julia> roc_curve(ŷ, y)
([0.0, 0.034482758620689655, 0.10344827586206896, 0.13793103448275862, 0.1724137931034483, 0.22413793103448276, 0.25862068965517243, 0.3620689655172414, 0.41379310344827586, 0.5, 1.0], [0.0, 0.47619047619047616, 0.5714285714285714, 0.6428571428571429, 0.6428571428571429, 0.7142857142857143, 0.7857142857142857, 0.8095238095238095, 0.8333333333333334, 0.8333333333333334, 1.0], [1.0, 0.9, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0])
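In case the output above is hard to read: as far as I know, the tuple is (false positive rates, true positive rates, thresholds). The sketch below shows roughly what those numbers mean, using plain Julia and made-up scores. It is not MLJ's implementation. The helper name roc_points, the toy data and the "score >= threshold counts as positive" convention are all my own assumptions for illustration, and MLJ may handle ties and end points differently.

```julia
# Illustrative only: a hand-rolled ROC computation, NOT MLJ's implementation.
# `roc_points` is a hypothetical helper; the data below is made up.

"""
Return `(fprs, tprs)` for the given thresholds. An observation is treated as
predicted positive when its score is >= the threshold (an assumed convention).
"""
function roc_points(scores::AbstractVector{<:Real},
                    truth::AbstractVector{Bool},
                    thresholds::AbstractVector{<:Real})
    n_pos = count(truth)             # number of actual positives
    n_neg = length(truth) - n_pos    # number of actual negatives
    fprs = Float64[]
    tprs = Float64[]
    for t in thresholds
        predicted = scores .>= t               # predicted positive at this threshold
        tp = count(predicted .& truth)         # true positives
        fp = count(predicted .& .!truth)       # false positives
        push!(tprs, tp / n_pos)
        push!(fprs, fp / n_neg)
    end
    return fprs, tprs
end

# Toy data: predicted probability of the positive class, and the true labels.
scores     = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
truth      = [true, true, true, false, false, false]
thresholds = [1.0, 0.7, 0.5, 0.3, 0.0]   # descending, like the output above

fprs, tprs = roc_points(scores, truth, thresholds)
```

As the threshold drops from 1.0 to 0.0, both rates climb from 0 to 1, which is the same pattern as in the first two vectors returned by roc_curve above.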