Example of using the DecisionTree.permutation_importance function with MLJ

Can someone give me a small example of how to use the function DecisionTree.permutation_importance()
using an MLJ Tree machine (Tree = @load DecisionTreeClassifier pkg=DecisionTree)?
A similar Python function is sklearn.inspection.permutation_importance (see the scikit-learn 1.2.0 documentation).

cc @samuel_okon

@lgmendes unfortunately, there isn’t a standard way to do this with MLJ machines yet.
This wasn’t directly exposed from the DecisionTree package, because it’s a method that could be applied to any other predictive model, not just DecisionTree classifiers or DecisionTree regressors. Hence, there are plans to add this functionality for all supervised machines.
For now, you can use the following code to evaluate permutation importance. You may wish to replace f1score with any other appropriate score from MLJ's measures.

julia> using MLJ, DecisionTree

julia> X, y = @load_crabs
((FL = [8.1, 8.8, 9.2, 9.6, 9.8, 10.8, 11.1, 11.6, 11.8, 11.8  …  20.3, 20.5, 20.6, 20.9, 21.3, 21.4, 21.7, 21.9, 22.5, 23.1], RW = [6.7, 7.7, 7.8, 7.9, 8.0, 9.0, 9.9, 9.1, 9.6, 10.5  …  16.0, 17.5, 17.5, 16.5, 18.4, 18.0, 17.1, 17.2, 17.2, 20.2], CL = [16.1, 18.1, 19.0, 20.1, 20.3, 23.0, 23.8, 24.5, 24.2, 25.2  …  39.4, 40.0, 41.5, 39.9, 43.8, 41.2, 41.7, 42.6, 43.0, 46.2], CW = [19.0, 20.8, 22.4, 23.1, 23.0, 26.5, 27.1, 28.4, 27.8, 29.3  …  44.1, 45.5, 46.2, 44.7, 48.4, 46.2, 47.2, 47.4, 48.7, 52.5], BD = [7.0, 7.4, 7.7, 8.2, 8.2, 9.8, 9.8, 10.4, 9.7, 10.3  …  18.0, 19.2, 19.2, 17.5, 20.0, 18.7, 19.6, 19.5, 19.8, 21.1]), CategoricalArrays.CategoricalValue{String, UInt32}["B", "B", "B", "B", "B", "B", "B", "B", "B", "B"  …  "O", "O", "O", "O", "O", "O", "O", "O", "O", "O"])

julia> Tree = @load DecisionTreeClassifier pkg=DecisionTree
[ Info: For silent loading, specify `verbosity=0`.
import MLJDecisionTreeInterface ✔
MLJDecisionTreeInterface.DecisionTreeClassifier

julia> mach = machine(Tree(), X, y) |> MLJ.fit!
[ Info: Training machine(DecisionTreeClassifier(max_depth = -1, …), …).
trained Machine; caches model-specific representations of data
  model: DecisionTreeClassifier(max_depth = -1, …)
  args:
    1:  Source @900 ⏎ Table{AbstractVector{Continuous}}
    2:  Source @581 ⏎ AbstractVector{Multiclass{2}}


julia> fitted_tree = fitted_params(mach).tree
Decision Tree
Leaves: 17
Depth:  9

julia> class_list = MLJ.int(classes(y))
2-element Vector{UInt32}:
 0x00000001
 0x00000002

julia> function wrapped_f1score(tree, ylabels_val, X_val_matrix)
       y_pred = categorical(DecisionTree.apply_tree(tree, X_val_matrix), levels=class_list, ordered=true)
       return MLJ.f1score(y_pred, categorical(ylabels_val, levels=class_list, ordered=true))
       end
wrapped_f1score (generic function with 1 method)

julia> DecisionTree.permutation_importance(fitted_tree, MLJ.int(y), MLJ.matrix(X), wrapped_f1score)
(mean = [0.4663372650034014, 0.0, 0.02936705464649414, 0.32679053711036654, 0.2246004559048037],
 std = [0.024054791382060164, 0.0, 0.008290294746685429, 0.017755155428814066, 0.018309476843891812],
 scores = [0.4607843137254902 0.4926829268292683 0.4455445544554455; 0.0 0.0 0.0; … ; 0.31999999999999995 0.31343283582089554 0.34693877551020413; 0.20346320346320346 0.23555555555555552 0.23478260869565215],)

For more info see ?DecisionTree.permutation_importance
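
The same wrapper pattern works with any MLJ measure callable as `measure(ŷ, y)`. As a minimal sketch (assuming the `n_iter` keyword of `DecisionTree.permutation_importance`, which controls the number of shuffles per feature), here is the example above with `accuracy` swapped in for `f1score`:

```julia
using MLJ, DecisionTree

X, y = @load_crabs
Tree = @load DecisionTreeClassifier pkg=DecisionTree verbosity=0
mach = machine(Tree(), X, y) |> MLJ.fit!

fitted_tree = fitted_params(mach).tree
class_list = MLJ.int(classes(y))

# DecisionTree.permutation_importance expects a score of the form
# score(model, y_labels, X_matrix); wrap the MLJ measure accordingly.
wrapped_accuracy(tree, ylabels, Xmat) =
    MLJ.accuracy(categorical(DecisionTree.apply_tree(tree, Xmat), levels=class_list),
                 categorical(ylabels, levels=class_list))

result = DecisionTree.permutation_importance(fitted_tree, MLJ.int(y), MLJ.matrix(X),
                                             wrapped_accuracy; n_iter=5)
result.mean  # one mean importance value per feature column of X
```

Note that importances are score drops relative to the unpermuted baseline, so an irrelevant feature can come out near zero (or even slightly negative).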
