RandomizedSearchCV: specify scoring metrics other than the default

question

#1

I am using RandomizedSearchCV to fine-tune hyperparameters. I believe the default scoring is based on “accuracy”.

  1. However, I wish to specify a different scoring function such as “recall”. I have been able to do this successfully in Python, but any attempt to specify a scoring parameter in Julia results in an error. See below.

  2. Moreover, every time I run the model I get a different set of best parameters. I have specified a seed value so that the same parameters are returned, but I still get a different set of best parameters every time.

using Distributed,DecisionTree
using StatsBase,ScikitLearn
rmprocs(procs())
c=addprocs(3)
@everywhere using DecisionTree,ScikitLearn,StatsBase
using ScikitLearn.GridSearch: GridSearchCV, RandomizedSearchCV
using Printf,Statistics,Random  # Random provides Random.seed! and MersenneTwister used below

const n_iters = 10
sampler(a)=StatsBase.sample(a,n_iters)

param_dist = Dict("pruning_purity_threshold"=> sampler(0.5:0.001:1.0),
                  "max_depth"=> sampler(4:50),  
                  "min_samples_leaf"=> sampler(1:11),
                  "min_samples_split"=> sampler(2:22));#,
#                   "min_purity_increase" =>[0.0001]);

# build a classifier
Random.seed!(123);
clf = DecisionTree.DecisionTreeClassifier()

# run randomized search
nfold=5
random_search = RandomizedSearchCV(clf, param_dist, n_iter=n_iters, random_state=MersenneTwister(123),cv=nfold);
fit!(random_search, X, y)
---------------------------------------
random_search2 = RandomizedSearchCV(clf, param_dist, n_iter=n_iters, random_state=MersenneTwister(123),cv=nfold,
    scoring="accuracy_score");
fit!(random_search2, X, y)
UndefVarError: sorted not defined

Stacktrace:
 [1] get_scorer(::Symbol) at C:\Users\chatura\.julia\packages\ScikitLearn\HK6Vs\src\scorer.jl:64
 [2] get_scorer at C:\Users\chatura\.julia\packages\ScikitLearn\HK6Vs\src\scorer.jl:55 [inlined]
 [3] #check_scoring#82 at C:\Users\chatura\.julia\packages\ScikitLearn\HK6Vs\src\cross_validation.jl:435 [inlined]
 [4] check_scoring(::DecisionTreeClassifier, ::String) at C:\Users\chatura\.julia\packages\ScikitLearn\HK6Vs\src\cross_validation.jl:432
 [5] _fit!(::RandomizedSearchCV, ::Array{Float64,2}, ::Array{Int64,1}, ::Array{Any,1}) at C:\Users\chatura\.julia\packages\ScikitLearn\HK6Vs\src\grid_search.jl:258
 [6] fit!(::RandomizedSearchCV, ::Array{Float64,2}, ::Array{Int64,1}) at C:\Users\chatura\.julia\packages\ScikitLearn\HK6Vs\src\grid_search.jl:748
 [7] top-level scope at In[68]:3

#2

It would help to say which package is being used. This question is about ScikitLearn, but that is not stated anywhere. Also, the code cannot be run without loading this package first.


#3

Apologies. I have updated my previous post with all the packages I am using.


#4

I have been able to solve both issues with a little experimentation and googling.

  1. For specifying a different scoring function, I used the following code and then specified the scoring parameter in RandomizedSearchCV.
using ScikitLearn: @sk_import
@sk_import metrics: recall_score
scorer = ScikitLearn.Skcore.make_scorer(recall_score)

random_search2 = RandomizedSearchCV(clf, param_dist, n_iter=n_iters, random_state=MersenneTwister(123),cv=nfold,
    scoring=scorer);
  2. For reproducibility of results, I specified the random seed in the custom sampler.
sampler(a)=StatsBase.sample(MersenneTwister(123),a,n_iters)

With these two modifications, both issues I raised have been resolved.
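For anyone hitting the same reproducibility problem: in the original code, sampler is called while param_dist is being built, which happens before Random.seed!(123) runs, so the candidate values were likely drawn from an unseeded global RNG. The sketch below only illustrates that difference using StatsBase and Random; the names unseeded_sampler and seeded_sampler are made up for this example, and it does not touch ScikitLearn at all.

```julia
using Random, StatsBase

const n_iters = 10

# Draws from the global RNG: the candidates depend on whatever state
# the global RNG is in when param_dist is built.
unseeded_sampler(a) = StatsBase.sample(a, n_iters)

# Creates a fresh MersenneTwister(123) on every call, so the same input
# range always yields the same candidates.
seeded_sampler(a) = StatsBase.sample(MersenneTwister(123), a, n_iters)

first_draw = seeded_sampler(4:50)
second_draw = seeded_sampler(4:50)

println(first_draw == second_draw)  # true: both calls start from the same seed
```

One thing to keep in mind with this approach is that every parameter is sampled from an RNG in the same starting state. If that matters, a single MersenneTwister(123) created once and passed to every sample call should also be reproducible, as long as it is re-created each time param_dist is rebuilt.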