Automatic Creation of a Grid of Tuning Parameters

Currently, in R’s caret package if we do something like this:

library(caret)

fitControl <- trainControl(method = "repeatedcv", number = 10, repeats = 5)
model_rf <- train(responder ~ ., data = trainData, method = "rf", trControl = fitControl)

a grid of tuning parameters is automatically created.
Can something like this be done using MLJ.jl?

Sure, MLJ provides a range of tuning strategies, including Grid. For repeated (aka Monte Carlo) cross-validation, give TunedModel the options resampling=CV(nfolds=10, rng=123) and repeats=5. For other options, query the TunedModel docstring.

Note that in MLJ tuning is implemented as a model wrapper, as in MLR/MLR3. The wrapped model can be viewed as a “self-tuning” version of the original model. Under the hood, the provided resampling strategy (e.g., CV) is applied to determine the optimal hyperparameter(s), and the atomic model is then retrained on all the data using those values.
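For concreteness, here is a minimal sketch of this wrapper pattern, using the built-in iris data and the DecisionTree.jl classifier (both chosen here purely for illustration, not taken from the question):

using MLJ

X, y = @load_iris  # toy data, for illustration only
Tree = @load DecisionTreeClassifier pkg=DecisionTree
tree = Tree()

# grid over a single hyperparameter, with repeated 10-fold cross-validation:
r = range(tree, :max_depth, lower=1, upper=5)
self_tuning_tree = TunedModel(model=tree,
                              tuning=Grid(),
                              resampling=CV(nfolds=10, rng=123),
                              repeats=5,
                              range=r,
                              measure=log_loss)

mach = machine(self_tuning_tree, X, y)
fit!(mach)  # tunes max_depth, then retrains the best model on all the data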

Tuning docs
A tuning tutorial
Another tuning tutorial
One of several end-to-end examples with tuning

Thank you for your response, @ablaom.

I should have stated my problem more clearly. What I am looking for is the following:

In R, as mentioned in the original question, a grid of parameter values is created automatically by running those two lines of code. To be more specific, for random forest classification, caret automatically creates a grid of values for mtry (the equivalent of n_subfeatures in MLJ).

In MLJ, if I want to tune a RandomForest, I will have to do something like:

using MLJ

RandomForestClassifier = @load RandomForestClassifier pkg=DecisionTree
rf_model = RandomForestClassifier()

# the range of values to try must be supplied by hand:
range_rf = range(rf_model, :n_subfeatures, values=[2, 6, 10])
self_tuning_rf = TunedModel(model=rf_model,
                            resampling=CV(nfolds=10),
                            repeats=5,
                            tuning=Grid(),
                            range=range_rf,
                            measure=[accuracy, kappa])

rf = machine(self_tuning_rf, X, y)
MLJ.fit!(rf, rows=train)
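For reference, after a successful fit the outcome can be inspected along these lines (my reading of the TunedModel report; field names may differ across versions):

report(rf).best_model           # the atomic model with the winning n_subfeatures
report(rf).best_history_entry   # its estimated accuracy and kappa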

The problem here is that the TunedModel constructor expects a range. If I run it without one, I get the following error:

julia> self_tuning_rf = TunedModel(model=rf_model,
                                   resampling=CV(nfolds=10),
                                   repeats=5,
                                   tuning=Grid(),
                                   measure=accuracy)
ERROR: LoadError: ArgumentError: You need to specify `range=...`, unless `tuning=Explicit` and `models=...` is specified instead.
Stacktrace:
...

All in all, what I want is some sort of implementation where I can call TunedModel without passing anything to the range argument, and it automatically chooses one or more parameters to tune depending on the model (the way caret chooses mtry for a random forest and cp for a decision tree) and creates a grid based on the type of problem (e.g., probabilistic classification) and on the dataset passed (number of features, number of rows, data schema, etc.), just as caret does. I hope that is clear.

Ah, thanks for clarifying. Yes, my understanding is that caret stores default ranges for each hyperparameter in its model metadata. MLJ does not yet provide this cool feature. In the meantime, a rough workaround is to derive a default grid from the data yourself, as sketched below.
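For example, the following helper (an ad hoc suggestion, not part of MLJ) spaces a few integer values between 2 and the number of features, roughly mimicking caret's default mtry grid:

using MLJ

# hypothetical helper, not an MLJ feature: a caret-style default grid
# for n_subfeatures; assumes the data has at least two features
function default_n_subfeatures_grid(X; len=3)
    p = length(schema(X).names)  # number of features
    return unique(round.(Int, range(2, p, length=len)))
end

range_rf = range(rf_model, :n_subfeatures, values=default_n_subfeatures_grid(X))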

We had thought about this, but my current inclination would be to provide default prior probability distributions instead, as I think RandomSearch is a better all-purpose strategy than Grid. There is an OpenML project that has been determining good default priors for popular models by “learning” these priors from a battery of OpenML datasets. Do you have thoughts on this suggestion?
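For instance, tuning the random forest above with RandomSearch might look like this (a sketch only; by default a bounded numeric range is sampled uniformly, and custom priors can be attached as described in the RandomSearch docstring):

r = range(rf_model, :n_subfeatures, lower=1, upper=20)
self_tuning_rf = TunedModel(model=rf_model,
                            tuning=RandomSearch(rng=123),
                            resampling=CV(nfolds=10),
                            repeats=5,
                            range=r,
                            n=25,        # number of models to sample
                            measure=accuracy)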

I agree. Also, I think we should incorporate some information about the training dataset, such as the number of features and the type of prediction problem, into the priors.