How do I tune a pipeline in MLJ?

A Slack user has asked the question in the title. More generally, how does one tune hyperparameters that are nested inside a composed model?

One obtains composed models, for example, when building a pipeline or when applying Stack, EnsembleModel, IteratedModel, BinaryThresholdPredictor, BalancedModel, TunedModel and other model “wrappers”.
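A dotted name such as `:(xg_boost_regressor.max_depth)` can reach a hyperparameter buried inside a wrapper. The minimal pure-Julia sketch below shows one way that can work. It assumes only that a composed model is a struct whose component models are fields reachable through `getproperty`/`setproperty!`, which is what the `propertynames(pipe)` output in the pipeline example suggests. The types `ToyBooster`, `ToyPipeline`, `ToyWrapper` and the helpers `nested_get`/`nested_set!` are hypothetical illustrations. They are not MLJ types or MLJ API.

```julia
# Hypothetical stand-ins for model structs (plain Julia, not MLJ models).
mutable struct ToyBooster
    max_depth::Int
end

mutable struct ToyPipeline
    continuous_encoder::Symbol
    xg_boost_regressor::ToyBooster
end

mutable struct ToyWrapper
    model::ToyPipeline
end

# A bare field name is a single getproperty call.
nested_get(obj, name::Symbol) = getproperty(obj, name)

# Julia parses :(a.b.c) as Expr(:., Expr(:., :a, QuoteNode(:b)), QuoteNode(:c)),
# so resolve the left-hand side recursively, then read the final field.
function nested_get(obj, ex::Expr)
    ex.head === :. || throw(ArgumentError("expected a dotted name, got $ex"))
    parent = nested_get(obj, ex.args[1])
    return getproperty(parent, ex.args[2].value)
end

# Setting works the same way: find the parent struct, then mutate one field.
nested_set!(obj, name::Symbol, value) = setproperty!(obj, name, value)
function nested_set!(obj, ex::Expr, value)
    ex.head === :. || throw(ArgumentError("expected a dotted name, got $ex"))
    parent = nested_get(obj, ex.args[1])
    return setproperty!(parent, ex.args[2].value, value)
end

pipe = ToyPipeline(:default, ToyBooster(6))
wrapped = ToyWrapper(pipe)   # one more level of nesting

nested_set!(pipe, :(xg_boost_regressor.max_depth), 4)
println(nested_get(pipe, :(xg_boost_regressor.max_depth)))           # prints 4
println(nested_get(wrapped, :(model.xg_boost_regressor.max_depth)))  # prints 4
```

The path just grows by one segment for each level of wrapping. That is why the same pattern extends to a pipeline nested inside another wrapper.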

1 Like

Here’s an example addressing the question:

using Pkg
Pkg.activate(temp=true)   # throwaway environment for this example
Pkg.add(["MLJ", "MLJXGBoostInterface"])
using MLJ

X, y = @load_reduced_ames;

# notice `X` has mixed feature types:
schema(X)

# split the rows (observations): 80% for training, 20% for testing:
(Xtrain, Xtest), (ytrain, ytest) = partition((X, y), 0.8, multi=true)

XGBoostRegressor = @load XGBoostRegressor
pipe = ContinuousEncoder() |> XGBoostRegressor()

propertynames(pipe)
# (:continuous_encoder, :xg_boost_regressor, :cache)

# range for a nested hyperparameter:
r = range(pipe, :(xg_boost_regressor.max_depth), lower=3, upper=10)

# self-tuning pipeline:
tmodel = TunedModel(
    pipe,
    resampling=CV(nfolds=5),
    tuning=Grid(resolution=10),
    measure=l2,
    range=r,
)

# training:
mach = machine(tmodel, Xtrain, ytrain) |> fit!

# inspect optimal parameter:
best_pipe = report(mach).best_model
best_pipe.xg_boost_regressor.max_depth
# 5

# predict:
predict(mach, Xtest)
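To estimate how much work the tuning step does, the sketch below counts the candidate `max_depth` values and the resulting resampling fits for `Grid(resolution=10)` over the integer range 3 to 10 with `CV(nfolds=5)`. It assumes that an integer range is discretized by spacing `resolution` points linearly between the bounds, rounding to `Int` and dropping duplicates. That is an approximation of MLJ's behaviour, not its actual code. `integer_grid` is a hypothetical helper.

```julia
# Assumption: integer ranges are discretized by rounding a linear grid and
# dropping duplicates. This is a sketch of the idea, not MLJ's implementation.
function integer_grid(lower::Int, upper::Int, resolution::Int)
    return unique(round.(Int, range(lower, upper; length=resolution)))
end

points = integer_grid(3, 10, 10)   # candidate values for max_depth
nfolds = 5                         # matches CV(nfolds=5) in the example

# Each candidate pipeline is trained and scored once per fold.
resampling_fits = length(points) * nfolds

println(points)            # prints [3, 4, 5, 6, 7, 8, 9, 10]
println(resampling_fits)   # prints 40
```

Under this assumption, asking for 10 grid points over a range that contains only 8 integers yields 8 distinct candidates. As far as I understand, `TunedModel`'s default settings then retrain the best pipeline once on all of `Xtrain`, which adds one fit on top of the resampling fits.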