`MLJ.Stack` performance issues

I’m trying to use a stacked model to predict an outcome, but I’m running into an unusual performance issue. I’m not sure whether there’s a mistake somewhere in my own code or this is just to be expected, but fitting gets stuck on the `Stack` itself, which seems weird to me. I would expect most of the computation to be spent fitting the tree and spline regressors; instead, it seems to get stuck fitting the relatively simple linear regressor that stacks the models together:

    # assumed imports, not shown in the original post (the model types
    # are loaded with mlj.@load):
    import MLJ as mlj
    import MLJTuning as mljt
    using DataFrames: Not
    using NearestNeighborModels: ReciprocalRank

    tree = EvoTreeRegressor(nrounds=128, nbins=128, eta=0.02)
    spline = EvoSplineRegressor(nrounds=32, eta=0.08, L2=0.025)
    knnr = KNNRegressor(K=6, weights=ReciprocalRank())

    # keyword arguments for the continuous, log-scaled range:
    cont_kwargs = (
        lower=exp(0), upper=exp(0.5), scale=:log
    )

    ranges = [
        mlj.range(tree, :max_depth; values=3:9),
        mlj.range(tree, :lambda; cont_kwargs...),
    ]

    # Tune tree hyperparameters
    tree = mljt.TunedModel(
        model=tree,
        range=ranges,
        tuning=mljt.Grid(),
        resampling=mlj.CV(nfolds=12)
    )

    @views X_obs = data[:, Not([:Name, :LogWeight])]

    # Compose preprocessing with the stack, then fit everything in a machine
    clean = mlj.ContinuousEncoder(drop_last=true) |>
        mlj.Standardizer()
    # clean = mlj.machine(clean, X_obs) |> mlj.fit!
    # stck = mlj.transform(clean, X_obs)
    stck = clean |> mlj.Stack(; metalearner=LinearRegressor(), tree, knnr, spline)
    mach = mlj.machine(stck, X_obs, data.LogWeight) |> mlj.fit!

Just to be clear, this happens even when I use a very small subset (roughly 100 rows by 10 columns): the fit hangs and then crashes. So I don’t think the issue is the dataset being too big.

@ablaom, a quick question just to be sure: does MLJ handle this by tuning the model once and then stacking, or does it try to tune the hyperparameters at the same time as it’s stacking the model? If it’s re-tuning the parameters every time it stacks the models, that might be the problem.

Fitting a `TunedModel` instance `m` implies under-the-hood resampling every time `m` is trained. Since `Stack` also resamples, this indeed implies nested resampling, which can be quite slow. I don’t know whether this alone explains your “hanging”; I haven’t investigated that.

One strategy in stacking that I’ve seen is this: rather than tuning a base model, include several copies of it as base models, each with different hyperparameters. The hyperparameters can vary in big steps, the idea being that the adjudicator (the metalearner) “interpolates” between them. Alternatively, you could tune the base model separately and extract the `best_model` for use in the `Stack`; see the sketch below. These strategies should reduce computation time, but they may not generalize as well. It’s very hard to say with stacking, where the improvements over ordinary ensembling are usually slim in practice.
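Here’s a minimal sketch of both strategies, reusing the names from your first snippet (so the definitions of `clean`, `ranges`, `knnr`, `spline` and the data are assumed, and `tree` means the bare `EvoTreeRegressor`, before it was wrapped in `TunedModel`). Note that `fitted_params(mach).best_model` is how a `TunedModel` machine exposes the winning hyperparameters:

    # Strategy 1: several fixed trees whose hyperparameters vary in big
    # steps, letting the metalearner "interpolate" between them:
    stck = clean |> mlj.Stack(;
        metalearner=LinearRegressor(),
        tree3=EvoTreeRegressor(nrounds=128, eta=0.02, max_depth=3),
        tree6=EvoTreeRegressor(nrounds=128, eta=0.02, max_depth=6),
        tree9=EvoTreeRegressor(nrounds=128, eta=0.02, max_depth=9),
        knnr, spline,
    )

    # Strategy 2: tune the tree once, on its own, using preprocessed data ...
    tuned_tree = mljt.TunedModel(
        model=tree, range=ranges, tuning=mljt.Grid(), resampling=mlj.CV(nfolds=3)
    )
    mach_clean = mlj.machine(clean, X_obs) |> mlj.fit!
    W = mlj.transform(mach_clean, X_obs)
    tuned_mach = mlj.machine(tuned_tree, W, data.LogWeight) |> mlj.fit!

    # ... then stack only fixed-hyperparameter base models, so the stack's
    # internal resampling is no longer nested inside a tuning loop:
    best_tree = mlj.fitted_params(tuned_mach).best_model
    stck = clean |> mlj.Stack(; metalearner=LinearRegressor(), tree=best_tree, knnr, spline)

Strategy 2 still resamples to tune, but only once up front, rather than once per internal `Stack` fold.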


I’m actually not sure it does; from what I can tell, the regression just hangs and then crashes unpredictably. I think it’s reasonable for this problem to take a long time or to error (I wouldn’t be surprised if it’s ill-conditioned), but it’s bizarre that it crashes, and that it only does so sometimes. It does this even when I’m avoiding nested evaluations. I’m trying to work out an MWE but haven’t been able to yet.

I’ve had a play around with this but am unable to reproduce the problem with a small mixed-type dataset; the full script is below. I’ve reduced the number of folds in the tuning to 3. (Using the multithreading option in the `Stack` speeds things up a lot.) Even with this small dataset, training in single-CPU mode still takes about 3 minutes.

Most of the effort goes into tuning the `EvoTreeRegressor`: each tune requires training the model 21 times (the new, lower number of grid points) on each of 3 cross-validation folds, and we multiply that by 3, the default number of folds `Stack` uses to construct internal out-of-sample predictions for each base model. That’s a total of 21 × 3 × 3 = 189 `EvoTreeRegressor` trains.
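As a quick tally (the counts are just the ones described above):

    n_grid_points  = 21  # grid points visited by the tuning strategy
    n_tuning_folds = 3   # CV folds inside the TunedModel
    n_stack_folds  = 3   # Stack's default internal resampling folds
    n_grid_points * n_tuning_folds * n_stack_folds  # == 189 tree trains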

@ParadaCarleton, perhaps you could say more about your dataset. What is the output of `MLJ.schema(X_obs)` and `MLJ.nrows(X_obs)`?

# Status `~/GoogleDrive/Julia/MLJ/MLJ/sandbox/spline/Project.toml`
#   [a93c6f00] DataFrames v1.6.1
#   [ab853011] EvoLinear v0.4.3
#   [f6006082] EvoTrees v0.16.4
#   [add582a8] MLJ v0.20.1
#   [6ee0df7b] MLJLinearModels v0.10.0
#   [636a865e] NearestNeighborModels v0.2.3

using MLJ
import NearestNeighborModels as NNM
import DataFrames as DF

# grab some smallish dataset of mixed feature scitype:
X_obs, y  = unpack(
    load_reduced_ames(),
    in([:OverallQual, :GrLivArea, :Neighborhood]),
    ==(:target),
);
X_obs = DF.DataFrame(X_obs);
# to reduce number of classes in :Neighborhood:
function simplify(nhood)::String
    nhood in ["NoRidge", "NridgHt"] && return nhood
    "Other"
end
X_obs.Neighborhood = simplify.(X_obs.Neighborhood) |> categorical
mach = machine(Standardizer(), y) |> fit!
y = transform(mach, y);

# load all models
EvoTreeRegressor = @load EvoTreeRegressor
EvoSplineRegressor = @load EvoSplineRegressor
KNNRegressor = @load KNNRegressor
LinearRegressor = @load LinearRegressor pkg=MLJLinearModels

# instantiate base models:
tree = EvoTreeRegressor(nrounds=128, nbins=128, eta=0.02)
spline = EvoSplineRegressor(nrounds=32, eta=0.08, L2=0.025)
knnr = KNNRegressor(K=6, weights=NNM.ReciprocalRank())

cont_kwargs = (
    lower=exp(0), upper=exp(0.5), scale=:log
)

ranges = [
    range(tree, :max_depth; values=3:9),
    range(tree, :lambda; cont_kwargs...),
]

# preprocessing:
clean = ContinuousEncoder(drop_last=true) |>
    Standardizer()

# Tune tree hyperparameters
tree = TunedModel(
    model=tree,
    range=ranges,
    tuning=Grid(goal=20),
    resampling=CV(nfolds=3),
)

# smoke tests:
machine(clean |> tree, X_obs, y) |> fit!
machine(clean |> knnr, X_obs, y) |> fit!
machine(clean |> spline, X_obs, y) |> fit!

# compose preprocessing with the stack and wrap in a machine
# clean = machine(clean, X_obs) |> fit!
# stck = transform(clean, X_obs)
stck = clean |> Stack(; metalearner=LinearRegressor(), tree, knnr, spline)
mach = machine(stck, X_obs, y)

@elapsed fit!(mach, verbosity=3)
# 181.864518363

stck.deterministic_stack.acceleration = CPUThreads()
@elapsed fit!(mach)
# 37.808096486 with 12 threads on my macbook pro
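
# the same threading option can also be set at construction time,
# instead of mutating the stack's field as above:
stck = clean |> Stack(;
    metalearner=LinearRegressor(),
    acceleration=CPUThreads(),
    tree, knnr, spline,
)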
