Automate training MLJ models

I’d like to train 3 models in MLJ: ARDRegressor, AdaBoostRegressor, BaggingRegressor

Currently, I train them one at a time. For example:

using Pkg; Pkg.activate("."); Pkg.instantiate();
using RDatasets, MLJ, Statistics, PrettyPrinting, GLM

X, y = @load_boston; train, test = 1:406, 407:506

reg = @load ARDRegressor;
m = machine(reg, X, y);
fit!(m, rows=train);
ŷ = predict(m, rows=test)
os_ARDRegressor = rms(ŷ, y[test])

I’d like to train them with a loop, with the model names stored in Models, such as:

Models=[ARDRegressor, AdaBoostRegressor, BaggingRegressor]
for jj in eachindex(Models)
   reg = @load jj;
   m = machine(reg, X, y);
   fit!(m, rows=train);
   ŷ = predict(m, rows=test)
   os_jj = rms(ŷ, y[test])
end

Not so sure I understand the problem, granted I don’t know MLJ well.

In my package I use loops like this all the time. I just make a list of the model functions, and call them.

err = []
for model in [model1, model2, model3]
    preds = model(data)
    push!(err, RMSE(preds, groundtruth))
end

Generally I like to use a dictionary with the model name, so I can track stats without having to reference back to an object, or save things off to TeX.
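
For instance, a minimal sketch of that pattern, with toy stand-in "models" and a hand-rolled rmse, purely for illustration:

rmse(ŷ, y) = sqrt(sum(abs2, ŷ .- y) / length(y))

data        = randn(100)
groundtruth = randn(100)

# toy "models": each is just a function of the data
toy_models = Dict(
    "mean_model" => x -> fill(sum(x) / length(x), length(x)),
    "zero_model" => x -> zero(x),
)

# scores keyed by model name, easy to report or dump to TeX later
scores = Dict(name => rmse(model(data), groundtruth) for (name, model) in toy_models)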

Maybe this pattern will help?
https://github.com/caseykneale/ChemometricsTools.jl/blob/master/shootouts/ClassificationShootout.jl
Unless I am missing the point and there’s some sort of issue with MLJ?


Thank you @anon92994695. My goal is to:
1. store a large set of model names in modlist
2. use a single loop over modlist to train all models and record the scores.

Currently, I have to do:

modlist = [
    @load ARDRegressor pkg=ScikitLearn;
    @load AdaBoostRegressor pkg=ScikitLearn;
    @load BaggingRegressor pkg=ScikitLearn]

score = []
for (i, mod) in enumerate(modlist)
    reg = mod;
    m = machine(reg, X, y);
    fit!(m, rows=train);
    ŷ = predict(m, rows=test)
    push!(score, (i, mod, rms(ŷ, y[test])))
end

I would like:

modlist = [
    ARDRegressor;
    AdaBoostRegressor;
    BaggingRegressor]

score = []
for (i, mod) in enumerate(modlist)
    @load mod;
    reg = mod;
    m = machine(reg(), X, y);
    fit!(m, rows=train);
    ŷ = predict(m, rows=test)
    push!(score, (i, mod, rms(ŷ, y[test])))
end

I apologise for seeing this so late; here’s one way you could do this:

using MLJ

X, y = @load_boston
train, test = 1:406, 407:506

list = [
    "ScikitLearn"     => ["ARDRegressor", "AdaBoostRegressor", "BaggingRegressor"],
    "MLJLinearModels" => ["QuantileRegressor"]
    ]

# Load everything
function load_everything(list)
    all_models = String[]
    for (pkg, models) in list
        for m in models
            load(m, pkg=pkg)
            push!(all_models, m)
        end
    end
    return all_models
end

function score_model(m::String, X, y, train, test)
    mdl  = eval(Meta.parse("$(m)()"))
    mach = machine(mdl, X, y)
    fit!(mach, rows=train)
    ŷ = predict(mach, rows=test)
    return rms(ŷ, y[test])
end

all_models = load_everything(list)

all_scores = [score_model(m, X, y, train, test) for m in all_models]

giving something like

4-element Array{Float64,1}:
 5.713015246458931
 4.587126755456758
 4.294341625907283
 4.750778077429626

PS: we want to implement clean ways of comparing an arbitrary number of models for however many metrics you may care about; this is not yet available but hopefully will be soon.

Note also that the awkward bit here is passing the names of not-yet-loaded models. I understand what you’re trying to do, but I don’t think it’s something we want to support fully, as it encourages people to just use default hyperparameters for everything, which might not be appropriate at all.

What makes more sense to support is to have a bunch of pre-defined models and compare them:

models = [
    ARDRegressor(n_iter=10),
    QuantileRegressor(lambda=0.5)
]

function score_model2(m, X, y, train, test)
    mach = machine(m, X, y)
    fit!(mach, rows=train)
    ŷ = predict(mach, rows=test)
    return rms(ŷ, y[test])
end

all_scores = [score_model2(m, X, y, train, test) for m in models]

Then of course you might want to make sure that each of them has its hyperparameters tuned optimally, using CV or otherwise; that’s the kind of benchmarking we’d like to support in the future.
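
For one of the models above, hand-tuning with CV already looks roughly like this (a sketch only: the range bounds are made up and the exact keyword names may differ slightly between MLJ versions):

qr = QuantileRegressor(lambda=0.5)
r  = range(qr, :lambda, lower=1e-3, upper=10.0, scale=:log)  # illustrative bounds
tm = TunedModel(model=qr, tuning=Grid(resolution=10),
                resampling=CV(nfolds=5), range=r, measure=rms)
mach = machine(tm, X, y)
fit!(mach, rows=train)
rms(predict(mach, rows=test), y[test])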


Thank you!
I’m new to Julia & MLJ.
My goal is to create a program where, for round 1, I can train a large number of different models with:

1. Tuning: default hyperparameters (or automatic tuning when MLJ has it ready)
2. Feature engineering: raw features
3. Target engineering: raw targets
4. Ensembling: none, unless the raw model is an ensemble (e.g. RF)

Then, compare the scores from all the raw models in round 1, and based on this proceed to feature/target engineering, tuning, and ensembling.

Then write a blog post to tell the world about the power/flexibility/beauty of ML in Julia.

Question: above you created an elegant list of models in list.
Is it possible to automatically make this kind of list for all regression models that work with predict()?
I realize models(matching(X, y)) can help, but it doesn’t put it in this form.


Sounds good. I’m happy to help you with the blog post if you’d like, and eventually, if you agree, I can add it to the MLJTutorials repo (attributed to you, of course).

Question: above you created an elegant list of models in list.
Is it possible to automatically make this kind of list for all regression models that work with predict()?
I realize models(matching(X, y)) can help, but it doesn’t put it in this form.

There are many ways you could do this, and I won’t claim the one below is the best; it’s just a continuation of what I typed earlier, which should be reasonably accessible to someone new to Julia/MLJ.

I’ve added additional explanations of the things you’d have to do if you wanted to change the dataset, etc.

Code
using MLJ, Random

# Get the data
X, y = @load_boston

# Reproducible train-test split
train, test = partition(eachindex(y), .7, rng=333)

# Inspect the current scientific types of the features

schema(X)

# In case you want anything to be interpreted differently,
# use `coerce`. For instance if you have a column of strings
# but you want to consider that column as a categorical feature
# you should do `X = coerce(X, :column_name => Multiclass)`
#
# If you have to re-interpret a few columns, you can pass a series of
# pairs like `X = coerce(X, :col1 => Continuous, :col2 => Multiclass)`
#
# If you have many columns to re-interpret, use `autotype` with
# rules (see docs of ScientificTypes.jl)
#
# In the case of `Boston`, everything is Continuous which is
# the easiest setting.
#
# Once the data is appropriately interpreted, you can check
# what models are appropriate

matching_models = models(matching(X, y))

# That's a list of named tuples; each entry has fields
# `name` and `package_name`, plus some other metadata.

matching_models[1]

# Let's say we want to filter only the models that come from
# ScikitLearn and DecisionTree

filter!(m -> m.package_name ∈ ("ScikitLearn", "DecisionTree"), matching_models)

# Let's write a function to load all these models in one shot
# Note that some models are offered by several packages
# (e.g. LinearRegressor); if you expect such duplicates,
# you'd have to add a few lines to distinguish between them.

function load_everything(model_list)
    model_names = Vector{String}(undef, length(model_list))
    for (i, model) in enumerate(model_list)
        load(model.name, pkg=model.package_name)
        model_names[i] = model.name
    end
    return model_names
end

# Let's load the first ten matching models and keep track
# of their names

model_names = load_everything(matching_models[1:10])

# Cool now we can use the rest of the code easily

function score_model(m::String, X, y, train, test)
    mdl  = eval(Meta.parse("$(m)()"))
    mach = machine(mdl, X, y)
    fit!(mach, rows=train)
    ŷ = predict(mach, rows=test)
    return rms(ŷ, y[test])
end

all_scores = [score_model(m, X, y, train, test) for m in model_names]

This gives

Results
julia> all_scores
10-element Array{Float64,1}:
  4.452016691003462 
  3.6085192568566247
  2.626079727815229 
  4.58592727589615  
  3.5631972786734782
  9.241962189784388 
  5.096452157628015 
  4.999085608620905 
  2.4917456032500183
 24.034358683155105

With the GPRegressor performing poorly, as expected, and the ExtraTreeRegressor doing very well (as is often, though not always, the case).

Now, as I explained earlier, doing the hyperparameter tuning automatically for each model is not trivial: you’d have to specify which hyperparameters you want to tune for each model. If the answer is “all of them”, then the search space is very large; you could parallelise the training across models, but even then there is a lot to explore.
In principle it can be done, but it requires more advanced automatic HP tuning than we support at the moment.

Finally, as I said earlier, we want to support standardised model comparison/benchmarking in the future, but this is still WIP.

I hope that partly answers your questions.


I don’t see why it would be hard to tune hyperparameters? I was going to say I thought that was a big part of what you all were doing. If you want, I can probably write an easy way to do this for the use case most people doing this sort of thing have: flat CVs. In my package I just broke out CVs for the end user:
ex1. Regression · ChemometricsTools
ex2. SIPLS · ChemometricsTools

Not sure why there is metaprogramming or a machine function call, though? So whatever I’d write would be more generic than the MLJ framework… Almost any evaluation I can think of is a function call?

edit - I see machine is the object you all are using to store the models? Why not just make a struct for the model, and then make the inference function a call of that struct? That’s what I did and it’s pretty clean. I think you can define functions on abstract types since Julia 1.2 as well, so I’d consider that.

Then if someone wants to add a model to your package, all they have to do is make a struct, a constructor, and a predict function… That’s how I prototype in my package anyway…
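
Roughly this kind of pattern, say with a toy ridge model (purely illustrative, nothing to do with MLJ’s actual interface):

using LinearAlgebra

struct RidgeToy
    λ::Float64
    β::Vector{Float64}
end

# "Training" happens in a constructor that takes data.
RidgeToy(X::Matrix, y::Vector; λ=1.0) = RidgeToy(λ, (X'X + λ*I) \ (X'y))

# Inference is a call on the struct itself.
(m::RidgeToy)(X::Matrix) = X * m.β

Xt = randn(100, 3)
yt = Xt * [1.0, -2.0, 0.5] .+ 0.1 .* randn(100)
model = RidgeToy(Xt, yt; λ=0.1)
ŷ = model(Xt)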

For nested CVs, yeah, I don’t believe there’s a super easy way; that’s kind of the territory. I asked for some ideas on the metaproblem of this a while back and didn’t get replies. But it is solvable; I just don’t know if I’d trust the average user to do it correctly in almost any cookbook framework.

I don’t see why it would be hard to tune hyperparameters? I was going to say I thought that was a big part of what you all were doing. If you want, I can probably write an easy way to do this for the use case most people doing this sort of thing have: flat CVs.

Hard in the sense that if you have 5 models and all 5 have 5 hyperparameters, that’s 25 hyperparameters to tune. You can parallelise the training across models, so it’s 5×5 hyperparameters, but even then, tuning 5 hyperparameters per model potentially means a large volume to explore; yes, you can do random search with CV and so on, but it’s still a large volume.

Some of the models have many hyperparameters; this goes beyond just what kind of penalty to use and can encompass things such as which solver to use, which metric to use, when to prune, etc. For instance, DecisionTree or XGBoost models have a ton of hyperparameters. Typically a user will only want to tune a few of these (say 3-4), regardless of how they tune them.

So what I was saying is that if you compare 50 models, each with however many hyperparameters they have, that’s potentially a lot of work.

So what you could do instead is pass a list of models together with which of their hyperparameters to tune. This is not hard to do, but it goes beyond just finding models that are adequate for the scientific type via models(matching(...)) and trying to train all of them.
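
Something like this, say (a rough sketch: the hyperparameter names and bounds are purely illustrative, and keyword names may vary between MLJ versions):

ard = ARDRegressor()
bag = BaggingRegressor()

# each model paired with a range for the hyperparameter you want tuned
to_tune = [
    ard => range(ard, :n_iter, lower=50, upper=500),
    bag => range(bag, :n_estimators, lower=10, upper=100),
]

scores = map(to_tune) do (mdl, r)
    tm   = TunedModel(model=mdl, tuning=Grid(resolution=10),
                      resampling=CV(nfolds=5), range=r, measure=rms)
    mach = machine(tm, X, y)
    fit!(mach, rows=train)
    rms(predict(mach, rows=test), y[test])
end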

edit - I see machine is the object you all are using to store the models? Why not just make a struct for the model, and then make the inference function a call of that struct?

The design choices of MLJ will be discussed in a short article that Anthony intends to write soon.
In the meantime, if you look at the docs, you will see that models are indeed structs and that machines contain the model together with the results of fitting it. Bear in mind that many of the design decisions were made with composability and the notion of learning networks in mind; see the docs for details.
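
Concretely, with one of the models loaded earlier (rough sketch):

mdl  = ARDRegressor()        # the model: a plain struct of hyperparameters
mach = machine(mdl, X, y)    # the machine: binds the model to the data
fit!(mach, rows=train)       # learned parameters are stored in the machine...
fitted_params(mach)          # ...and can be inspected here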


Guess I’m just confused. All machine learning models, from the perspective of inference, naturally compose? They take one input type (some tensor), apply stored elements to it in some way, and output one output type (some tensor)? I suppose at training time you may want to pile a model onto another? In its simplest form that’s a matter of returning an output and the stepwise model struct. Most people, I don’t think, would want their data pipeline to be considered directly linked to their model chain until production anyway, so that’s probably not what you’re all concerned with.

Looking forward to the doc, because I just can’t follow where you all are going, despite trying and having some experience both using these tools and writing them in academia and industry… :man_shrugging: I can’t contribute even suggestions until then. You might want to add docstrings to the functions if you intend for anyone else to chip in.

If you want to help, the best you can do is go through the MLJ Tutorials and ask questions on that repo. That should allow you to get a grasp of how things are done in MLJ, and if you have suggestions to improve the tutorials or add functionality to MLJ to make it more user friendly, they will be very welcome there.

If you would like to comment on MLJ’s design decisions, I’d recommend you read the docs first; they are pretty complete.


So far the most informative thing to me has been MLJBase’s code: https://github.com/alan-turing-institute/MLJBase.jl/blob/ce614d567722a8fe33c1617b8e4790999e923c83/src/networks.jl

@tlienart To clarify: does each model in MLJ have default hyperparameter values or a default grid?
For example, elastic net has alpha (L2 vs L1 norm) and lambda (total penalty).
Is there a single default value for each, e.g. alpha = 0.5 and lambda = 10?
Or is there a default grid, e.g. alpha = [0, 0.5, 1] and lambda = [0, 10, 100]?

For *CV* models provided by sklearn, some hyperparameters have a default grid (the same as for the corresponding sklearn model).

Otherwise, hyperparameters mostly have single values as defaults.
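
You can see the single-value defaults by instantiating a model and looking at the printed fields, for example (sketch; the exact field names depend on the implementation):

@load ElasticNetRegressor pkg=MLJLinearModels
enet = ElasticNetRegressor()  # printing the instance shows each hyperparameter
                              # together with its default value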


@tlienart @ablaom
My understanding is that MLJ can now automate basic grid search, including two options:
Grid(goal=45): finds a uniform grid with a total of 45 hyperparameter values
Grid(resolution=10): uses 10 values per dimension (which can be big if there are 8 hyperparameters)

Question: how can I automatically tune a set of models with Grid(goal=45) WITHOUT manually specifying the ranges of the hyperparameters for each model?

For example, how can we do this in the above examples (Boston housing) for ["ARDRegressor", "AdaBoostRegressor", "BaggingRegressor"], or even better, for all regression models?

(E.g. in a standard linear regression the only hyperparameter may be the intercept in {true, false}, in which case there are only two possible values, fewer than Grid(goal=45); in such cases I would hope MLJ runs only two regressions and not 45.)

From the discussion at Improve the tuning strategy interface · Issue #315 · alan-turing-institute/MLJ.jl · GitHub

There is currently no way to tune a hyperparameter without specifying a range for that parameter. (To be clear, specifying a one-dimensional range does not mean specifying a one-dimensional grid. It means specifying an upper and lower bound and a scale, and a bit more if the range is unbounded.)
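
For example (sketch, with made-up bounds and a hypothetical model mdl):

r1 = range(mdl, :lambda, lower=1e-3, upper=1e2, scale=:log)  # numeric: bounds + scale
r2 = range(mdl, :fit_intercept, values=[true, false])        # nominal: explicit values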

I have proposed that in the future it be possible to specify only the name of the hyperparameter, in which case a default range is used. Such functionality depends on recording default ranges for these parameters in the model registry. There is already a facility for recording this information (the model interface author implements the hyperparameter_ranges trait), but this has not yet been done for a single model.
