I apologise for seeing this so late; here's one way you could do this:
using MLJ
X, y = @load_boston
train, test = 1:406, 407:506
list = [
    "ScikitLearn" => ["ARDRegressor", "AdaBoostRegressor", "BaggingRegressor"],
    "MLJLinearModels" => ["QuantileRegressor"]
]
# Load everything
function load_everything(list)
    all_models = String[]
    for (pkg, models) in list
        for m in models
            load(m, pkg=pkg)
            push!(all_models, m)
        end
    end
    return all_models
end
function score_model(m::String, X, y, train, test)
    # instantiate the model from its name, with default hyperparameters
    mdl = eval(Meta.parse("$(m)()"))
    mach = machine(mdl, X, y)
    fit!(mach, rows=train)
    ŷ = predict(mach, rows=test)
    return rms(ŷ, y[test])
end
all_models = load_everything(list)
all_scores = [score_model(m, X, y, train, test) for m in all_models]
giving something like:
4-element Array{Float64,1}:
5.713015246458931
4.587126755456758
4.294341625907283
4.750778077429626
PS: we want to implement clean ways of comparing an arbitrary number of models for however many metrics you may care about; this is not yet available but hopefully will be soon.
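In the meantime, a rough sketch of how you could do that by hand is to generalise score_model above to take a list of measures (reusing the variables defined earlier; I'm assuming mae is exported by MLJ alongside rms, adjust as needed):

function score_model_metrics(m::String, X, y, train, test; metrics=[rms, mae])
    # instantiate the model from its name, fit on the training rows,
    # then report every requested metric on the test rows
    mdl = eval(Meta.parse("$(m)()"))
    mach = machine(mdl, X, y)
    fit!(mach, rows=train)
    ŷ = predict(mach, rows=test)
    return [metric(ŷ, y[test]) for metric in metrics]
end
all_metric_scores = [score_model_metrics(m, X, y, train, test) for m in all_models]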
Note also that the awkward bit here is passing the names of not-yet-loaded models. I understand what you're trying to do, but I don't think that's something we want to support fully, as it encourages people to just use default hyperparameters for everything, which might not be appropriate at all.
What makes more sense to support is to have a bunch of pre-defined models and compare them:
models = [
    ARDRegressor(n_iter=10),
    QuantileRegressor(lambda=0.5)
]
function score_model2(m, X, y, train, test)
    mach = machine(m, X, y)
    fit!(mach, rows=train)
    ŷ = predict(mach, rows=test)
    return rms(ŷ, y[test])
end
all_scores = [score_model2(m, X, y, train, test) for m in models]
Then, of course, you might want to make sure that each of them has its hyperparameters tuned optimally, using CV or otherwise; that's the kind of benchmarking we'd like to support in the future.
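For instance, tuning the QuantileRegressor with CV before comparing could look roughly like this (a sketch only; the TunedModel/range keyword names are from memory and may differ slightly across MLJ versions, so check the docs):

# NB: depending on your MLJ version the keyword may be `ranges` instead of `range`
r = range(QuantileRegressor(), :lambda, lower=0.01, upper=10.0, scale=:log)
tuned_qr = TunedModel(model=QuantileRegressor(),
                      tuning=Grid(resolution=20),
                      resampling=CV(nfolds=5),
                      range=r,
                      measure=rms)
mach = machine(tuned_qr, X, y)
fit!(mach, rows=train)        # runs the CV-based tuning on the training rows
ŷ = predict(mach, rows=test)  # predicts with the best model found
rms(ŷ, y[test])

You could then plug such self-tuning models into the models vector above and compare them with score_model2 as before.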