I am trying to develop a package that will train multiple models on the given input data and return their performance metrics (inspired by lazypredict on PyPI). I am using MLJ underneath to train the models.
Since I have to load multiple models programmatically, I am using load() rather than the recommended @load macro in MLJ. The following code works fine when I execute it from an independent script, but it does not work from inside my package.
available_algos = models(
    m -> matching(X, y)(m) && m.is_pure_julia
)
# Train models
verbosity = 0
measures = [accuracy, multiclass_f1score]
results = []
for algo_info in available_algos
    @info "Training " * algo_info.name
    algo = load(algo_info.name, pkg=algo_info.package_name)
    model = algo()
    r = evaluate(model, X, y, resampling=CV(shuffle=true), measures=measures, verbosity=verbosity)
    push!(results, algo_info.name => r)
end
Looks like a project that would get a lot of interest.
First, as you probably realize, the load() method you are attempting to use is flagged as “private and experimental”, so perhaps it’s no surprise it does not work.
I’ve never got conditional loading of code from a function to work well in Julia. I can load code from a function using some variation of eval, but you run into world-age issues if you try to use that code before the function returns. That has been my experience with the @load macro, in any event.
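Here is a minimal sketch of the world-age problem I mean (the function names are just for illustration):

function define_and_call()
    @eval f() = 1   # defines f at global scope, but in a newer world age
    return f()      # errors: the method is "too new to be called from this world context"
end

function define_and_invoke()
    @eval g() = 1
    return Base.invokelatest(g)   # invokelatest runs in the latest world, so this returns 1
end

Base.invokelatest is the standard workaround, but it adds dynamic-dispatch overhead and tends to leak into every caller.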
Whatever the approach, you will need to add the packages providing the MLJ models as dependencies in your Project.toml, unless you set up some kind of conditional loading using Requires.jl, which I wouldn’t do in this case.
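For example, assuming you settle on the model-providing packages used in the sketch below, you could add them to your package's environment like this:

using Pkg
Pkg.activate(".")   # activate your package's own environment
Pkg.add(["MLJBase", "MLJModels", "MLJLinearModels", "NearestNeighborModels",
         "MLJMultivariateStatsInterface", "MLJDecisionTreeInterface"])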
My suggestion would be to start with an approach in which:
All possible model-providing packages are imported unconditionally
You build a mapping from entries in MLJModels’s model metadata to the pre-imported model types, for use in programmatic manipulation of models
Something like this:
module Mining

const API_PACKAGES = [
    :MLJLinearModels,
    :NearestNeighborModels,
    :MLJMultivariateStatsInterface,
    :MLJDecisionTreeInterface,
]

# Import every model-providing package unconditionally; this runs at module
# top level, so there are no world-age problems later on.
for pkg in API_PACKAGES
    eval(:(import $pkg))
end

import MLJBase
import MLJModels

# # HELPERS FOR MANIPULATING MODEL METADATA

id(meta) = (meta.name, meta.package_name)

# Name of the API (interface) package providing the model, extracted from the
# model's load path, e.g. "MLJDecisionTreeInterface":
function api_pkg(meta)
    path = MLJModels.load_path(meta.name; pkg=meta.package_name)
    return split(path, ".") |> first
end

# # GET MAPPING FROM MODEL METADATA TO MODEL TYPE

const METADATA = MLJModels.models() do meta
    api_pkg(meta) in string.(API_PACKAGES)
end

const MODEL_TYPE_GIVEN_ID = Dict()
for meta in METADATA
    path = MLJModels.load_path(meta.name; pkg=meta.package_name)
    type_ex = Meta.parse(path)
    MODEL_TYPE_GIVEN_ID[id(meta)] = eval(type_ex)
end

# # FUNCTIONS TO PROGRAMMATICALLY EVALUATE MODELS

function run(meta, data...; kwargs...)
    model = MODEL_TYPE_GIVEN_ID[id(meta)]()
    e = MLJBase.evaluate(model, data...; kwargs...)
    return id(meta) => e.measurement[1]
end

function mine(data...; kwargs...)
    metadata = MLJModels.models(MLJModels.matching(data...)) do meta
        api_pkg(meta) in string.(API_PACKAGES)
    end
    return [run(meta, data...; kwargs...) for meta in metadata]
end

end
Then this should work:
using .Mining
using MLJBase
X, y = @load_iris
Mining.mine(X, y; measure=MLJBase.accuracy)
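And if you want the models ranked by the returned metric, something like this should do it (a sketch; mine returns a vector of id => measurement pairs, so sorting on last gives you a leaderboard):

results = Mining.mine(X, y; measure=MLJBase.accuracy)
sort(results; by=last, rev=true)   # best-scoring models first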
Thanks a lot for your insightful suggestions. This is exactly what I wanted! Also, your observation about world-age problems was spot-on; I was running into them with other approaches.