Using `load()` from MLJ inside a package

v-i-s-h · January 23, 2023, 4:38pm

Hi,

I am trying to develop a package that trains multiple models on the given input data and returns their performance metrics (inspired by lazypredict on PyPI). I am using MLJ underneath to train the models.

Since I have to load multiple models programmatically, I am using load() rather than the recommended @load macro in MLJ. The following code works fine when I run it as a standalone script.

available_algos = models(
    m -> matching(X, y)(m) && m.is_pure_julia
)

# Train models
verbosity = 0
measures = [accuracy, multiclass_f1score]
results = []
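for algo_info in available_algos
    @info "Training " * algo_info.name
    # The value returned by load() is called to build a model with default hyperparameters
    algo = load(algo_info.name, pkg=algo_info.package_name)
    model = algo()

    r = evaluate(model, X, y, resampling=CV(shuffle=true), measures=measures, verbosity=verbosity)
    push!(results, r)  # keep each evaluation so it can be reported later
end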

However, when I try to package this as a module in my package ModelMiner.jl (https://github.com/v-i-s-h/ModelMiner.jl), it throws this error:

julia> a = mine(X, y)
[ Info: Training AdaBoostStumpClassifier -- DecisionTree
ERROR: LoadError: UndefVarError: @load not defined
Stacktrace:
 [1] top-level scope
   @ :0
 [2] eval
   @ ./boot.jl:368 [inlined]
 [3] eval(x::Expr)
   @ Base.MainInclude ./client.jl:478
 [4] load(name::String; pkg::String, add::Bool, verbosity::Int64, mod::Module)
   @ MLJModels ~/.julia/packages/MLJModels/8Nrhi/src/loading.jl:229
 [5] mine(X::DataFrame, y::CategoricalArrays.CategoricalVector{String, UInt8, String, CategoricalArrays.CategoricalValue{String, UInt8}, Union{}})
   @ ModelMiner ~/.julia/dev/ModelMiner/src/ModelMiner.jl:20
 [6] top-level scope
   @ REPL[9]:1
in expression starting at [redacted]/.julia/packages/MLJModels/8Nrhi/src/loading.jl:227

It happens when I call the load() function (https://github.com/v-i-s-h/ModelMiner.jl/blob/7c3a34c91b245a7baa7d3bc70d00f419b430a89f/src/ModelMiner.jl#L20).

The issue seems to come from the mod kwarg in MLJModels.jl/loading.jl at 81ba4a82ee6f9896ebd034931266bf38541fdd74 · JuliaAI/MLJModels.jl · GitHub, which evaluates the expression in the Main module. I tried calling load() with mod=@__MODULE__, but the same error still pops up.
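The error itself can be reproduced without MLJ. The following is a minimal sketch, not a trace of what MLJModels actually does: it assumes only that a macro-call expression is built and then evaluated in some module. MacroProvider, WithMacro, WithoutMacro and @answer are made-up names for illustration. A macro call expands only in a module where the macro is visible, so evaluating it anywhere else fails with the same kind of UndefVarError:

```julia
# Hypothetical module that defines and exports a macro.
module MacroProvider
export @answer
macro answer()
    return 42
end
end

# This module brings the macro into scope ...
module WithMacro
using ..MacroProvider
end

# ... and this one does not.
module WithoutMacro
end

# Parsing does not expand the macro; expansion happens at eval time,
# inside whichever module the expression is evaluated in.
ex = Meta.parse("@answer()")

# Evaluate `ex` in `mod`, returning the exception instead of throwing.
function try_eval(mod::Module, ex)
    try
        return Core.eval(mod, ex)
    catch err
        return err
    end
end

ok = try_eval(WithMacro, ex)
failed = try_eval(WithoutMacro, ex)

# Some Julia versions wrap macro-expansion errors in a LoadError.
inner = failed isa LoadError ? failed.error : failed
println("WithMacro:    ", ok)
println("WithoutMacro: ", typeof(inner), " for ", inner.var)
```

Which module the expression is evaluated in therefore decides whether the macro name resolves at all.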

The issue can be reproduced if you run the test https://github.com/v-i-s-h/ModelMiner.jl/blob/7c3a34c91b245a7baa7d3bc70d00f419b430a89f/test/runtests.jl

How can I fix this?

ablaom · January 23, 2023, 11:58pm

Looks like a project that would get a lot of interest.

First, as you probably realize, the load() method you are attempting to use is flagged as “private and experimental”, so perhaps it’s no surprise it does not work.

I’ve never got conditional loading of code from a function to work well in Julia. I can load code from a function using some variation of eval, but it seems you run into World Age issues if you try to use that code before the function returns. This is my experience with the @load macro, in any event.
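The world-age problem mentioned here can be shown in a few lines of plain Julia. This is a sketch with made-up function names (greet_direct, greet_latest), not MLJ code: a method added with @eval while a function is running is too new for that running function to call directly, whereas Base.invokelatest dispatches in the latest world.

```julia
# Declare two generic functions with no methods yet (hypothetical names).
function greet_direct end
function greet_latest end

# Adds a method at run time, then calls it directly from the same function.
# The caller is still running in the older world, so the new method is not visible.
function define_then_call()
    @eval greet_direct() = "hello"
    return greet_direct()
end

# Same idea, but the call goes through Base.invokelatest,
# which looks up the method in the latest world.
function define_then_invokelatest()
    @eval greet_latest() = "hello"
    return Base.invokelatest(greet_latest)
end

# Capture the exception from the direct call instead of letting it propagate.
direct_result = try
    define_then_call()
catch err
    err
end

latest_result = define_then_invokelatest()

println("direct call:  ", typeof(direct_result))
println("invokelatest: ", latest_result)
```

Once control returns to the top level, greet_direct() can be called normally, which is why code loaded by a function tends to work only after that function has returned.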

Whatever the approach, you will need to add the packages providing the MLJ models as dependencies in your Project.toml, unless you set up some kind of conditional loading using Requires.jl, which I wouldn’t do in this case.

My suggestion would be to start with an approach in which:

1. All possible model-providing packages are imported unconditionally.
2. You build a mapping from entries in MLJModels’s metadata to the pre-imported model types, for use in programmatic manipulation of models.

Something like this:

module Mining

const API_PACKAGES = [
    :MLJLinearModels,
    :NearestNeighborModels,
    :MLJMultivariateStatsInterface,
    :MLJDecisionTreeInterface,
]

for pkg in API_PACKAGES
    eval(:(import $pkg))
end
import MLJBase
import MLJModels


# # HELPERS FOR MANIPULATING MODEL METADATA

id(meta) = (meta.name, meta.package_name)

function api_pkg(meta)
    path = MLJModels.load_path(meta.name; pkg=meta.package_name)
    return split(path, ".") |> first
end


# # GET MAPPING FROM MODEL METADATA TO MODEL TYPE

const METADATA = MLJModels.models() do meta
    api_pkg(meta) in string.(API_PACKAGES)
end

const MODEL_TYPE_GIVEN_ID = Dict()

for meta in METADATA
    path = MLJModels.load_path(meta.name; pkg=meta.package_name)
    type_ex = Meta.parse(path)
    MODEL_TYPE_GIVEN_ID[id(meta)] = eval(type_ex)
end


# # FUNCTION TO PROGRAMMATICALLY EVALUATE MODELS

function run(meta, data...; kwargs...)
    model = MODEL_TYPE_GIVEN_ID[id(meta)]()
    e = MLJBase.evaluate(model, data...; kwargs...)
    return id(meta) => e.measurement[1]
end

function mine(data...; kwargs...)
    metadata = MLJModels.models(MLJModels.matching(data...)) do meta
        api_pkg(meta) in string.(API_PACKAGES)
    end
    return [run(meta, data...; kwargs...) for meta in metadata]
end

end

Then this should work:

using .Mining
using MLJBase

X, y = @load_iris

Mining.mine(X, y; measure=MLJBase.accuracy)
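The lookup-table step in the module above can be tried without installing any MLJ packages. The sketch below swaps in the Dates and Random standard libraries for the model-providing packages, and the PATHS dictionary is a hypothetical stand-in for the strings that MLJModels.load_path returns. Because the modules are imported at top level, the types already exist when the table is built, so constructing instances later needs no eval and hits no world-age issue.

```julia
import Dates
import Random

# Hypothetical stand-in for load_path output: (name, package) => dotted path.
PATHS = Dict(
    ("Date", "Dates") => "Dates.Date",
    ("MersenneTwister", "Random") => "Random.MersenneTwister",
)

# Resolve each dotted path to the actual type once, at load time.
TYPE_GIVEN_ID = Dict{Tuple{String,String},Any}()
for (id, path) in PATHS
    TYPE_GIVEN_ID[id] = eval(Meta.parse(path))
end

# Later, inside ordinary functions, construction is a plain dictionary lookup.
make(id, args...) = TYPE_GIVEN_ID[id](args...)

d = make(("Date", "Dates"), 2023, 1, 23)
rng = make(("MersenneTwister", "Random"), 1)
println(typeof(d), " ", typeof(rng))
```

The design choice is the same as in Mining: pay the eval cost once while the module loads, and keep everything called at run time free of eval.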

ablaom · January 24, 2023, 12:00am

Incidentally, in case you missed it, the MLJ manual has a section on model loading that might be helpful: Loading Model Code · MLJ

ablaom · January 24, 2023, 12:08am

There is also this related GitHub issue.

Also, @azev77, who has previously played around with programmatic model comparisons in MLJ, may want to comment.

v-i-s-h · January 24, 2023, 6:24pm

Hi @ablaom ,

Thanks a lot for your insightful suggestions. This is exactly what I wanted! Also, your observation about world-age problems was spot-on. I was running into them with other approaches.

Thanks,
Vishnu Raj
