Using `load()` from MLJ inside a package

Hi,

I am trying to develop a package that trains multiple models on the given input data and returns their performance metrics (inspired by lazypredict on PyPI). I am using MLJ underneath to train the models.

Since I have to load multiple models programmatically, I am using load() rather than the recommended @load macro in MLJ. The following code works fine when I run it from a standalone script.

available_algos = models(
    m -> matching(X, y)(m) && m.is_pure_julia
)

# Train models
verbosity = 0
measures = [accuracy, multiclass_f1score]
results = []
for algo_info in available_algos
    @info "Training " * algo_info.name
    algo = load(algo_info.name, pkg=algo_info.package_name)
    machine = algo()

    r = evaluate(machine, X, y, resampling=CV(shuffle=true), measures=measures, verbosity=verbosity)
end

However, when I try to package this as a module in ModelMiner.jl (https://github.com/v-i-s-h/ModelMiner.jl), it throws this error:

julia> a = mine(X, y)
[ Info: Training AdaBoostStumpClassifier -- DecisionTree
ERROR: LoadError: UndefVarError: @load not defined
Stacktrace:
 [1] top-level scope
   @ :0
 [2] eval
   @ ./boot.jl:368 [inlined]
 [3] eval(x::Expr)
   @ Base.MainInclude ./client.jl:478
 [4] load(name::String; pkg::String, add::Bool, verbosity::Int64, mod::Module)
   @ MLJModels ~/.julia/packages/MLJModels/8Nrhi/src/loading.jl:229
 [5] mine(X::DataFrame, y::CategoricalArrays.CategoricalVector{String, UInt8, String, CategoricalArrays.CategoricalValue{String, UInt8}, Union{}})
   @ ModelMiner ~/.julia/dev/ModelMiner/src/ModelMiner.jl:20
 [6] top-level scope
   @ REPL[9]:1
in expression starting at [redacted]/.julia/packages/MLJModels/8Nrhi/src/loading.jl:227

It happens when I call the load() function (https://github.com/v-i-s-h/ModelMiner.jl/blob/7c3a34c91b245a7baa7d3bc70d00f419b430a89f/src/ModelMiner.jl#L20).

The issue seems to come from the mod kwarg in loading.jl of JuliaAI/MLJModels.jl (at commit 81ba4a82ee6f9896ebd034931266bf38541fdd74), which evaluates the expression in the Main module. I tried calling load() with mod=@__MODULE__, but the same error still appears.
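To illustrate the suspected mechanism independently of MLJ, here is a minimal sketch in plain Julia. It assumes only that an expression containing a macro call is evaluated with Core.eval in some target module; Provider, Consumer, and @hello are hypothetical names, not part of MLJ or MLJModels. The macro call expands only in a module where the macro is visible.

```julia
# Hypothetical modules for illustration; nothing here comes from MLJ.
module Provider
macro hello()
    return "hello"          # the macro expands to a string literal
end
end

module Consumer
# Deliberately empty: @hello is not defined or imported here.
end

# Quoting does not expand the macro; expansion happens at eval time.
call_expr = :(@hello)

# Works: @hello is visible inside Provider.
in_provider = Core.eval(Provider, call_expr)

# Fails: Consumer has no @hello, so evaluation raises an error
# (an UndefVarError, possibly wrapped in a LoadError depending on the Julia version).
in_consumer = try
    Core.eval(Consumer, call_expr)
catch err
    err
end

println(in_provider)
println(typeof(in_consumer))
```

If load() builds an expression that uses @load and evaluates it in a module that has not brought @load into scope, the same kind of failure would be expected.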

The issue can be reproduced by running the tests at https://github.com/v-i-s-h/ModelMiner.jl/blob/7c3a34c91b245a7baa7d3bc70d00f419b430a89f/test/runtests.jl

How can I fix this?

Looks like a project that would get a lot of interest.

First, as you probably realize, the load() method you are attempting to use is flagged as “private and experimental”, so perhaps it’s no surprise it does not work.

I’ve never got conditional loading of code from a function to work well in Julia. I can load code from a function using some variation of eval, but you seem to run into world-age issues if you try to use that code before the function returns. This has been my experience with the @load macro, in any event.
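A minimal sketch of the world-age behaviour described above, in plain Julia with no MLJ involved. The names late_a, late_b, call_directly, and call_via_invokelatest are hypothetical. A method defined with @eval while a function is running is too new to be called directly from that same function, whereas Base.invokelatest dispatches in the latest world.

```julia
# Declare the generic functions up front, with no methods yet.
function late_a end
function late_b end

function call_directly()
    @eval late_a() = 42      # method added while this function is running
    return late_a()          # direct call still runs in the older world
end

function call_via_invokelatest()
    @eval late_b() = 42
    return Base.invokelatest(late_b)   # dispatch in the latest world
end

# The direct call fails because the new method is not visible yet.
direct_error = try
    call_directly()
    nothing
catch err
    err
end

latest_value = call_via_invokelatest()

println(typeof(direct_error))
println(latest_value)
```

Base.invokelatest gets around the error but gives up some compiler optimization at the call site, which is one reason importing everything up front, as suggested next, can be the simpler design.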

Whatever the approach, you will need to add the packages providing the MLJ models as dependencies in your Project.toml, unless you set up some kind of conditional loading using Requires.jl, which I wouldn’t do in this case.

My suggestion would be to start with an approach in which:

  • All possible model-providing packages are imported unconditionally
  • You build a mapping from entries in MLJModels’s metadata to the pre-imported model types, for use in programmatic manipulation of models

Something like this:

module Mining

const API_PACKAGES = [
    :MLJLinearModels,
    :NearestNeighborModels,
    :MLJMultivariateStatsInterface,
    :MLJDecisionTreeInterface,
]

for pkg in API_PACKAGES
    eval(:(import $pkg))
end
import MLJBase
import MLJModels


# # HELPERS FOR MANIPULATING MODEL METADATA

id(meta) = (meta.name, meta.package_name)

function api_pkg(meta)
    path = MLJModels.load_path(meta.name; pkg=meta.package_name)
    return split(path, ".") |> first
end


# # GET MAPPING FROM MODEL METADATA TO MODEL TYPE

const METADATA = MLJModels.models() do meta
    api_pkg(meta) in string.(API_PACKAGES)
end

const MODEL_TYPE_GIVEN_ID = Dict()

for meta in METADATA
    path = MLJModels.load_path(meta.name; pkg=meta.package_name)
    type_ex = Meta.parse(path)
    MODEL_TYPE_GIVEN_ID[id(meta)] = eval(type_ex)
end


# # FUNCTION TO PROGRAMMATICALLY EVALUATE MODELS

function run(meta, data...; kwargs...)
    model  = MODEL_TYPE_GIVEN_ID[id(meta)]()
    e = MLJBase.evaluate(model, data...; kwargs...)
    return id(meta) => e.measurement[1]
end

function mine(data...; kwargs...)
    metadata = MLJModels.models(MLJModels.matching(data...)) do meta
        api_pkg(meta) in string.(API_PACKAGES)
    end
    return [run(meta, data...; kwargs...) for meta in metadata]
end

end

Then this should work:

using .Mining
using MLJBase

X, y = @load_iris

Mining.mine(X, y; measure=MLJBase.accuracy)

Incidentally, in case you missed it, the MLJ manual has a section on model loading, “Loading Model Code”, that might be helpful.

There is also this related GitHub issue.

Also, @azev77, who has previously played around with programmatic model comparisons in MLJ, may want to comment.

Hi @ablaom ,

Thanks a lot for your insightful suggestions. This is exactly what I wanted! Also, your observation about world-age problems was spot-on. I was running into them with other approaches.

Thanks,
Vishnu Raj