Defining MLJ pipelines within a function

Question asked on Slack:

Hello, I’m new to Julia and MLJ. Is there a recommended way to compose models within functions? I tried placing the code from the “Lightning Tour” within a function in my own package; however, when compiling that package I get “iterated_booster not defined”, even though I can see it is defined (or at least I think it is…).

Here’s the relevant code posted by the commenter:

module Slotter
 
using MLJ
using MLJIteration
using EvoTrees
 
function main()
    Booster = @load EvoTreeRegressor # loads code defining a model type
    booster = Booster(max_depth = 2)   # specify hyper-parameter at construction
    booster.nrounds = 50               # or mutate post facto
    
    iterated_booster = IteratedModel(
        model = booster,
        resampling = Holdout(fraction_train = 0.8),
        controls = [Step(2), NumberSinceBest(3), NumberLimit(300)],
        measure = l1,
        retrain = true,
    )
 
    pipe = @pipeline ContinuousEncoder iterated_booster
 
    max_depth_range =
        range(pipe, :(deterministic_iterated_model.model.max_depth), lower = 1, upper = 10)
 
    self_tuning_pipe = TunedModel(
        model = pipe,
        tuning = RandomSearch(),
        ranges = max_depth_range,
        resampling = CV(nfolds = 3, rng = 456),
        measure = l1,
        acceleration = CPUThreads(),
        n = 50,
    )
 
    X, y = @load_reduced_ames
    mach = machine(self_tuning_pipe, X, y)
    evaluate!(
        mach,
        measures = [l1, l2],
        resampling = CV(nfolds = 5, rng = 123),
        acceleration = CPUThreads(),
        verbosity = 2,
    )
end

end # module Slotter

The problem is that the @pipeline macro evaluates its arguments in the global scope. In your code, @pipeline ContinuousEncoder iterated_booster throws an error because iterated_booster is defined inside the function, not in global scope. There is an open issue (which I couldn’t find just now) to re-implement pipelines without macros. In the meantime, there are various workarounds. The most robust is to use the more general learning network syntax described in this manual section, “exporting” your learning network using Method II (no macro). There is admittedly a bit of a learning curve here. Simpler workarounds might exist, but they will depend on exactly what you want to do.
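
To make the scoping issue concrete, here is a minimal sketch in plain Julia, with no MLJ involved. It uses the built-in `@eval` as a stand-in for any macro that evaluates its argument in global scope; this is an analogy, not the actual implementation of `@pipeline`. The names `inside_function` and `local_model` are made up for illustration.

```julia
# A local variable defined inside a function...
function inside_function()
    local_model = "stand-in for iterated_booster"
    # ...is invisible to `@eval`, which evaluates the expression in the
    # enclosing module's global scope, where `local_model` does not exist.
    return @eval local_model
end

# Capture the error instead of letting it propagate.
result = try
    inside_function()
catch err
    err
end

println(result isa UndefVarError)  # the local was not found in global scope
```

An ordinary function call does not have this problem, since its arguments are evaluated in the caller's local scope. That is why the workarounds all avoid the macro.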

Here’s the relevant github issue, for anyone interested:

https://github.com/alan-turing-institute/MLJ.jl/issues/594
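
For readers on a more recent MLJ release, here is a hedged sketch of the same pipeline built without a macro. It assumes your MLJ version supports macro-free pipelines via `|>` (or the equivalent `Pipeline` constructor) and that EvoTrees is installed. The function name `build_pipe` is hypothetical, and `EvoTreeRegressor` is constructed directly from EvoTrees rather than via `@load`, so that no macro runs inside the function body.

```julia
using MLJ
using EvoTrees  # exports EvoTreeRegressor; used directly here instead of `@load`

# `build_pipe` is a hypothetical name, not part of MLJ.
function build_pipe()
    booster = EvoTreeRegressor(max_depth = 2, nrounds = 50)

    iterated_booster = IteratedModel(
        model = booster,
        resampling = Holdout(fraction_train = 0.8),
        controls = [Step(2), NumberSinceBest(3), NumberLimit(300)],
        measure = l1,
        retrain = true,
    )

    # No macro involved: `|>` is an ordinary function call, so the local
    # variable `iterated_booster` is visible here.
    return ContinuousEncoder() |> iterated_booster
end

pipe = build_pipe()

# Component names are generated from the model type names, so the nested
# hyper-parameter can be addressed as in the original code.
max_depth_range =
    range(pipe, :(deterministic_iterated_model.model.max_depth), lower = 1, upper = 10)
```

The resulting `pipe` and `max_depth_range` can then be passed to `TunedModel` exactly as in the code posted above.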

Awesome, thanks to everyone 🙂