"Julian" way to include external scripts/modules without name conflicts

I’m trying to manage a set of machine-learning models, each with its own custom-defined data preprocessing pipeline to highlight important features. In a directory, I have one folder per model and within each folder, I have a serialized custom-built model called “model.jls”, and I have an associated preprocessing script in “preprocessing.jl”. The preprocessing file would look like

function preprocessing!(df::DataFrame)
    df[!,"tag1"] = engineered_feature_1(df)
    df[!,"tag2"] = engineered_feature_2(df)
    return df
end

function engineered_feature_1(df::DataFrame)
    #<some stuff here>
end

function engineered_feature_2(df::DataFrame)
    #<some other stuff here>
end

What I want to do is have something like

mymodel = (
   model = deserialize(joinpath(modelfolder, "model.jls")),
   preprocessing = preprocessing_script(modelname, modelpath)
)

Now there would be a considerable number of “preprocessing.jl” files, all of which define a “preprocessing!” function and could potentially also have naming collisions among their helper functions. What is the cleanest, most “Julian” way to get the contents of my “preprocessing.jl” file into this named tuple without potential naming conflicts?

For example, I could programmatically create a module inside the “preprocessing_script” function:

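function preprocessing_script(modelname, modelpath)
    module_name = "Preprocessing_$(modelname)"
    # `repr` quotes and escapes the path so it is a valid string literal in the generated code
    module_command = """
    module $(module_name)
    include($(repr(joinpath(modelpath, "preprocessing.jl"))))
    end
    """
    # `include_string` needs the module in which to evaluate the code
    include_string(@__MODULE__, module_command)
    return eval(Symbol(module_name)).preprocessing!
end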

Is this the way I should be doing it? It would be nice to have a way to include the file as a programmatically named module and extract the main function of interest, while keeping the readable stack traces that you get when you “include” the file.

Something like this might be a better pattern:

function get_preprocess_function(modelpath)
    # Create an anonymous module
    m = Module()

    # Include the preprocessing source file in it
    Base.include(m, joinpath(modelpath, "preprocessing.jl"))

    # Return the newly defined function (`preprocess!` here; use whatever name the file defines)
    return m.preprocess!
end
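
To make the isolation concrete, here is a minimal sketch of the same pattern with two files that define identically named functions. Everything in it is made up for illustration: the helper `load_preprocessing`, the throwaway files `a.jl` and `b.jl`, and the `scale` helper are not from the original post, and plain vectors stand in for the DataFrame so the snippet runs without extra packages. It is meant to be run at the top level (REPL or script).

```julia
# Hypothetical helper: load one file into its own freshly created module.
# The module name is built from `name`, so it is readable when printed.
function load_preprocessing(name::AbstractString, path::AbstractString)
    m = Module(Symbol("Preprocessing_", name))
    Base.include(m, path)
    return m
end

# Two throwaway files that both define `preprocess!` and a helper called `scale`.
dir = mktempdir()
file_a = joinpath(dir, "a.jl")
file_b = joinpath(dir, "b.jl")
write(file_a, """
scale(x) = 2x
preprocess!(v::Vector{Float64}) = (v .= scale.(v); v)
""")
write(file_b, """
scale(x) = x + 1
preprocess!(v::Vector{Float64}) = (v .= scale.(v); v)
""")

mod_a = load_preprocessing("a", file_a)
mod_b = load_preprocessing("b", file_b)

# Each module keeps its own `scale` and `preprocess!`; neither overwrites the other.
result_a = mod_a.preprocess!([1.0, 2.0])   # doubles each element
result_b = mod_b.preprocess!([1.0, 2.0])   # adds one to each element

# The function can then be stored in a named tuple, as in the question.
mymodel = (model = nothing, preprocessing = mod_a.preprocess!)
```

Because the file is loaded with `Base.include` rather than from a string, errors raised inside it should still report the real file path and line number. Note that a file using `DataFrame` would need its own `using DataFrames`, since the fresh module only sees Base by default.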

You can do that?!? This looks exactly like the kind of thing I was wanting to do!

This is a bottom-of-the-Julia-iceberg tip if I ever saw one. I never would have thought of that. Who would ever look up “?Base.include”?

I think I saw that here:

https://docs.julialang.org/en/v1/manual/modules/#Default-top-level-definitions-and-bare-modules


Okay, so this worked when I used it at the top level, but if I load and call the function from inside another function, I get

MethodError: no method matching preprocess!
The applicable method may be too new: running in world age 33212, while current world is 33213

I guess I can fix this if I use
Base.@invokelatest preprocess!(data)
where
preprocess! = get_preprocess_function(modelpath)
At least this is an in-place modification so there is no “unknown-type” output that breaks type inference.
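
For anyone hitting the same world-age error, here is a small self-contained sketch of the workaround. The wrapper `load_and_run!` and the one-line stand-in for “preprocessing.jl” are invented for illustration, and a plain vector replaces the DataFrame. It uses the function form `Base.invokelatest`, which should behave like the macro form quoted above.

```julia
# Hypothetical stand-in for a preprocessing.jl file.
path = joinpath(mktempdir(), "preprocessing.jl")
write(path, "preprocess!(v) = (v .*= 10; v)\n")

function get_preprocess_function(path)
    m = Module()
    Base.include(m, path)
    # Look the function up via invokelatest so the lookup itself also sees the
    # newly created binding (assumption: newer Julia versions are stricter here).
    return Base.invokelatest(getfield, m, :preprocess!)
end

function load_and_run!(path, data)
    preprocess! = get_preprocess_function(path)
    # A direct call `preprocess!(data)` would raise the world-age MethodError,
    # because this function started running before the method was defined.
    Base.invokelatest(preprocess!, data)
    return data
end

data = [1.0, 2.0]
load_and_run!(path, data)   # `data` is modified in place
```

Another option, if the set of models is known up front, is to load every preprocessing function once at the top level and only call them later from inside functions; calls made after the loading step has returned to the top level do not need `invokelatest`.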