I’m trying to manage a set of machine-learning models, each with its own custom-defined data preprocessing pipeline to highlight important features. In a directory, I have one folder per model and within each folder, I have a serialized custom-built model called “model.jls”, and I have an associated preprocessing script in “preprocessing.jl”. The preprocessing file would look like
function preprocessing!(df::DataFrame)
    df[!,"tag1"] = engineered_feature_1(df)
    df[!,"tag2"] = engineered_feature_2(df)
    return df
end

function engineered_feature_1(df::DataFrame)
    #<some stuff here>
end

function engineered_feature_2(df::DataFrame)
    #<some other stuff here>
end
What I want to do is have something like
mymodel = (
    model = deserialize(joinpath(modelfolder, "model.jls")),
    preprocessing = preprocessing_script(modelname, modelpath)
)
Now there would be a considerable number of “preprocessing.jl” files, which all define the “preprocessing!” function and could also have naming collisions among their helper functions. What is the cleanest, most “Julian” way to get the contents of my “preprocessing.jl” file into this named tuple without naming conflicts?
For example, I could programmatically create a module inside the “preprocessing_script” function
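function preprocessing_script(modelname, modelpath)
    module_name = "Preprocessing_$(modelname)"
    # repr() turns the path into a quoted, escaped string literal for the generated code
    script_path = repr(joinpath(modelpath, "preprocessing.jl"))
    module_command = "
    module $(module_name)
    include($(script_path))
    end
    "
    # include_string needs the module to evaluate the code in
    include_string(@__MODULE__, module_command)
    return eval(Symbol(module_name)).preprocessing!
end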
Is this the way I should be doing it? It would be nice to have a way to include the file as a programmatically named module and extract the main function of interest, while keeping the readable stack traces that you get when you “include” the file.
Something like this might be a better pattern:
function get_preprocess_function(modelpath)
    # Create an anonymous module
    m = Module()
    # Include the model source file in it
    Base.include(m, joinpath(modelpath, "model.jl"))
    # Return the newly-defined `preprocess!` function
    return m.preprocess!
end
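To make the isolation concrete, here is a minimal sketch of the same pattern applied to two scripts that deliberately reuse the same function names. Everything in it is an assumption for illustration: the helper names `write_script` and `load_preprocessing`, the folder names, the script contents, and the use of a plain `Dict` instead of a `DataFrame` so that it runs without extra packages.

```julia
# Write a throwaway "preprocessing.jl" into `dir`. Every script defines the
# same two names (`helper` and `preprocessing!`), which would collide in Main.
function write_script(dir, factor)
    path = joinpath(dir, "preprocessing.jl")
    write(path, """
    helper(x) = $(factor) .* x
    function preprocessing!(data::Dict)
        data["tag1"] = helper(data["raw"])
        return data
    end
    """)
    return path
end

function load_preprocessing(script_path)
    m = Module()                     # fresh anonymous module per script
    Base.include(m, script_path)     # definitions land in `m`, not in Main
    # Core.eval runs in the latest world, so the new definition is visible.
    return Core.eval(m, :preprocessing!)
end

root = mktempdir()
dir_a = mkpath(joinpath(root, "model_a"))
dir_b = mkpath(joinpath(root, "model_b"))

prep_a = load_preprocessing(write_script(dir_a, 2))
prep_b = load_preprocessing(write_script(dir_b, 10))

# Both calls happen at top level, after the scripts were loaded.
result_a = prep_a(Dict{String,Any}("raw" => [1, 2, 3]))["tag1"]
result_b = prep_b(Dict{String,Any}("raw" => [1, 2, 3]))["tag1"]
```

Each script gets its own `helper`, so `result_a` and `result_b` differ even though both files use identical names. Since the file is loaded with `Base.include`, errors should still point at the real file and line.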
You can do that?!? This looks exactly like the kind of thing I was wanting to do!
This is a bottom of the Julia iceberg tip if I ever saw one. Never would have thought of that; who would ever do ?Base.include?
Okay, so this worked when I used it at the top level, but if I use this function inside a function I get
MethodError: no method matching preprocess!
The applicable method may be too new: running in world age 33212, while current world is 33213
I guess I can fix this if I use
Base.@invokelatest preprocess!(data)
where
preprocess! = get_preprocess_function(modelpath)
At least this is an in-place modification so there is no “unknown-type” output that breaks type inference.
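For anyone hitting the same error, here is a small sketch of the world-age behaviour, assuming a made-up script and hypothetical function names (`load_preprocess`, `call_directly`, `load_and_run!`). A function that loads the script and then calls the result directly is still running in the older world, so the call fails; routing the call through `Base.invokelatest` dispatches in the latest world instead.

```julia
function load_preprocess(script_path)
    m = Module()
    Base.include(m, script_path)
    return Core.eval(m, :preprocess!)   # lookup happens in the latest world
end

# Hypothetical script, written to a temporary file for the demo.
script = joinpath(mktempdir(), "model.jl")
write(script, """
function preprocess!(data::Vector{Float64})
    data .-= sum(data) / length(data)   # centre the data in place
    return data
end
""")

# Load and call within a single function: the method is "too new" for the caller.
function call_directly(data, script_path)
    preprocess! = load_preprocess(script_path)
    return preprocess!(data)
end

# Same thing, but the call is dispatched in the latest world.
function load_and_run!(data, script_path)
    preprocess! = load_preprocess(script_path)
    return Base.invokelatest(preprocess!, data)
end

direct_call_failed = try
    call_directly([1.0, 2.0, 3.0], script)
    false
catch err
    err isa MethodError
end

centered = load_and_run!([1.0, 2.0, 3.0], script)
```

The other way out is to load all the preprocessing functions once at top level (for example while building the named tuples) and only call them later from code that starts running afterwards; in that case the plain call works and `invokelatest` is not needed.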