No model matching using MLJ, how should I prepare my data?

lsablon · February 21, 2023, 10:57am

Hi,

I’m taking one of my first steps into MLJ by writing a simple script that can predict diverse fields for my research. I know what I want to do can be achieved using simpler interpolation methods or Flux.jl, but I want to do it using MLJ, just for me to learn and explore the ecosystem.

Here is my goal: I have a set of Array{Union{Missing, Float32}, 2}, each one associated with a triplet [p1, p2, p3] of parameters. I want to train the model on this set, and obtain a prediction function that takes any triplet of parameters as input and returns “the best matching” array.
My issue is that there is no model matching my data, according to models(matching(X)), where X is a Vector{Vector{Float64}} (I removed the missing values from the original arrays).
In summary, my dataset has the following schema:

┌────────┬────────────────────────────┬─────────────────┐
│ names  │ scitypes                   │ types           │
├────────┼────────────────────────────┼─────────────────┤
│ param  │ AbstractVector{Continuous} │ Vector{Float64} │
│ field  │ AbstractVector{Continuous} │ Vector{Float64} │
└────────┴────────────────────────────┴─────────────────┘

I can easily remove the vector structure of param and make 3 different columns, but I want to keep the field as a whole.

How else would you format such data so that MLJ proposes compatible models?

Thanks a lot,
L.

jbrea · February 21, 2023, 12:14pm

I’d suggest doing this step in a custom preprocessing function that converts the Vector{Vector{Float64}} to a table (e.g. a DataFrame) that can be given as input to any model. With this you could build a pipeline my_custom_preprocessor |> some_MLJ_model (see e.g. here).

lsablon · February 22, 2023, 10:52am

Unfortunately, this does not solve my issue, which is that the output of the following code is an empty vector, indicating that no model can be used. I was wondering if there is a workaround to enable me to use MLJ?

using MLJ
import DataFrames as DF

design_ = [rand(3) for _ in 1:27]      # random stand-in data for a reproducible example
field_ = [rand(1359) for _ in 1:27]

df = DF.DataFrame(param=design_, field=field_)

schema(df) |> display

df, df_test = partition(df, 26.0/27.0);

y, X = unpack(df, ==(:field));
y_test, X_test = unpack(df_test, ==(:field));

m = models(matching(X,y))

Leads to

┌───────┬────────────────────────────┬─────────────────┐
│ names │ scitypes                   │ types           │
├───────┼────────────────────────────┼─────────────────┤
│ param │ AbstractVector{Continuous} │ Vector{Float64} │
│ field │ AbstractVector{Continuous} │ Vector{Float64} │
└───────┴────────────────────────────┴─────────────────┘
NamedTuple{(:name, :package_name, :is_supervised, :abstract_type, :deep_properties, :docstring, :fit_data_scitype, :human_name, :hyperparameter_ranges, :hyperparameter_types, :hyperparameters, :implemented_methods, :inverse_transform_scitype, :is_pure_julia, :is_wrapper, :iteration_parameter, :load_path, :package_license, :package_url, :package_uuid, :predict_scitype, :prediction_type, :reporting_operations, :reports_feature_importances, :supports_class_weights, :supports_online, :supports_training_losses, :supports_weights, :transform_scitype, :input_scitype, :target_scitype, :output_scitype)}[]

jbrea · February 22, 2023, 12:33pm

You should match on the processed data, e.g.
m = models(matching(my_custom_preprocessor(X), my_custom_targetprocessor(y)))
with e.g. my_custom_preprocessor(x) = DataFrame(hcat(x...)', :auto) and
my_custom_targetprocessor = my_custom_preprocessor or
my_custom_targetprocessor(y) = getindex.(y, 1)
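
For readers following along, here is a minimal sketch of that suggestion applied to the example data above. The helper name rows_to_table is hypothetical (it is not part of MLJ or DataFrames), and the sketch assumes the preprocessor is applied to the vector of vectors itself (e.g. the param column) rather than to the one-column DataFrame returned by unpack. The point is only that, once every component has its own Continuous column, matching sees an ordinary table instead of a column of vectors.

```julia
using MLJ
import DataFrames as DF

# Hypothetical helper: stack a vector of equal-length vectors into a table
# with one row per sample and one column per component (x1, x2, ...).
rows_to_table(v) = DF.DataFrame(permutedims(reduce(hcat, v)), :auto)

design = [rand(3) for _ in 1:27]       # 27 parameter triplets
field  = [rand(1359) for _ in 1:27]    # 27 flattened fields

X = rows_to_table(design)              # 27×3 table of Continuous columns
y = rows_to_table(field)               # 27×1359 table (multi-target)

schema(X) |> display

# Query the model registry on the processed data, not on the raw vectors.
m = models(matching(X, y))
```

Keeping y as a table asks for models that predict the whole field at once. Taking a single component instead, e.g. getindex.(field, 1), turns the problem into ordinary single-target regression.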
