How to create an MLJModelInterface.Model interface for a complex model?

sylvaticus · February 23, 2021, 4:24pm

Hi there,
I am trying to build an MLJ interface for some ML algorithms in the BetaML package.

I am starting with decision trees, but I have a few questions.

The function creating (and fitting) the tree is:

buildTree(x, y::Array{Ty,1}; maxDepth = size(x,1), minGain=0.0, minRecords=2, maxFeatures=size(x,2), forceClassification=false, splittingCriterion = (Ty <: Number && !forceClassification) ? variance : gini, mCols=nothing) where {Ty}
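To make the data dependence concrete, here is a minimal sketch of how those keyword defaults resolve. `describe_defaults` is a hypothetical stand-in, not part of BetaML: it copies only three of the keywords from the signature above and reports the values and task they would select, without building a tree.

```julia
# Hypothetical stand-in for buildTree: it only reports which defaults and task
# the signature above would select; it does not build a tree.
function describe_defaults(x, y::Array{Ty,1};
                           maxDepth=size(x, 1),
                           maxFeatures=size(x, 2),
                           forceClassification=false) where {Ty}
    # Same rule as the splittingCriterion default: numeric labels mean
    # regression unless classification is forced.
    task = (Ty <: Number && !forceClassification) ? :regression : :classification
    return (maxDepth=maxDepth, maxFeatures=maxFeatures, task=task)
end

x = rand(10, 3)
numeric_case = describe_defaults(x, rand(10))              # numeric labels
string_case  = describe_defaults(x, rand(["a", "b"], 10))  # string labels
forced_case  = describe_defaults(x, rand(1:2, 10); forceClassification=true)
```

Both `maxDepth` and `maxFeatures` are only known once `x` is available, and the task depends on the element type of `y`.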

As you can see, some parameters have defaults that depend on the data; for example, maxFeatures defaults to the number of explanatory variables (the columns of x). I understand that model parameters should be fields of the model struct, but how do I set defaults without seeing the data?
Harder still, the algorithm I am trying to wrap automatically performs either a regression or a classification task (and, in the latter case, returns a probability distribution) depending on the type of the labels, with the option to override the task with forceClassification. Since MLJ distinguishes between types of models, probabilistic and deterministic, which one do I choose? Or should I wrap it as two separate MLJ models?
Most of my models support missing data in the input. I read that Missing is a scientific type in its own right. Should I then declare a Union of the supported types that includes Missing?
I have a case where my model doesn’t fit the fit/predict workflow: a model that (using GMM/EM) predicts the missing values in a matrix, based on how similar the remaining elements of each row are to those of the other rows. How do I wrap it with MLJ?
Where can I find real-world examples? For example, DecisionTree.jl seems to be available through MLJ, but there is no MLJ-related code in its GitHub repo…
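For the first question, one pattern that might apply is to store a sentinel value in the struct and resolve it once the data is known. The sketch below is plain Julia with no MLJ dependency; `TreeHyperparams`, `resolve`, the snake_case field names, and the choice of 0 as the sentinel are all assumptions made for illustration, not BetaML's or MLJ's actual types.

```julia
# Hypothetical hyperparameter container; not BetaML's or MLJ's actual type.
Base.@kwdef mutable struct TreeHyperparams
    max_depth::Int = 0        # 0 is a sentinel: "use the number of records"
    max_features::Int = 0     # 0 is a sentinel: "use the number of columns"
    min_gain::Float64 = 0.0
    min_records::Int = 2
end

# Resolve the sentinels once the data is known, e.g. at the start of fitting.
function resolve(h::TreeHyperparams, x)
    max_depth    = h.max_depth == 0 ? size(x, 1) : h.max_depth
    max_features = h.max_features == 0 ? size(x, 2) : h.max_features
    return (max_depth=max_depth, max_features=max_features,
            min_gain=h.min_gain, min_records=h.min_records)
end

h = TreeHyperparams(max_features=2)   # the user overrides one default
resolved = resolve(h, rand(50, 4))    # the other is filled in from the data
```

The struct can then be constructed before any data exists, while the data-dependent values are computed only inside the fitting step.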

Thank you!

sylvaticus · February 25, 2021, 10:26am

While I somehow managed to write the MLJ interface for a deterministic model, I am now trying to write the interface for a probabilistic model whose predict(model, X) method returns a vector of dictionaries of label => prob.

I normally use arrays of T for Y, but I saw that it also works when Y is a CategoricalArray.

However, I am stuck now and don’t know how to return the prediction in the format MLJ expects.
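As a possible first step for that conversion, here is a minimal sketch in plain Julia that turns a vector of label => prob dictionaries into a fixed class order plus a records × classes probability matrix. The sample data and variable names are made up for illustration. My understanding is that MLJ represents probabilistic class predictions as UnivariateFinite distributions; how exactly to construct them from this matrix is not shown here and should be checked against the MLJ documentation.

```julia
# Illustrative predictions in the shape described above:
# one Dict(label => probability) per record.
preds = [Dict("a" => 0.7, "b" => 0.3),
         Dict("b" => 0.9, "c" => 0.1)]

# Fix one class order so every row uses the same columns.
classes = sort(unique(vcat([collect(keys(d)) for d in preds]...)))

# n_records × n_classes matrix; labels absent from a Dict get probability 0.
probs = [get(d, c, 0.0) for d in preds, c in classes]
```

Each row of `probs` lines up with `classes`, so a label missing from a given dictionary is represented explicitly as a zero rather than left out.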

[EDIT]: I moved this post into this thread to consolidate it…
