Hi there,
I am trying to build an MLJ interface for some ML algorithms in the BetaML package.
I am starting with the decision trees, but I have a few questions.
The function creating (and fitting) the tree is:

```julia
buildTree(x, y::Array{Ty,1};
          maxDepth = size(x,1), minGain = 0.0, minRecords = 2,
          maxFeatures = size(x,2), forceClassification = false,
          splittingCriterion = (Ty <: Number && !forceClassification) ? variance : gini,
          mCols = nothing) where {Ty}
```
- As you can see, some parameters depend by default on the data: for example, `maxFeatures` defaults to the dimensionality of the explanatory variables. I understand that model hyperparameters should be part of the model struct, but how do I set defaults without seeing the data?
- Harder still: the algorithm I am trying to wrap automatically performs a regression or a classification task (and, in the latter case, it returns a probability distribution) depending on the type of the labels, with the option to override the task with `forceClassification`. As MLJ distinguishes different types of models, probabilistic and deterministic, which one do I choose? Or should I wrap it as two separate MLJ models?
- Most of my models support `Missing` data in the input. I read that `Missing` is a scientific type per se. Should I then declare a `Union` of the supported types, including `Missing`?
- I have a case where my model doesn't fit the fit/predict workflow: a model that (using GMM/EM) predicts the missing values in a matrix, based on the similarity of the other elements of each column to those of the other rows. How do I wrap it with MLJ?
- Where can I find real-case examples? For example, DecisionTree.jl seems to be available through MLJ, but there is no code in the GitHub repo concerning MLJ…
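For the first question, one pattern I am considering is a minimal sketch in plain Julia (no MLJ dependency, hypothetical `TreeModel` and `fit` names): store a sentinel value such as `0` in the model struct to mean "derive from the data", and resolve it only when `fit` sees `x`:

```julia
# Hypothetical model struct: 0 acts as "compute the default from the data".
mutable struct TreeModel
    maxDepth::Int        # 0 means "use size(x, 1)"
    maxFeatures::Int     # 0 means "use size(x, 2)"
end
TreeModel(; maxDepth = 0, maxFeatures = 0) = TreeModel(maxDepth, maxFeatures)

function fit(model::TreeModel, x, y)
    # Resolve the data-dependent defaults only now, when x is available.
    maxDepth    = model.maxDepth    == 0 ? size(x, 1) : model.maxDepth
    maxFeatures = model.maxFeatures == 0 ? size(x, 2) : model.maxFeatures
    # Stand-in for the real fitresult (the tree itself in the actual package).
    return (maxDepth = maxDepth, maxFeatures = maxFeatures)
end

x = rand(10, 3); y = rand(10)
fit(TreeModel(), x, y)              # → (maxDepth = 10, maxFeatures = 3)
fit(TreeModel(maxDepth = 4), x, y)  # → (maxDepth = 4, maxFeatures = 3)
```

Would this be acceptable for an MLJ model struct, or is there a preferred idiom for data-dependent defaults?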
Thank you!