Hi there,
I am trying to build an MLJ interface for some of the ML algorithms in the BetaML package.
I am starting with the decision trees, but I have a few questions.
The function creating (and fitting) the tree is:
```julia
buildTree(x, y::Array{Ty,1};
          maxDepth = size(x,1), minGain = 0.0, minRecords = 2,
          maxFeatures = size(x,2), forceClassification = false,
          splittingCriterion = (Ty <: Number && !forceClassification) ? variance : gini,
          mCols = nothing) where {Ty}
```
- As you can see, some parameters default to values that depend on the data: for example, `maxFeatures` defaults to the dimensionality of the explanatory variables. I understood that hyperparameters should be part of the model struct, but how do I set such defaults without seeing the data?
- Harder still: the algorithm I am trying to wrap automatically performs either a regression or a classification task (and, in the latter case, returns a probability distribution) depending on the type of the labels, with the option to override the task via `forceClassification`. Since MLJ distinguishes between probabilistic and deterministic model types, which one do I choose? Or should I wrap it as two separate MLJ models?
- Most of my models support `Missing` data in the input. I read that `Missing` is a scientific type per se. Should I then declare a `Union` of the supported types, including `Missing`?
- I have a case where my model doesn't fit the fit/predict workflow: a model that (using GMM/EM) predicts the missing values in a matrix, based on the degree of similarity of the other elements of each column to those of the other rows. How do I wrap it with MLJ?
- Where can I find real-case examples? For example, DecisionTree.jl seems to be available through MLJ, but there is no code in its GitHub repo concerning MLJ…
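For the first question, the workaround I am considering is to use `nothing` as a sentinel in the model struct and resolve it against the data at fit time. A minimal sketch in plain Julia (the struct and helper names here are hypothetical; a real MLJ model would subtype e.g. `MLJModelInterface.Probabilistic` and do the resolution inside `MLJModelInterface.fit`):

```julia
# Sketch: data-dependent defaults via `nothing` sentinels, resolved at fit time.
# (Hypothetical names; a real MLJ model would subtype a type from
# MLJModelInterface and implement MLJModelInterface.fit.)
mutable struct DecisionTreeModel
    maxDepth::Union{Nothing,Int}      # nothing => use size(x,1) at fit time
    minGain::Float64
    minRecords::Int
    maxFeatures::Union{Nothing,Int}   # nothing => use size(x,2) at fit time
    forceClassification::Bool
end

# Keyword constructor with only data-independent defaults
DecisionTreeModel(; maxDepth = nothing, minGain = 0.0, minRecords = 2,
                    maxFeatures = nothing, forceClassification = false) =
    DecisionTreeModel(maxDepth, minGain, minRecords, maxFeatures,
                      forceClassification)

# At fit time, replace the sentinels with the data-dependent values
function resolve_defaults(m::DecisionTreeModel, x)
    maxDepth    = m.maxDepth    === nothing ? size(x, 1) : m.maxDepth
    maxFeatures = m.maxFeatures === nothing ? size(x, 2) : m.maxFeatures
    return (maxDepth = maxDepth, maxFeatures = maxFeatures)
end

m  = DecisionTreeModel()          # user never sees the data here
hp = resolve_defaults(m, rand(10, 4))
# hp.maxDepth == 10, hp.maxFeatures == 4
```

Is this sentinel pattern the idiomatic way to handle data-dependent defaults in MLJ, or is there a dedicated mechanism?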
Thank you!