Question about MLJ

I want to make a multivariate probability prediction. Below is my current MLJ model search, but I don’t know whether these models are valid. How can I find an efficient prediction model for my data?

task(model) = model.is_supervised && model.prediction_type == :probabilistic
models(task)


47-element Vector{NamedTuple{(:name, :package_name, :is_supervised, :abstract_type, :deep_properties, :docstring, :fit_data_scitype, :hyperparameter_ranges, :hyperparameter_types, :hyperparameters, :implemented_methods, :inverse_transform_scitype, :is_pure_julia,
:is_wrapper, :iteration_parameter, :load_path, :package_license, :package_url, :package_uuid, :predict_scitype, :prediction_type, :supports_class_weights, :supports_online, :supports_training_losses, :supports_weights, :transform_scitype, :input_scitype, :target_scitype, :output_scitype), T} where T<:Tuple}:
(name = AdaBoostClassifier, package_name = ScikitLearn, … )
(name = AdaBoostStumpClassifier, package_name = DecisionTree, … )
(name = BaggingClassifier, package_name = ScikitLearn, … )
(name = BayesianLDA, package_name = MultivariateStats, … )
(name = BayesianLDA, package_name = ScikitLearn, … )
(name = BayesianQDA, package_name = ScikitLearn, … )

(name = ProbabilisticSGDClassifier, package_name = ScikitLearn, … )
(name = RandomForestClassifier, package_name = BetaML, … )
(name = RandomForestClassifier, package_name = DecisionTree, … )
(name = RandomForestClassifier, package_name = ScikitLearn, … )
(name = SubspaceLDA, package_name = MultivariateStats, … )
(name = XGBoostClassifier, package_name = XGBoost, … )


What do you mean by “valid”? Are you looking for models that can be run on your data? You could check that with, e.g., matching(model, X, y).
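To make this concrete, here is a minimal sketch of such a search. It assumes a classification problem and uses the built-in iris data (`@load_iris`) as a stand-in for your own `X` and `y`; neither assumption comes from the question.

```julia
using MLJ

# Stand-in data (an assumption): replace with your own table X and target y.
X, y = @load_iris

# All registered models whose input and target scitypes are compatible with (X, y).
compatible = models(matching(X, y))

# The same compatibility check combined with the probabilistic filter from the question.
candidates = models() do model
    matching(model, X, y) && model.prediction_type == :probabilistic
end

# Print the name and the providing package of each candidate.
for m in candidates
    println(m.name, " (", m.package_name, ")")
end
```

The filter only tells you which models accept your data; it says nothing about how well they will predict.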

What I mean is that I want to use a neural network to make predictions on my multivariate data. How can I choose the best of these 47 models for prediction?

This is usually difficult to know in advance, because which model is best depends on your data. I often pick a linear model as a first benchmark. If I have enough data and suspect a non-linear relationship between the input and the output, I try a few non-linear methods, such as neural networks, random forests, boosted trees, or SVMs. A common approach to finding hyper-parameters and comparing different kinds of models is cross-validation (see e.g. here for how to tune models).
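As a rough sketch of that workflow: the snippet below compares a linear benchmark with a random forest by 5-fold cross-validation and then tunes one hyper-parameter of the forest. The iris data, the two chosen models, the log-loss measure, and the tuning range are all assumptions for illustration, and the packages MLJLinearModels and MLJDecisionTreeInterface need to be installed.

```julia
using MLJ

# Stand-in data (an assumption): replace with your own X and y.
X, y = @load_iris

# Load the model types; the corresponding interface packages must be installed.
LogisticClassifier = @load LogisticClassifier pkg=MLJLinearModels verbosity=0
RandomForestClassifier = @load RandomForestClassifier pkg=DecisionTree verbosity=0

# 5-fold cross-validation with shuffled rows and a fixed seed.
cv = CV(nfolds=5, shuffle=true, rng=1)

# Compare the linear benchmark with a non-linear model (lower log loss is better).
for model in (LogisticClassifier(), RandomForestClassifier())
    e = evaluate(model, X, y; resampling=cv, measure=log_loss, verbosity=0)
    println(nameof(typeof(model)), ": mean log loss = ", e.measurement[1])
end

# Tune the number of trees of the forest with a grid search over 5 values.
forest = RandomForestClassifier()
r = range(forest, :n_trees; lower=10, upper=100)
tuned = TunedModel(model=forest, range=r, tuning=Grid(resolution=5),
                   resampling=cv, measure=log_loss)
mach = machine(tuned, X, y)
fit!(mach, verbosity=0)

# The hyper-parameter setting with the best cross-validated log loss.
best = fitted_params(mach).best_model
println("best n_trees = ", best.n_trees)
```

Using the same resampling strategy and measure for every candidate keeps the comparison fair.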

Multilayer Neural Networks are supported by MLJFlux.
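A minimal sketch of what that could look like for a classification problem, assuming MLJFlux is installed. The iris data, the `MLJFlux.Short` builder with 16 hidden units, and the 50 epochs are arbitrary choices for illustration, not recommendations.

```julia
using MLJ
import MLJFlux

# Stand-in data (an assumption): replace with your own X and y.
X, y = @load_iris

NeuralNetworkClassifier = @load NeuralNetworkClassifier pkg=MLJFlux verbosity=0

# A network with a single hidden layer of 16 units, trained for 50 epochs.
nn = NeuralNetworkClassifier(builder=MLJFlux.Short(n_hidden=16), epochs=50)

mach = machine(nn, X, y)
fit!(mach, verbosity=0)

# Probabilistic predictions: one distribution over the classes per row.
probs = predict(mach, X)

# Point predictions: the most probable class for each row.
labels = predict_mode(mach, X)
println(labels[1:3])
```

Because the model is an ordinary MLJ model, it can be passed to `evaluate` or wrapped in `TunedModel` like any other candidate.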

Do you know where I can find code examples for this? Thank you very much!

There are, for example, these tutorials. I also use MLJ for a course I am currently teaching (new material will be added every week until mid-December).

Thank you for these links.