Question about MLJ

abcde · October 27, 2021, 7:56am

I want to make a multivariate probabilistic prediction. This is my current MLJ model search, but I don’t know whether these models are valid for my data. How can I find an efficient prediction model?

task(model) = model.is_supervised && model.prediction_type == :probabilistic

models(task)

47-element Vector{NamedTuple{(:name, :package_name, :is_supervised, :abstract_type, :deep_properties, :docstring, :fit_data_scitype, :hyperparameter_ranges, :hyperparameter_types, :hyperparameters, :implemented_methods, :inverse_transform_scitype, :is_pure_julia, :is_wrapper, :iteration_parameter, :load_path, :package_license, :package_url, :package_uuid, :predict_scitype, :prediction_type, :supports_class_weights, :supports_online, :supports_training_losses, :supports_weights, :transform_scitype, :input_scitype, :target_scitype, :output_scitype), T} where T<:Tuple}:
(name = AdaBoostClassifier, package_name = ScikitLearn, … )
(name = AdaBoostStumpClassifier, package_name = DecisionTree, … )
(name = BaggingClassifier, package_name = ScikitLearn, … )
(name = BayesianLDA, package_name = MultivariateStats, … )
(name = BayesianLDA, package_name = ScikitLearn, … )
(name = BayesianQDA, package_name = ScikitLearn, … )
⋮
(name = ProbabilisticSGDClassifier, package_name = ScikitLearn, … )
(name = RandomForestClassifier, package_name = BetaML, … )
(name = RandomForestClassifier, package_name = DecisionTree, … )
(name = RandomForestClassifier, package_name = ScikitLearn, … )
(name = SubspaceLDA, package_name = MultivariateStats, … )
(name = XGBoostClassifier, package_name = XGBoost, … )

Thanks!

jbrea · October 27, 2021, 8:51am

What do you mean by “valid”? Are you looking for methods that can be run on your data? (This could be checked with, e.g., matching(model, X, y).)
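A minimal sketch of combining the filter from the original question with `matching`, assuming MLJ is installed. The iris data loaded with `@load_iris` is only a stand-in for the poster's own `X` and `y`, which are not shown in the thread.

```julia
using MLJ

# Stand-in data (an assumption): replace with your own table X and target y.
X, y = @load_iris

# Keep only supervised, probabilistic models whose input and target
# scitypes are compatible with this particular X and y.
task(model) = model.is_supervised &&
              model.prediction_type == :probabilistic &&
              matching(model, X, y)

candidates = models(task)

# Print package and model names for a quick overview.
for m in candidates
    println(m.package_name, " => ", m.name)
end
```

Because `matching` checks scitypes against the actual data, the resulting list is usually shorter than the 47 models returned by the metadata-only filter.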

abcde · October 27, 2021, 9:11am

What I mean is that I want to use a neural network to predict my multivariate data. How can I choose the best of these 47 models for prediction?

jbrea · October 27, 2021, 11:30am

This is usually difficult to know in advance, because which model works best depends on your data. I often pick a linear model as a first benchmark. If I happen to have enough data and suspect a non-linear relationship between input and output, I try a few non-linear methods, such as neural networks, random forests, boosted trees or SVMs. A common approach to finding hyper-parameters and comparing different kinds of models is cross-validation (see e.g. here for how to tune models).
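A sketch of the benchmark-then-compare workflow described above, assuming MLJ plus the model packages MLJLinearModels and DecisionTree (with its MLJ interface) are installed in the active environment. The specific models, the `n_trees` range and the choice of log loss are illustrative assumptions, not recommendations. Each candidate is scored with 5-fold cross-validation via `evaluate`; the tuned forest is wrapped in `TunedModel`, so its score comes from nested cross-validation.

```julia
using MLJ

X, y = @load_iris  # stand-in data; replace with your own

# Load model types (requires the corresponding packages in your environment).
Logistic = @load LogisticClassifier pkg=MLJLinearModels verbosity=0
Forest   = @load RandomForestClassifier pkg=DecisionTree verbosity=0

cv = CV(nfolds=5, shuffle=true, rng=123)

# Tune one hyper-parameter of the forest with an inner cross-validation.
forest = Forest()
r = range(forest, :n_trees, lower=10, upper=100)
tuned_forest = TunedModel(model=forest, range=r, tuning=Grid(resolution=3),
                          resampling=cv, measure=log_loss)

candidates = [
    "linear benchmark (logistic)" => Logistic(),
    "random forest (defaults)"    => forest,
    "random forest (tuned)"       => tuned_forest,
]

# Estimate out-of-sample log loss for each candidate (lower is better).
results = Dict{String,Float64}()
for (label, model) in candidates
    e = evaluate(model, X, y; resampling=cv, measure=log_loss, verbosity=0)
    results[label] = e.measurement[1]
    println(label, ": mean log loss = ", round(results[label], digits=3))
end
```

The same loop works for any model from the filtered list, including an MLJFlux neural network, as long as it is wrapped in the same resampling strategy so the scores are comparable.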

Multilayer neural networks are supported by MLJFlux.
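A minimal sketch of fitting an MLJFlux neural network classifier and obtaining probabilistic predictions, assuming MLJ and MLJFlux are installed. The `epochs=50` value is illustrative; the default network builder is used rather than a custom architecture.

```julia
using MLJ

X, y = @load_iris  # stand-in data; replace with your own

NNClassifier = @load NeuralNetworkClassifier pkg=MLJFlux verbosity=0

# Default builder; `epochs` chosen only for illustration.
nn = NNClassifier(epochs=50)

# Bind model and data in a machine, then train it.
mach = machine(nn, X, y)
fit!(mach, verbosity=0)

probs  = predict(mach, X)       # one UnivariateFinite distribution per row
labels = predict_mode(mach, X)  # most probable class per row

# Probability assigned to the first observation's true class.
println(pdf(probs[1], y[1]))
```

In practice the network would be scored with cross-validation (for example with `evaluate`) rather than on its training data as here.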

abcde · October 27, 2021, 12:35pm

Do you know where I can find code examples for this? Thank you very much!

jbrea · October 27, 2021, 1:10pm

There are, for example, these tutorials. I also use MLJ in a course I am currently teaching (new material will be added every week until mid-December).

zuruck · October 27, 2021, 2:27pm

Thank you for these links.
