[ANN] BetaML v0.8: Model definition, hyperparameters tuning and fitting in 2 lines

sylvaticus · October 2, 2022, 11:58am

Dear all, I’m pleased to announce BetaML v0.8.

The Beta Machine Learning Toolkit is a package including many algorithms and utilities to implement machine learning workflows in Julia, with a detailed tutorial on its usage from Python or R (no wrapper packages are needed) and an extensive interface to MLJ.

Aside from supporting the standard mod = Model([Options]), fit!(mod,X,[Y]), predict(mod,[X]) paradigm for 22 models (see the list below), this version brings one of the easiest hyperparameter tuning interfaces available in ML libraries. From model definition to tuning, fitting and prediction in just 3 lines of code:

mod = ModelXX(autotune=true)  # --> control autotune with the parameter `tunemethod`
fit!(mod,x,[y])               # --> autotune happens here together with final fitting
est = predict(mod,xnew)

Autotune is multithreaded and comes with model-specific defaults. For example, for Random Forests the defaults are:

tunemethod=SuccessiveHalvingSearch(
    hpranges     = Dict("n_trees"   => [10, 20, 30, 40],
                     "max_depth"    => [5,10,nothing],
                     "min_gain"     => [0.0, 0.1, 0.5],
                     "min_records"  => [2,3,5],
                     "max_features" => [nothing,5,10,30],
                     "beta"         => [0,0.01,0.1]),
    loss         = l2loss_by_cv, # works for both regression and classification
    res_shares   = [0.08, 0.1, 0.13, 0.15, 0.2, 0.3, 0.4],
    multithreads = false) # RF are already multi-threaded
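
To get a feel for the size of this default search space, the ranges above can be expanded into individual candidates. This is a small sketch in plain Julia, not BetaML code: `expand_grid` is a hypothetical helper written only for illustration, and BetaML may enumerate or sample the combinations differently.

```julia
# Hypothetical helper (NOT part of BetaML): expand a Dict of hyperparameter
# ranges into one Dict per combination of values.
function expand_grid(hpranges::Dict)
    ks   = collect(keys(hpranges))
    vals = [hpranges[k] for k in ks]
    return [Dict(zip(ks, combo)) for combo in vec(collect(Iterators.product(vals...)))]
end

# The Random Forest default ranges quoted above.
hpranges = Dict("n_trees"      => [10, 20, 30, 40],
                "max_depth"    => [5, 10, nothing],
                "min_gain"     => [0.0, 0.1, 0.5],
                "min_records"  => [2, 3, 5],
                "max_features" => [nothing, 5, 10, 30],
                "beta"         => [0, 0.01, 0.1])

candidates = expand_grid(hpranges)
println(length(candidates))   # 4 * 3 * 3 * 3 * 4 * 3 = 1296
```

With over a thousand combinations, fitting each one on the full dataset would be expensive, which is presumably why a search that discards weak candidates early is the default.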

For SuccessiveHalvingSearch, the number of models is reduced at each iteration in order to arrive at a single “best” model.
Only supervised model autotuning is currently implemented, but GMM-based clustering autotuning is planned using BIC or AIC.
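
The general idea of successive halving can be sketched in a few lines of plain Julia. This is an illustration only, under several assumptions: it is not BetaML's implementation, it tunes a single ridge-regression penalty instead of a Random Forest, it uses a simple hold-out loss instead of cross-validation, and it always keeps the better half of the candidates. The names `holdout_l2loss` and `successive_halving` are hypothetical.

```julia
using Random, LinearAlgebra, Statistics

# Mean squared error of a ridge regression with penalty `lambda`, trained on
# the first 2/3 of the given records and evaluated on the remaining 1/3.
function holdout_l2loss(lambda, x, y)
    ntrain = (2 * size(x, 1)) ÷ 3
    xtr, ytr = x[1:ntrain, :], y[1:ntrain]
    xva, yva = x[ntrain+1:end, :], y[ntrain+1:end]
    w = (xtr' * xtr + lambda * I) \ (xtr' * ytr)
    return mean((xva * w .- yva) .^ 2)
end

# Score the surviving candidates on a growing share of the data and keep the
# better half at each iteration, until a single candidate is left.
function successive_halving(candidates, x, y, res_shares)
    survivors = collect(candidates)
    n = size(x, 1)
    for share in res_shares
        length(survivors) == 1 && break
        nsub   = clamp(round(Int, share * n), 6, n)
        losses = [holdout_l2loss(c, x[1:nsub, :], y[1:nsub]) for c in survivors]
        keep   = max(1, length(survivors) ÷ 2)
        survivors = survivors[sortperm(losses)[1:keep]]  # best candidates first
    end
    return first(survivors)
end

# Synthetic regression data, generated only for this example.
rng = MersenneTwister(123)
x = rand(rng, 300, 3)
y = x * [1.0, 2.0, 3.0] .+ 0.1 .* randn(rng, 300)

lambdas = [0.0, 0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]
best = successive_halving(lambdas, x, y, [0.2, 0.5, 1.0])
println("selected penalty: ", best)
```

Cheap evaluations on small data shares eliminate clearly poor candidates, so only a few candidates are ever fitted on the full data.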

Aside from hyperparameter autotuning, the other release notes are:

- support in all models for the new “V2” API, which implements a “standard” mod = Model([Options]), fit!(mod,X,[Y]), predict(mod,[X]) workflow (details here). The classic API is now deprecated: some of its functions will be removed in the next BetaML 0.9 versions and some will be unexported.
- standardised function names to follow the Julia style guidelines and the new BetaML code style guidelines (Style guide · BetaML.jl Documentation)
- new functions model_load and model_save to load/save trained models from/to the filesystem
- new MinMaxScaler (StandardScaler was already available as the classic API functions scale and getScalingFactors)
- many bugfixes/improvements for corner cases
- new MLJ interface models for NeuralNetworkEstimator

All models are coded in Julia and are part of the same package. Currently, BetaML includes 22 models:

BetaML name	MLJ Interface	Category
PerceptronClassifier	LinearPerceptron	Supervised classifier
KernelPerceptronClassifier	KernelPerceptron	Supervised classifier
PegasosClassifier	Pegasos	Supervised classifier
DecisionTreeEstimator	DecisionTreeClassifier, DecisionTreeRegressor	Supervised regressor and classifier
RandomForestEstimator	RandomForestClassifier, RandomForestRegressor	Supervised regressor and classifier
NeuralNetworkEstimator	NeuralNetworkRegressor, MultitargetNeuralNetworkRegressor, NeuralNetworkClassifier	Supervised regressor and classifier
GMMRegressor1		Supervised regressor
GMMRegressor2	GaussianMixtureRegressor, MultitargetGaussianMixtureRegressor	Supervised regressor
KMeansClusterer	KMeans	Unsupervised hard clusterer
KMedoidsClusterer	KMedoids	Unsupervised hard clusterer
GMMClusterer	GaussianMixtureClusterer	Unsupervised soft clusterer
FeatureBasedImputer	SimpleImputer	Unsupervised missing data imputer
GMMImputer	GaussianMixtureImputer	Unsupervised missing data imputer
RFImputer	RandomForestImputer	Unsupervised missing data imputer
UniversalImputer	GeneralImputer	Unsupervised missing data imputer
MinMaxScaler		Data transformer
StandardScaler		Data transformer
Scaler		Data transformer
PCA		Data transformer
OneHotEncoder		Data transformer
OrdinalEncoder		Data transformer
ConfusionMatrix		Predictions assessment

Predictions are quite good, often better than those of the leading packages, although the resource usage is still considerable. Detailed BetaML tutorials on classification, regression and clustering are available in the documentation.

It would be very nice if you could help me make BetaML more efficient, at least for the models you care about, although the focus remains on providing a tool that is easy to use for everyone.
If useful, I am happy to transfer the package ownership to an appropriate organisation (this point was raised by @logankilpatrick and I agree with him).

Palli · October 2, 2022, 7:49pm

Great! I didn’t know of this (I guess new):

Thanks to respectively PyJulia and JuliaCall, using BetaML in Python or R is almost as simple as using a native library. In both cases we need first to download and install the Julia binaries for our operating system from JuliaLang.org.

JuliaCall (of PythonCall.jl) is better since: https://juliapy.github.io/PythonCall.jl/stable/juliacall/

It will automatically download a suitable version of Julia if required.

~~Maybe~~ it works already [EDIT: Of course it should, I believe JuliaCall works for all Julia code; what I had in mind was the other direction, when you make a wrapper for Python, JuliaCall.jl is preferred], or your package can be fixed to support it. It would be nice if RCall (or another package) had/added such auto-download.

sylvaticus · October 2, 2022, 7:57pm

Gonna look in depth at this topic as soon as I have a PC back in my hands
Thanks

sylvaticus · October 2, 2022, 8:16pm

argh, there is a name conflict here…
What I tested (and I believe the text in the tutorial is correct) is:

Julia <-> Python

Python package PyJulia (“julia” in pip)

Julia <-> R

R package JuliaCall

Maybe you are right, it is time to test also the python package JuliaCall…

sylvaticus · October 3, 2022, 1:56pm

I merged your pull request and added a section on how to use BetaML with the JuliaCall python package, even if it looks still a bit “unripe” to me…
