[ANN] BetaML v0.8: Model definition, hyperparameter tuning and fitting in 2 lines

Dear all, I’m pleased to announce BetaML v0.8.

The Beta Machine Learning Toolkit is a package including many algorithms and utilities to implement machine learning workflows in Julia, with a detailed tutorial on its usage from Python or R (no wrapper packages are needed) and an extensive interface to MLJ.

Aside from supporting the standard mod = Model([Options]), fit!(mod,X,[Y]), predict(mod,[X]) paradigm for 22 models (see the list below), this version brings one of the easiest hyperparameter tuning functionalities available in ML libraries. From model definition to tuning, fitting and prediction in just 3 lines of code:

mod = ModelXX(autotune=true)  # --> control autotune with the parameter `tunemethod`
fit!(mod,x,[y])               # --> autotune happens here together with final fitting
est = predict(mod,xnew)
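As a concrete sketch of the workflow above, assuming a regression task with a RandomForestEstimator — the data here is randomly generated for illustration only and is not part of the announcement:

```julia
using BetaML

# Illustrative data: 100 records, 3 features, noisy linear target
x = rand(100, 3)
y = 2 .* x[:, 1] .+ x[:, 2] .+ 0.1 .* rand(100)

mod = RandomForestEstimator(autotune=true)  # tuning strategy controlled by `tunemethod`
fit!(mod, x, y)                             # hyperparameter search + final fitting
ŷ   = predict(mod, x)                       # predictions with the tuned model
```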

Autotune is multithreaded, with model-specific defaults. For example, for Random Forests the defaults are:

    tunemethod = SuccessiveHalvingSearch(
        hpranges     = Dict("n_trees"      => [10, 20, 30, 40],
                            "max_depth"    => [5, 10, nothing],
                            "min_gain"     => [0.0, 0.1, 0.5],
                            "min_records"  => [2, 3, 5],
                            "max_features" => [nothing, 5, 10, 30],
                            "beta"         => [0, 0.01, 0.1]),
        loss         = l2loss_by_cv, # works for both regression and classification
        res_shares   = [0.08, 0.1, 0.13, 0.15, 0.2, 0.3, 0.4],
        multithreads = false) # RF are already multi-threaded

For SuccessiveHalvingSearch, the number of models is reduced at each iteration in order to arrive at a single “best” model.
Only autotuning of supervised models is currently implemented, but autotuning of GMM-based clustering, using BIC or AIC, is planned.
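If the defaults don't fit a given problem, the search can be customised by passing a `tunemethod` explicitly. A hedged sketch, along the lines of the defaults shown above — the specific ranges and resource shares here are made up for illustration:

```julia
using BetaML

# Narrower, cheaper search than the defaults (illustrative values)
mod = RandomForestEstimator(autotune   = true,
        tunemethod = SuccessiveHalvingSearch(
            hpranges   = Dict("n_trees"   => [15, 30, 60],
                              "max_depth" => [5, nothing]),
            res_shares = [0.1, 0.2, 0.4]))  # data fractions used per halving round
```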

Aside from hyperparameter autotuning, the other release highlights are:

  • support for all models of the new “V2” API that implements a “standard” mod = Model([Options]), fit!(mod,X,[Y]), predict(mod,[X]) workflow (details here). The classic API is now deprecated: some of its functions will be removed in the next BetaML 0.9 version and some are now unexported.
  • standardised function names to follow the Julia style guidelines and the new BetaML code style guidelines (Style guide · BetaML.jl Documentation)
  • new functions model_load and model_save to load/save trained models from the filesystem
  • new MinMaxScaler (StandardScaler was already available as classical API functions scale and getScalingFactors)
  • many bugfixes/improvements on corner situations
  • new MLJ interface models to NeuralNetworkEstimator
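A hedged sketch of the new scaler and the save/load functions — the exact call signatures are my assumption from the release notes, so check the API reference before relying on them:

```julia
using BetaML

x  = [1.0 10; 2 20; 3 30]
sc = Scaler(MinMaxScaler())   # scaling model using the new MinMaxScaler method
xs = fit!(sc, x)              # fit the scaler and return the scaled data

# Persist the trained model and reload it later (assumed signatures)
model_save("mymodels.jld2"; sc)            # models passed as keyword arguments
sc2 = model_load("mymodels.jld2", "sc")    # retrieve by name
```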

All models are coded in Julia and are part of the same package. BetaML currently includes the following 22 models:

| BetaML name | MLJ Interface | Category |
| --- | --- | --- |
| PerceptronClassifier | LinearPerceptron | Supervised classifier |
| KernelPerceptronClassifier | KernelPerceptron | Supervised classifier |
| PegasosClassifier | Pegasos | Supervised classifier |
| DecisionTreeEstimator | DecisionTreeClassifier, DecisionTreeRegressor | Supervised regressor and classifier |
| RandomForestEstimator | RandomForestClassifier, RandomForestRegressor | Supervised regressor and classifier |
| NeuralNetworkEstimator | NeuralNetworkRegressor, MultitargetNeuralNetworkRegressor, NeuralNetworkClassifier | Supervised regressor and classifier |
| GMMRegressor1 | | Supervised regressor |
| GMMRegressor2 | GaussianMixtureRegressor, MultitargetGaussianMixtureRegressor | Supervised regressor |
| KMeansClusterer | KMeans | Unsupervised hard clusterer |
| KMedoidsClusterer | KMedoids | Unsupervised hard clusterer |
| GMMClusterer | GaussianMixtureClusterer | Unsupervised soft clusterer |
| FeatureBasedImputer | SimpleImputer | Unsupervised missing data imputer |
| GMMImputer | GaussianMixtureImputer | Unsupervised missing data imputer |
| RFImputer | RandomForestImputer | Unsupervised missing data imputer |
| UniversalImputer | GeneralImputer | Unsupervised missing data imputer |
| MinMaxScaler | | Data transformer |
| StandardScaler | | Data transformer |
| Scaler | | Data transformer |
| PCA | | Data transformer |
| OneHotEncoder | | Data transformer |
| OrdinalEncoder | | Data transformer |
| ConfusionMatrix | | Predictions assessment |
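The transformers in the table follow the same fit!/predict workflow as the other models. A hedged sketch with OneHotEncoder — I am assuming here, per the V2 API convention, that fit! on a transformer returns the transformed data:

```julia
using BetaML

enc = OneHotEncoder()
ohm = fit!(enc, ["a", "b", "b", "c"])  # one-hot matrix, one column per category
ŷ   = inverse_predict(enc, ohm)        # decode back to the original labels
```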

Predictive performance is quite good, often on par with or better than that of the leading packages, although resource usage is still considerable. Detailed BetaML tutorials on classification, regression and clustering are available in the documentation.

It would be very nice if you could help me make BetaML more efficient, at least for the models you care about, although the focus remains on providing a tool that is easy to use for everyone.
If useful, I am happy to transfer package ownership to an appropriate organisation (this point was raised by @logankilpatrick and I agree with him).


Great! I didn’t know about this (I guess it’s new):

Thanks respectively to PyJulia and JuliaCall, using BetaML in Python or R is almost as simple as using a native library. In both cases we first need to download and install the Julia binaries for our operating system from JuliaLang.org.

JuliaCall (from PythonCall.jl) is better (see Guide · PythonCall & JuliaCall):

It will automatically download a suitable version of Julia if required.

Maybe it works already [EDIT: Of course it should, I believe JuliaCall works for all Julia code; what I had in mind was the other direction: when you make a wrapper for Python, JuliaCall.jl is preferred], or your package can be fixed to support it. It would be nice if RCall (or another package) had/added such an auto-download.

Gonna look in depth at this topic as soon as I have a PC back in my hands :slight_smile:

argh, there is a name conflict here…
What I tested, and I believe the text in the tutorial is correct, is:

  • Julia <-> Python: the Python package PyJulia (“julia” in pip)
  • Julia <-> R: the R package JuliaCall

Maybe you are right, it is time to also test the Python package JuliaCall

I merged your pull request and added a section on how to use BetaML with the JuliaCall Python package, even if it still looks a bit “unripe” to me…