[ANN] BetaML v0.8: Model definition, hyperparameter tuning and fitting in 2 lines

Dear all, I’m pleased to announce BetaML v0.8.

The Beta Machine Learning Toolkit is a package including many algorithms and utilities to implement machine learning workflows in Julia, with a detailed tutorial on its usage from Python or R (no wrapper packages are needed) and an extensive interface to MLJ.

Aside from supporting the standard mod = Model([Options]), fit!(mod,X,[Y]), predict(mod,[X]) paradigm for 22 models (see the list below), this version brings one of the easiest hyperparameter tuning functionalities available in ML libraries. From model definition to tuning, fitting and prediction in just 3 lines of code:

mod = ModelXX(autotune=true)  # --> control autotune with the parameter `tunemethod`
fit!(mod,x,[y])               # --> autotune happens here together with final fitting
est = predict(mod,xnew)
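As a concrete sketch of the workflow above, assuming a regression task with a RandomForestEstimator — the data here is randomly generated for illustration only and is not part of the announcement:

```julia
using BetaML

# Illustrative data: 100 records, 3 features, noisy linear target
x = rand(100, 3)
y = 2 .* x[:, 1] .+ x[:, 2] .+ 0.1 .* rand(100)

mod = RandomForestEstimator(autotune=true)  # tuning strategy controlled by `tunemethod`
fit!(mod, x, y)                             # hyperparameter search + final fitting
ŷ   = predict(mod, x)                       # predictions with the tuned model
```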

Autotune is multithreaded, with model-specific defaults. For example, for Random Forests the defaults are:

    tunemethod = SuccessiveHalvingSearch(
        hpranges     = Dict("n_trees"      => [10, 20, 30, 40],
                            "max_depth"    => [5, 10, nothing],
                            "min_gain"     => [0.0, 0.1, 0.5],
                            "min_records"  => [2, 3, 5],
                            "max_features" => [nothing, 5, 10, 30],
                            "beta"         => [0, 0.01, 0.1]),
        loss         = l2loss_by_cv, # works for both regression and classification
        res_shares   = [0.08, 0.1, 0.13, 0.15, 0.2, 0.3, 0.4],
        multithreads = false) # RF are already multi-threaded

For SuccessiveHalvingSearch, the number of models is reduced at each iteration in order to arrive at a single “best” model.
Only autotuning of supervised models is currently implemented, but autotuning of GMM-based clustering, using BIC or AIC, is planned.
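If the defaults don't fit a given problem, the search can be customised by passing a `tunemethod` explicitly. A hedged sketch, along the lines of the defaults shown above — the specific ranges and resource shares here are made up for illustration:

```julia
using BetaML

# Narrower, cheaper search than the defaults (illustrative values)
mod = RandomForestEstimator(autotune   = true,
        tunemethod = SuccessiveHalvingSearch(
            hpranges   = Dict("n_trees"   => [15, 30, 60],
                              "max_depth" => [5, nothing]),
            res_shares = [0.1, 0.2, 0.4]))  # data fractions used per halving round
```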

Aside from hyperparameter autotuning, the other release highlights are:

  • support for all models of the new “V2” API that implements a “standard” mod = Model([Options]), fit!(mod,X,[Y]), predict(mod,[X]) workflow (details here). The classic API is now deprecated: some of its functions will be removed in the next BetaML 0.9 version and some are now unexported.
  • standardised function names to follow the Julia style guidelines and the new BetaML code style guidelines (Style guide · BetaML.jl Documentation)
  • new functions model_load and model_save to load/save trained models from the filesystem
  • new MinMaxScaler (StandardScaler was already available as classical API functions scale and getScalingFactors)
  • many bugfixes/improvements on corner situations
  • new MLJ interface models to NeuralNetworkEstimator
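A hedged sketch of the new scaler and the save/load functions — the exact call signatures are my assumption from the release notes, so check the API reference before relying on them:

```julia
using BetaML

x  = [1.0 10; 2 20; 3 30]
sc = Scaler(MinMaxScaler())   # scaling model using the new MinMaxScaler method
xs = fit!(sc, x)              # fit the scaler and return the scaled data

# Persist the trained model and reload it later (assumed signatures)
model_save("mymodels.jld2"; sc)            # models passed as keyword arguments
sc2 = model_load("mymodels.jld2", "sc")    # retrieve by name
```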

All models are coded in Julia and are part of the same package. BetaML currently includes the following 22 models:

| BetaML name | MLJ Interface | Category |
| --- | --- | --- |
| PerceptronClassifier | LinearPerceptron | Supervised classifier |
| KernelPerceptronClassifier | KernelPerceptron | Supervised classifier |
| PegasosClassifier | Pegasos | Supervised classifier |
| DecisionTreeEstimator | DecisionTreeClassifier, DecisionTreeRegressor | Supervised regressor and classifier |
| RandomForestEstimator | RandomForestClassifier, RandomForestRegressor | Supervised regressor and classifier |
| NeuralNetworkEstimator | NeuralNetworkRegressor, MultitargetNeuralNetworkRegressor, NeuralNetworkClassifier | Supervised regressor and classifier |
| GMMRegressor1 | | Supervised regressor |
| GMMRegressor2 | GaussianMixtureRegressor, MultitargetGaussianMixtureRegressor | Supervised regressor |
| KMeansClusterer | KMeans | Unsupervised hard clusterer |
| KMedoidsClusterer | KMedoids | Unsupervised hard clusterer |
| GMMClusterer | GaussianMixtureClusterer | Unsupervised soft clusterer |
| FeatureBasedImputer | SimpleImputer | Unsupervised missing data imputer |
| GMMImputer | GaussianMixtureImputer | Unsupervised missing data imputer |
| RFImputer | RandomForestImputer | Unsupervised missing data imputer |
| UniversalImputer | GeneralImputer | Unsupervised missing data imputer |
| MinMaxScaler | | Data transformer |
| StandardScaler | | Data transformer |
| Scaler | | Data transformer |
| PCA | | Data transformer |
| OneHotEncoder | | Data transformer |
| OrdinalEncoder | | Data transformer |
| ConfusionMatrix | | Predictions assessment |
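The transformers in the table follow the same fit!/predict workflow as the other models. A hedged sketch with OneHotEncoder — I am assuming here, per the V2 API convention, that fit! on a transformer returns the transformed data:

```julia
using BetaML

enc = OneHotEncoder()
ohm = fit!(enc, ["a", "b", "b", "c"])  # one-hot matrix, one column per category
ŷ   = inverse_predict(enc, ohm)        # decode back to the original labels
```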

Predictive performance is quite good, often on par with or better than that of the leading packages, although resource usage is still considerable. Detailed BetaML tutorials on classification, regression and clustering are available in the documentation.

It would be very nice if you could help me make BetaML more efficient, at least for the models you care about, although the focus remains on providing a tool that is easy to use for everyone.
If useful, I am happy to transfer package ownership to an appropriate organisation (this point was raised by @logankilpatrick and I agree with him).


Great! I didn’t know about this (I guess it’s new):

Thanks respectively to PyJulia and JuliaCall, using BetaML in Python or R is almost as simple as using a native library. In both cases we first need to download and install the Julia binaries for our operating system from JuliaLang.org.

JuliaCall (from PythonCall.jl) is better (see Guide · PythonCall & JuliaCall):

It will automatically download a suitable version of Julia if required.

Maybe it works already [EDIT: Of course it should, I believe JuliaCall works for all Julia code; what I had in mind was the other direction: when you make a wrapper for Python, JuliaCall.jl is preferred], or your package can be fixed to support it. It would be nice if RCall (or another package) had/added such an auto-download.

Gonna look in depth at this topic as soon as I have a PC back in my hands :slight_smile:

argh, there is a name conflict here…
What I tested, and I believe the text in the tutorial is correct, is:

  • Julia <-> Python: the Python package PyJulia (“julia” in pip)
  • Julia <-> R: the R package JuliaCall

Maybe you are right, it is time to also test the Python package JuliaCall

I merged your pull request and added a section on how to use BetaML with the JuliaCall Python package, even if it still looks a bit “unripe” to me…