[ANN] BetaML.jl: yet another (simple) Machine Learning Package


Dear all,

I would like to announce the availability of “BetaML”, the Beta Machine Learning toolkit, a package for Machine Learning algorithms and related utilities.

The toolkit is currently made of 4 modules. Perceptron includes the classical perceptron linear classifier, but also the non-linear kernel perceptron and the gradient-based Pegasos classifier. Nn implements easy-to-model Artificial Neural Networks (simple feed-forward only for the moment, but we plan to add support for convolutional, recurrent and LSTM layers). Note that automatic differentiation with Zygote is optional: you can pass your own derivative of the activation function if you wish (common ones are provided). Clustering has algorithms such as kmeans, kmedoids and Expectation-Maximisation based on Gaussian Mixture Models (GMM). As the EM algorithm supports partially missing observations (observations with missing data only on some dimensions), it is used as the backbone algorithm for collaborative filtering (recommendation systems). Finally, Utils is a module implementing common functions such as scaling, one-hot encoding, various kernels and distance metrics.
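To give a flavour of the simplest of these algorithms, here is a minimal sketch of the classical perceptron update rule in plain Julia. It is not BetaML's implementation: the function name `toy_perceptron` and its arguments are made up for illustration, and the data are a tiny hand-written example.

```julia
using LinearAlgebra

# Classical perceptron for labels in {-1,+1}; illustrative only, not the BetaML code
function toy_perceptron(x::AbstractMatrix, y::AbstractVector; epochs::Int=10)
    n, d = size(x)
    θ  = zeros(d)   # weights
    θ₀ = 0.0        # offset
    for _ in 1:epochs, i in 1:n
        # A record is misclassified when its signed score is not strictly positive
        if y[i] * (dot(θ, x[i, :]) + θ₀) <= 0
            θ  .+= y[i] .* x[i, :]
            θ₀  += y[i]
        end
    end
    return θ, θ₀
end

x = [2.0 1.0; 1.0 3.0; -1.0 -2.0; -3.0 -1.0]  # linearly separable toy data
y = [1, 1, -1, -1]
θ, θ₀ = toy_perceptron(x, y)
ŷ = sign.(x * θ .+ θ₀)                         # predicted labels
```

On linearly separable data like this the loop stops updating once every record is on the right side of the boundary.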

BetaML is most likely of didactic value only, as the approaches are the “vanilla” ones, i.e. the simplest possible, and GPUs are not supported. For “serious” machine learning work in Julia I would suggest using either Flux or Knet.

As the focus is mainly didactic, functions have somewhat longer but more explicit names than usual… for example the Dense layer is a "DenseLayer", the RBF kernel is "radialKernel", etc.

That said, Julia is a relatively fast language, and most of the heavy lifting is done in multithreaded functions or with matrix operations whose underlying libraries may be multithreaded, so BetaML is reasonably fast for small exploratory tasks. It is also already very flexible. For example, one can implement one's own layer as a subtype of the abstract type Layer, one's own optimisation algorithm as a subtype of OptimisationAlgorithm, or even specify one's own distance metric in the Kmedoids algorithm…
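The extension mechanism described here is ordinary Julia subtyping plus multiple dispatch. The sketch below shows the general pattern only. All names in it (`ToyLayer`, `ToyDense`, `ToyScale`, `forward`, `predict_toy`) are hypothetical, and it does not reproduce the actual interface that BetaML's Layer type requires.

```julia
# Hypothetical mini-framework: these names are NOT BetaML's API
abstract type ToyLayer end

struct ToyDense <: ToyLayer
    w::Matrix{Float64}
    b::Vector{Float64}
    f::Function
end

# A user-defined layer only needs to subtype ToyLayer and implement forward
struct ToyScale <: ToyLayer
    k::Float64
end

forward(l::ToyDense, x) = l.f.(l.w * x .+ l.b)
forward(l::ToyScale, x) = l.k .* x

# The "network" code is generic: it works with any ToyLayer subtype
predict_toy(layers::Vector{<:ToyLayer}, x) =
    foldl((acc, l) -> forward(l, acc), layers; init=x)

relu_toy(z) = max(zero(z), z)
layers = ToyLayer[ToyDense([1.0 -1.0; 0.5 0.5], [0.0, 0.0], relu_toy), ToyScale(2.0)]
out = predict_toy(layers, [2.0, 1.0])
```

The point of the design is that the generic code never needs to know which concrete layer types exist, so new ones can be added outside the package.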

This repository started as an implementation in Julia of the concepts taught in the MITx 6.86x - Machine Learning with Python: from Linear Models to Deep Learning course. Theoretical notes describing most of these algorithms are available at the companion repository https://github.com/sylvaticus/MITx_6.86x.


Antonello Lobianco, Bureau d’Economie Théorique et Appliquée of Nancy & AgroParisTech


(yep, the logo is inspired by a popular superhero…. the wish is that whenever we have a numerical problem, the Beta Machine Learning toolkit could come to the rescue with its superpowers! :slight_smile: :slight_smile: :slight_smile: )

This is a full example of multi-class classification of the Iris dataset:

# Load Modules
using BetaML.Nn, DelimitedFiles, Random, StatsPlots # Load the main module and auxiliary modules
Random.seed!(123); # Fix the random seed (to obtain reproducible results)

# Load the data
iris     = readdlm(joinpath(dirname(Base.find_package("BetaML")),"..","test","data","iris.csv"),',',skipstart=1)
iris     = iris[shuffle(axes(iris, 1)), :] # Shuffle the records, as they aren't by default
x        = convert(Array{Float64,2}, iris[:,1:4])
y        = map(x->Dict("setosa" => 1, "versicolor" => 2, "virginica" =>3)[x],iris[:, 5]) # Convert the target column to numbers
y_oh     = oneHotEncoder(y) # Convert to One-hot representation (e.g. 2 => [0 1 0], 3 => [0 0 1])

# Split the data in training/testing sets
ntrain    = Int64(round(size(x,1)*0.8))
xtrain    = x[1:ntrain,:]
ytrain    = y[1:ntrain]
ytrain_oh = y_oh[1:ntrain,:]
xtest     = x[ntrain+1:end,:]
ytest     = y[ntrain+1:end]

# Define the Artificial Neural Network model
l1   = DenseLayer(4,10,f=relu) # Activation function is ReLU
l2   = DenseLayer(10,3)        # Activation function is identity by default
l3   = VectorFunctionLayer(3,3,f=softMax) # Add a (parameterless) layer whose activation function (softMax in this case) is defined to all its nodes at once
mynn = buildNetwork([l1,l2,l3],squaredCost,name="Multinomial logistic regression Model Sepal") # Build the NN and use the squared cost (aka MSE) as error function

# Training it (default to SGD)
res = train!(mynn,scale(xtrain),ytrain_oh,epochs=100,batchSize=6) # Use optAlg=SGD (Stochastic Gradient Descent) by default

# Test it
ŷtrain        = predict(mynn,scale(xtrain))   # Note the scaling function
ŷtest         = predict(mynn,scale(xtest))
trainAccuracy = accuracy(ŷtrain,ytrain,tol=1) # 0.983
testAccuracy  = accuracy(ŷtest,ytest,tol=1)   # 1.0

# Visualise results
testSize = size(ŷtest,1)
ŷtestChosen =  [argmax(ŷtest[i,:]) for i in 1:testSize]
groupedbar([ytest ŷtestChosen], label=["ytest" "ŷtest (est)"], title="True vs estimated categories") # All records correctly labelled !
plot(0:res.epochs,res.ϵ_epochs, xlabel="epochs",ylabel="error",legend=nothing,title="Avg. error per epoch on the Sepal dataset") # Epochs go on the x axis, error on the y axis
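For readers wondering what the one-hot encoding and the softmax layer in the example actually compute, here is a plain-Julia sketch. The helpers `toy_onehot` and `toy_softmax` are hypothetical stand-ins written for this post, not the BetaML functions, which may differ in signature and details.

```julia
# Plain-Julia stand-ins (hypothetical names), not the BetaML functions
toy_onehot(y::AbstractVector{<:Integer}, nclasses=maximum(y)) =
    [Int(c == yi) for yi in y, c in 1:nclasses]

function toy_softmax(z::AbstractVector)
    e = exp.(z .- maximum(z))   # subtract the max for numerical stability
    return e ./ sum(e)
end

y_oh  = toy_onehot([2, 3, 1])          # 3×3 matrix, one row per record
p     = toy_softmax([1.0, 2.0, 3.0])   # probabilities summing to 1
label = argmax(p)                      # index of the most likely class
```

Taking the argmax of each predicted row is the same step used above to build ŷtestChosen.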


PS: thanks to @kevbonham on topic 37198:

It turned out that writing tests and docs, and getting CI and registration, has been almost as time-consuming as writing the library itself, but a very rewarding experience!


That sentence of motivation is now my largest contribution to Julia machine learning :joy:.

Great work!


I see you use ReLU (and you have some of the other usual suspects). But it’s outdated, and the closest alternative that is also fast seems to be PLU (feel free to copy my implementation there, and those of the others):

Mish seems to me the best activation function (CELU and others I link to there are also interesting).

Thank you, I added the celu function to master… I don’t feel the need to add too many activation functions as the user has the ability to choose whatever function she/he wants by just providing the f parameter in the layer constructor…
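As an illustration of how little is needed to supply a custom activation, here is a sketch using the usual textbook definitions of mish and celu. The function names are made up for this post, and `apply_activation` is only a stand-in for what a layer does with the function it is given.

```julia
# Illustrative definitions (hypothetical names)
toy_softplus(x) = log1p(exp(x))
toy_mish(x)     = x * tanh(toy_softplus(x))
toy_celu(x; α=1.0) = max(zero(x), x) + min(zero(x), α * (exp(x / α) - 1))

# Any scalar function can serve as an activation: a layer just broadcasts it
apply_activation(f, z::AbstractVector) = f.(z)

z = [-1.0, 0.0, 2.0]
a_mish = apply_activation(toy_mish, z)
a_celu = apply_activation(toy_celu, z)
```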

v0.2 is out.

What’s new:

Clustering: generic mixture support

Added generic, user-specified Mixture support to the EM algorithm, with {Spherical,Diagonal,Full} Gaussian mixtures already implemented.

The support for missing data allows the EM algorithm to be used for missing-value imputation or for collaborative filtering/recommendation systems (using the function predictMissing).

Neural Networks: More default activation functions

Although users can provide their own activation function (and optionally its derivative, to avoid using AD), we included the most recent activation functions (and their derivatives), namely relu, elu, celu, plu, sigmoid, softmax, softplus and mish (thanks to user @Palli).

Utils: Various addition/improvements

We added reverse scaling (in order to scale back the labels/output values), BIC and AIC criteria, meanRelError and the parameter ignoreLabels to the accuracy function in order to account for classification tasks where the label itself doesn’t matter, just its distribution (e.g. in unsupervised learning/clustering).
In master you’ll also find PCA.
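For reference, the two information criteria follow the standard definitions AIC = 2k - 2 ln(L) and BIC = k ln(n) - 2 ln(L), with k free parameters, n observations and maximised likelihood L. A small sketch, with made-up function names and made-up log-likelihood values:

```julia
# Standard definitions; the function names are illustrative, not BetaML's
toy_aic(loglik, k)    = 2 * k - 2 * loglik          # k = number of free parameters
toy_bic(loglik, k, n) = k * log(n) - 2 * loglik     # n = number of observations

# Two hypothetical fitted models on the same n = 200 records (numbers are invented)
n = 200
small = (loglik = -310.0, k = 5)
large = (loglik = -305.0, k = 14)

aic_small, aic_large = toy_aic(small.loglik, small.k), toy_aic(large.loglik, large.k)
bic_small, bic_large = toy_bic(small.loglik, small.k, n), toy_bic(large.loglik, large.k, n)
```

Lower is better for both criteria. Here the larger model's small gain in likelihood does not pay for its extra parameters under either one.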

The documentation for v0.2 is here.


v0.2.2 is out

What’s new (compared to v0.2.0):

PCA analysis

You can now transform your data using PCA, specifying either the number of dimensions you want to keep or the maximum error (share of variance lost) you are willing to accept.
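The idea of choosing the dimensions from a maximum acceptable error can be sketched as follows. This is a textbook PCA via the eigen-decomposition of the covariance matrix, under the assumption that "error" means the share of variance discarded. The function `toy_pca` and its keyword names are hypothetical and are not the package's pca function.

```julia
using LinearAlgebra, Statistics

# Toy PCA (hypothetical function, not BetaML's pca): keep either K dimensions
# or the fewest dimensions whose discarded variance share is ≤ maxerror
function toy_pca(x::AbstractMatrix; K=nothing, maxerror=0.05)
    xc = x .- mean(x, dims=1)                 # centre each column
    e  = eigen(Symmetric(cov(xc)))            # eigenvalues come in ascending order
    order = sortperm(e.values, rev=true)
    λ, P  = e.values[order], e.vectors[:, order]
    explained = cumsum(λ) ./ sum(λ)           # cumulative explained variance
    k = K === nothing ? findfirst(>=(1 - maxerror), explained) : K
    return xc * P[:, 1:k], explained
end

# The first two columns are almost collinear, the third has very little variance
x = [1.0 2.0 0.1; 2.0 4.1 0.0; 3.0 5.9 0.1; 4.0 8.0 0.0]
z, explained = toy_pca(x; maxerror=0.05)
```

Returning the cumulative explained variance alongside the projection lets the caller inspect the trade-off without rerunning the decomposition.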

kmeans init strategy for em clustering

The expectation-maximisation algorithm for fitting generative mixture models, and hence for clustering data or imputing missing data, can now be automatically initialised with the output of a kmeans clustering (just pass the parameter initStrategy="kmeans").
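To show what such an initialisation provides, here is a toy version of Lloyd's k-means on scalar data, whose centroids could serve as starting means for a mixture. It is only a sketch with hypothetical names (`toy_kmeans`), not the package's kmeans or its EM initialisation code.

```julia
using Statistics

# Toy Lloyd's k-means on scalar data (hypothetical helper, not BetaML's kmeans)
function toy_kmeans(x::Vector{Float64}, centroids::Vector{Float64}; iters=20)
    c = copy(centroids)
    assign = zeros(Int, length(x))
    for _ in 1:iters
        assign = [argmin(abs.(xi .- c)) for xi in x]       # nearest centroid
        for k in eachindex(c)
            members = x[assign .== k]
            isempty(members) || (c[k] = mean(members))      # update step
        end
    end
    return c, assign
end

x = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
μ_init, assign = toy_kmeans(x, [0.0, 10.0])
# μ_init could then be used as the initial means of a two-component mixture,
# e.g. with equal mixing weights and the within-cluster variances as a first guess
```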

ADAM optimisation algorithm for neural networks

In addition to the classical Stochastic Gradient Descent, we added the efficient, moment-based ADAM optimiser. The implementation is the same as in the paper where it was introduced, with the difference that the learning rate can be expressed as a (user-provided) function of the epoch rather than being a constant (but we kept t -> 0.001 as the default, as in the paper).
The solution we chose proved to be very flexible: adding an optimiser is just a matter of creating a struct that subtypes OptimisationAlgorithm and implementing singleUpdate!(θ,▽,optAlg::OptimisationAlgorithm;nEpoch,nBatch,nBatches,xbatch,ybatch) and, optionally, initOptAlg!(optAlg::OptimisationAlgorithm;θ,batchSize,x,y).
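For those who have not met ADAM before, here is a self-contained sketch of the update rule with the learning rate given as a function of the epoch. It is a toy full-batch loop (one update per epoch) with hypothetical names, not the package's singleUpdate! implementation.

```julia
# Toy ADAM loop (hypothetical names); the learning rate η is a function of the epoch t
function toy_adam(grad, θ₀::Vector{Float64}; epochs=2000, η=t -> 0.001,
                  β₁=0.9, β₂=0.999, ϵ=1e-8)
    θ = copy(θ₀)
    m = zero(θ)   # first moment estimate
    v = zero(θ)   # second moment estimate
    for t in 1:epochs
        g = grad(θ)
        m = β₁ .* m .+ (1 - β₁) .* g
        v = β₂ .* v .+ (1 - β₂) .* g .^ 2
        mhat = m ./ (1 - β₁^t)     # bias corrections
        vhat = v ./ (1 - β₂^t)
        θ .-= η(t) .* mhat ./ (sqrt.(vhat) .+ ϵ)
    end
    return θ
end

# Minimise f(θ) = (θ₁ - 3)² + (θ₂ + 1)², whose gradient is known in closed form
grad(θ) = [2 * (θ[1] - 3), 2 * (θ[2] + 1)]
θopt = toy_adam(grad, [0.0, 0.0]; η = t -> 0.05)
```

Passing η as a function makes a decaying schedule a one-line change, for example η = t -> 0.05 / sqrt(t).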


v0.3 is out

What’s new in v0.3 (compared to 0.2.2):

  • Decision Trees / Random Forest (BetaML.Trees)

    • Added Decision Trees and Random Forests algorithms for classification and regression tasks, with support of missing values and the following “impurity” measures: gini, entropy, variance
  • Neural Networks (BetaML.Nn)

    • Implemented Xavier initialisation as default weight initialisation
  • Utilities (BetaML.Utils)

    • Added explVarByDim in the pca output so one can choose the number of dimensions to use running pca only once
    • Added crossEntropy and dCrossEntropy (to use as NN loss functions)
    • Added the “impurity measures” functions giniImpurity and entropy
    • Added the “utility” functions classCounts and meanDicts
  • Documentation

    • Added regression example on bike sharing demand using both Neural Networks and Decision Trees / Random Forest
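For reference, the gini and entropy impurity measures mentioned in the list above can be written in a few lines. This is a sketch of the standard definitions with made-up names, assuming a base-2 logarithm for the entropy. The package's giniImpurity and entropy may differ in signature.

```julia
# Illustrative definitions of common impurity measures (hypothetical names)
function class_shares(y::AbstractVector)
    counts = Dict{eltype(y),Int}()
    for yi in y
        counts[yi] = get(counts, yi, 0) + 1
    end
    return [c / length(y) for c in values(counts)]
end

toy_gini(y)    = 1 - sum(p^2 for p in class_shares(y))
toy_entropy(y) = -sum(p * log2(p) for p in class_shares(y))   # base 2 assumed

pure  = ["a", "a", "a", "a"]   # a single class: no impurity
mixed = ["a", "a", "b", "b"]   # two equally frequent classes: maximal impurity
g_pure, g_mixed = toy_gini(pure), toy_gini(mixed)
h_pure, h_mixed = toy_entropy(pure), toy_entropy(mixed)
```

A tree split is chosen to reduce such a measure as much as possible in the resulting child nodes.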

Btw, if someone would like to review the corresponding Open Source Software paper… The review is stuck in the “pre-review” status, as the editors can’t find a reviewer…

I have added BetaML.jl to the above list.


BetaML v0.4.0 is out

What’s new in v0.4 (compared to 0.3):

  • Decision Trees / Random Forests (BetaML.Trees)

    • Added support for fully categorical features (i.e. not even sortable ones) to tree models. All Trees models now accept almost any kind of feature: continuous, categorical, ordinal, missing data…
    • Added oobEstimation to Random Trees and support for trees weights on Random Forests models
  • Perceptron-like models (BetaML.Perceptron)

    • perceptron, kernelPerceptron and pegasos can now perform multiclass classification and report their outputs as “probabilities” (or better, “normalised scores”) for each class. Use their [name]Binary version for binary classification on {-1,+1} labels, and/or mode(y) to retrieve a single class prediction for each record.
  • Utilities (BetaML.Utils)

    • Added issortable(array) to check if an array is sortable, i.e. has the method issort defined
    • Added partition() to partition (by rows) one or more matrices according to the predetermined shares, e.g. ((xtrain,xtest),(ytrain,ytest)) = partition([x,y],[0.7,0.3])
    • Added colsWithMissings to check which columns in a matrix have indeed missing values
    • Expanded error() and accuracy() to work with any T categorical value, not just Int64
  • Clustering (BetaML.Clustering)

    • Renamed the em algorithm to gmm

    • Experimental initial integration with the MLJ API. For the time being the following models have been made available to the MLJ framework : PerceptronClassifier, KernelPerceptronClassifier, PegasosClassifier, DecisionTreeClassifier, DecisionTreeRegressor, RandomForestClassifier, RandomForestRegressor.
  • Other

    • Moved Continuous Integration to GitHub actions
    • Renamed all rShuffle and sequential parameters in the various algorithms to shuffle
    • New package dependencies: CategoricalArrays and MLJModelInterface
    • Several bugfixes, optimisations and updated dependencies (see the commit log for details)
    • Updated documentation
    • Added option to run partial testing, e.g.: `using Pkg; Pkg.test("BetaML", test_args=["Trees","Clustering","all"])`
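To make the row-wise partitioning mentioned in the list above concrete, here is a plain-Julia sketch of the idea: shuffle the row indices once and cut every matrix at the same places. The helper `toy_partition` and its `rng` keyword are hypothetical. It mimics the call shape shown above but is not the package's partition function.

```julia
using Random

# Toy row-wise partition (hypothetical helper, not BetaML's partition)
function toy_partition(data::Vector{<:AbstractMatrix}, shares::Vector{Float64};
                       rng=Random.default_rng())
    n = size(data[1], 1)
    @assert all(size(d, 1) == n for d in data) "all matrices need the same number of rows"
    @assert sum(shares) ≈ 1 "shares must sum to 1"
    idx  = shuffle(rng, 1:n)                        # same shuffling for every matrix
    cuts = [0; round.(Int, cumsum(shares) .* n)]    # row boundaries of each part
    return [[d[idx[cuts[i]+1:cuts[i+1]], :] for i in eachindex(shares)] for d in data]
end

x = reshape(collect(1.0:20.0), 10, 2)
y = reshape(collect(1.0:10.0), 10, 1)
((xtrain, xtest), (ytrain, ytest)) = toy_partition([x, y], [0.7, 0.3]; rng=MersenneTwister(1))
```

Because the same shuffled indices are used for every matrix, features and labels stay aligned row by row.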

BetaML v0.5.0 is out

What’s new in v0.5 (compared to 0.4.1):

  • Documentation

    • Extensive step-by-step tutorial to BetaML algorithms (and in a certain sense to ML and Julia in general), with comparisons with Clustering.jl, GaussianMixtures.jl, Flux.jl and DecisionTree.jl packages;
    • Added option to “preview” the documentation without running the code in the tutorial (push!(ARGS,"preview"); include("make.jl"))

    • Integration with the MLJ API. The following models have been made available to the MLJ framework : PerceptronClassifier, KernelPerceptronClassifier, PegasosClassifier, DecisionTreeClassifier, DecisionTreeRegressor, RandomForestClassifier, RandomForestRegressor, KMeans, KMedoids, GMMClusterer, MissingImputator
  • Package reorganisation

    • All the functionality of the different sub-modules is now re-exported at the root level, so the user just needs using BetaML to access it
    • The Utils module has been split in different files
  • Stochasticity management

    • Added the parameter rng to all stochastic models to allow fine-tuning of the stochasticity/replicability trade-off
    • Added function generateParallelRngs to allow repeatable results independently of the number of threads used
    • Extended Random.shuffle function to allow multiple matrices and specify the dimension over which to shuffle
  • Utilities (BetaML.Utils)

    • Added dims and copy parameters to partition
    • Added crossValidation, with a user defined function/do block and configurable sampler (SamplerWithData{T <: AbstractDataSampler})
    • Added ConfusionMatrix
    • Added the pool1d activation function
  • Other

    • Improved the grid initialisation for clusters
    • Updated the JOSS paper
    • New package dependencies: StableRNGs and ForceImport
    • Several bugfixes, optimisations and updated dependencies (see the commit log for details)
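As an illustration of cross-validation with a do block and an explicit rng, here is a minimal k-fold sketch in plain Julia. The helper `toy_crossvalidation`, its arguments and the trivial "model" are all hypothetical. The package's crossValidation and its samplers have their own interface.

```julia
using Random, Statistics

# Toy k-fold cross-validation (hypothetical helper, not BetaML's crossValidation).
# `f` receives (train indices, validation indices) and returns a score.
function toy_crossvalidation(f, n::Int; nfolds=5, rng=Random.default_rng())
    idx    = shuffle(rng, 1:n)
    folds  = [idx[i:nfolds:n] for i in 1:nfolds]      # interleaved, near-equal folds
    scores = map(1:nfolds) do k
        val   = folds[k]
        train = reduce(vcat, folds[setdiff(1:nfolds, k)])
        f(train, val)
    end
    return mean(scores), std(scores)
end

y = collect(1.0:20.0)
# "Model": predict the training mean; score: mean absolute error on the validation fold
μ, σ = toy_crossvalidation(length(y); nfolds=4, rng=MersenneTwister(123)) do train, val
    mean(abs.(y[val] .- mean(y[train])))
end
# Passing an identically seeded rng reproduces exactly the same folds and scores
μ2, σ2 = toy_crossvalidation(length(y); nfolds=4, rng=MersenneTwister(123)) do train, val
    mean(abs.(y[val] .- mean(y[train])))
end
```

Threading the rng through as a parameter, rather than relying on the global one, is what makes the two runs identical.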