[ANN] BetaML.jl: yet another (simple) Machine Learning package

[Images: BetaML logo; micro example]

Dear all,

I would like to announce the availability of “BetaML”, the Beta Machine Learning toolkit, a package of Machine Learning algorithms and related utilities.

The toolkit is currently made of 4 modules. Perceptron includes the classical perceptron linear classifier, as well as the non-linear kernel perceptron and the gradient-based Pegasos classifier. Nn implements easy-to-model Artificial Neural Networks (simple feed-forward ones only for the moment, but we plan to add support for convolutional, recurrent and LSTM layers). Note that automatic differentiation with Zygote is optional: you can pass your own derivative of the activation function if you wish (common ones are provided). Clustering has algorithms such as k-means, k-medoids and Expectation-Maximisation (EM) based on Gaussian Mixture Models (GMM). As the EM algorithm supports partially missing observations (observations with missing data on only some dimensions), it is used as the backbone algorithm for collaborative filtering (recommendation systems). Finally, Utils is a module implementing common functions such as scaling, one-hot encoding, various kernels and distance metrics.
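To give a flavour of what the first module is about, here is a minimal plain-Julia sketch of the classical perceptron update rule. This is not BetaML's implementation: the names `perceptron_sketch` and `predict_sketch` and the toy data are hypothetical, chosen only for illustration.

```julia
# Classical perceptron for labels in {-1, +1}: whenever a record is
# misclassified, move the separating hyperplane towards it.
function perceptron_sketch(x::AbstractMatrix, y::AbstractVector; epochs=100)
    n, d = size(x)
    θ  = zeros(d)   # weights
    θ₀ = 0.0        # offset
    for _ in 1:epochs
        mistakes = 0
        for i in 1:n
            if y[i] * (sum(θ .* x[i, :]) + θ₀) <= 0   # wrong side, or exactly on the boundary
                θ  .+= y[i] .* x[i, :]
                θ₀  += y[i]
                mistakes += 1
            end
        end
        mistakes == 0 && break   # a full pass without errors: converged
    end
    return θ, θ₀
end

predict_sketch(x, θ, θ₀) = [sum(θ .* x[i, :]) + θ₀ > 0 ? 1 : -1 for i in 1:size(x, 1)]

# Toy, linearly separable data (made up for this example)
xtoy = [2.0 1.0; 3.0 2.0; -1.0 -1.5; -2.0 -1.0]
ytoy = [1, 1, -1, -1]
θ, θ₀ = perceptron_sketch(xtoy, ytoy)
ŷtoy  = predict_sketch(xtoy, θ, θ₀)
```

On separable data like this the loop stops as soon as one full pass makes no mistakes; the kernel perceptron and Pegasos variants change the update rule but keep the same overall structure.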

BetaML most likely has mainly didactic value, as the approaches are the “vanilla” ones, i.e. the simplest possible, and GPUs are not supported. For “serious” machine learning work in Julia I would suggest using either Flux or Knet.

As the focus is mainly didactic, functions have somewhat longer but more explicit names than usual: for example, the Dense layer is a "DenseLayer", the RBF kernel is "radialKernel", etc.

That said, Julia is a relatively fast language and most of the heavy lifting is done in multithreaded functions or with matrix operations whose underlying libraries may be multithreaded, so BetaML is reasonably fast for small exploratory tasks. It is also already very flexible. For example, one can implement one's own layer as a subtype of the abstract type Layer, one's own optimisation algorithm as a subtype of OptimisationAlgorithm, or even specify one's own distance metric in the Kmedoids algorithm…
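The extension mechanism mentioned here is Julia's usual "abstract type plus multiple dispatch" pattern. A minimal sketch of the idea follows; every name in it (`SketchLayer`, `ScaleLayer`, `ShiftLayer`, `forward`, `forward_all`) is hypothetical, and this is not BetaML's actual Layer interface, for which the package documentation is the reference.

```julia
# Hypothetical names throughout: this is NOT BetaML's actual interface.
abstract type SketchLayer end

struct ScaleLayer <: SketchLayer   # a user-defined layer: multiplies its input by a constant
    factor::Float64
end

struct ShiftLayer <: SketchLayer   # another one: adds a constant to its input
    offset::Float64
end

# Each concrete layer only has to provide its own `forward` method...
forward(l::ScaleLayer, x) = l.factor .* x
forward(l::ShiftLayer, x) = l.offset .+ x

# ...and generic code written against the abstract type works with all of them.
function forward_all(layers::Vector{<:SketchLayer}, x)
    for l in layers
        x = forward(l, x)
    end
    return x
end

out = forward_all([ScaleLayer(2.0), ShiftLayer(1.0)], [1.0, 2.0])   # [3.0, 5.0]
```

The generic code never needs to know which concrete layers exist, which is why a new layer type can live entirely in user code.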

This repository started as an implementation in Julia of the concepts taught in the MITx 6.86x - Machine Learning with Python: from Linear Models to Deep Learning course; theoretical notes describing most of these algorithms are available in the companion repository https://github.com/sylvaticus/MITx_6.86x.

Cheers,

Antonello Lobianco, Bureau d’Economie Théorique et Appliquée of Nancy & AgroParisTech


(Yep, the logo is inspired by a popular superhero… the wish is that whenever we have a numerical problem, the Beta Machine Learning toolkit could come to the rescue with its superpowers! :slight_smile:)

This is a full example of multi-class classification of the Iris dataset:

# Load Modules
using BetaML.Nn, DelimitedFiles, Random, StatsPlots # Load the main module and auxiliary modules
Random.seed!(123); # Fix the random seed (to obtain reproducible results)

# Load the data
iris     = readdlm(joinpath(dirname(Base.find_package("BetaML")),"..","test","data","iris.csv"),',',skipstart=1)
iris     = iris[shuffle(axes(iris, 1)), :] # Shuffle the records, as they aren't by default
x        = convert(Array{Float64,2}, iris[:,1:4])
y        = map(x->Dict("setosa" => 1, "versicolor" => 2, "virginica" =>3)[x],iris[:, 5]) # Convert the target column to numbers
y_oh     = oneHotEncoder(y) # Convert to One-hot representation (e.g. 2 => [0 1 0], 3 => [0 0 1])

# Split the data in training/testing sets
ntrain    = Int64(round(size(x,1)*0.8))
xtrain    = x[1:ntrain,:]
ytrain    = y[1:ntrain]
ytrain_oh = y_oh[1:ntrain,:]
xtest     = x[ntrain+1:end,:]
ytest     = y[ntrain+1:end]

# Define the Artificial Neural Network model
l1   = DenseLayer(4,10,f=relu) # Activation function is ReLU
l2   = DenseLayer(10,3)        # Activation function is identity by default
l3   = VectorFunctionLayer(3,3,f=softMax) # Add a (parameterless) layer whose activation function (softMax in this case) is applied to all its nodes at once
mynn = buildNetwork([l1,l2,l3],squaredCost,name="Multinomial logistic regression Model Sepal") # Build the NN and use the squared cost (aka MSE) as error function

# Training it (default to SGD)
res = train!(mynn,scale(xtrain),ytrain_oh,epochs=100,batchSize=6) # Use optAlg=SGD (Stochastic Gradient Descent) by default

# Test it
ŷtrain        = predict(mynn,scale(xtrain))   # Note the scaling function
ŷtest         = predict(mynn,scale(xtest))
trainAccuracy = accuracy(ŷtrain,ytrain,tol=1) # 0.983
testAccuracy  = accuracy(ŷtest,ytest,tol=1)   # 1.0

# Visualise results
testSize = size(ŷtest,1)
ŷtestChosen =  [argmax(ŷtest[i,:]) for i in 1:testSize]
groupedbar([ytest ŷtestChosen], label=["ytest" "ŷtest (est)"], title="True vs estimated categories") # All records correctly labelled !
plot(0:res.epochs,res.ϵ_epochs, xlabel="epochs",ylabel="error",legend=nothing,title="Avg. error per epoch on the Sepal dataset") # Epochs on the x axis, error on the y axis

[Figures: grouped bar chart of true vs estimated categories; average error per epoch]
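For readers new to these steps, here is a minimal plain-Julia sketch of what one-hot encoding, softmax and an argmax-based accuracy compute. The `*_sketch` names and the toy scores are made up; these are not BetaML's `oneHotEncoder`, `softMax` or `accuracy`, whose exact behaviour (for example the `tol` parameter) may differ.

```julia
# One-hot encode integer labels 1..k into an n×k matrix
onehot_sketch(y, k) = [Int(y[i] == j) for i in eachindex(y), j in 1:k]

# Numerically stable softmax of a vector of scores
function softmax_sketch(z)
    e = exp.(z .- maximum(z))   # subtracting the maximum avoids overflow
    return e ./ sum(e)
end

# Share of records whose highest-scoring class is the true one
accuracy_sketch(ŷ, y) = sum(argmax(ŷ[i, :]) == y[i] for i in eachindex(y)) / length(y)

ytoy   = [2, 3, 1]
yoh    = onehot_sketch(ytoy, 3)        # [0 1 0; 0 0 1; 1 0 0]
probs  = softmax_sketch([1.0, 1.0])    # [0.5, 0.5]
scores = [0.1 0.7 0.2;                 # row 1: class 2 scores highest (correct)
          0.2 0.2 0.6;                 # row 2: class 3 scores highest (correct)
          0.3 0.4 0.3]                 # row 3: class 2 scores highest (true class is 1)
acc    = accuracy_sketch(scores, ytoy) # 2 of the 3 records are right
```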

PS: thanks to @kevbonham on topic 37198:

It turned out that writing tests and docs, setting up CI and registering the package was almost as time-consuming as writing the library itself, but a very rewarding experience!


That sentence of motivation is now my largest contribution to Julia machine learning :joy:.

Great work!


I see you use ReLU (along with some of the other usual suspects). But it’s outdated, and the closest alternative that is also fast seems to be PLU (feel free to copy my implementation there, and those of the others):

Mish seems to me the best activation function (CELU and the others I link to there are also interesting).

Thank you, I added the celu function to master… I don’t feel the need to add too many activation functions, as users can choose whatever function they want by just providing the f parameter in the layer constructor…

v0.2 is out.

What’s new:

Clustering: generic mixture support

Added generic, user-specified Mixture support to the EM algorithm, with {Spherical,Diagonal,Full} Gaussian mixtures already implemented.

The support for missing data allows the EM algorithm to be used for missing-value imputation or collaborative filtering/recommendation systems (using the function predictMissing).
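To illustrate how EM can cope with partially missing records, here is a minimal sketch for spherical Gaussian components: the density of a record is evaluated only on the dimensions that are observed, and a missing value can then be filled with the responsibility-weighted average of the component means. The function names and the toy mixture are hypothetical; this is one common approach and not necessarily what `predictMissing` does internally.

```julia
# Log-density of a record under a spherical Gaussian N(μ, σ²I),
# using only the dimensions where the record is observed.
function logpdf_observed(x, μ, σ²)
    obs = .!ismissing.(x)          # mask of observed dimensions
    d   = count(obs)
    d == 0 && return 0.0           # nothing observed: the record carries no information
    sqdist = sum((x[obs] .- μ[obs]) .^ 2)
    return -d / 2 * log(2π * σ²) - sqdist / (2σ²)
end

# E-step for one record: posterior probability of each mixture component
function responsibilities(x, μs, σ²s, ps)
    logw = [log(ps[k]) + logpdf_observed(x, μs[k], σ²s[k]) for k in eachindex(ps)]
    logw .-= maximum(logw)         # stabilise before exponentiating
    w = exp.(logw)
    return w ./ sum(w)
end

# Toy two-component mixture (made-up numbers)
μs  = [[1.0, 5.0], [4.0, -5.0]]
σ²s = [1.0, 1.0]
ps  = [0.5, 0.5]

x = [1.0, missing]                         # second dimension is missing
r = responsibilities(x, μs, σ²s, ps)       # component 1 is far more likely
x̂₂ = sum(r[k] * μs[k][2] for k in 1:2)     # imputed value, close to 5.0
```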

Neural Networks: More default activation functions

Although users can provide their own activation function (and optionally its derivative, to avoid using AD), we now include a broader set of activation functions (and their derivatives), namely relu, elu, celu, plu, sigmoid, softmax, softplus, mish (thanks to user @Palli).
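As a reminder of what some of these functions look like, here is a minimal plain-Julia sketch of relu, celu, softplus and mish, together with a hand-written celu derivative checked against a finite difference. The `*_sketch` names are hypothetical and the definitions are the textbook ones, not copied from BetaML.

```julia
relu_sketch(x)      = max(zero(x), x)
celu_sketch(x; α=1) = max(zero(x), x) + min(zero(x), α * (exp(x / α) - 1))
softplus_sketch(x)  = log1p(exp(x))
mish_sketch(x)      = x * tanh(softplus_sketch(x))

# Hand-written derivative of celu: 1 on the positive side, exp(x/α) on the negative side
dcelu_sketch(x; α=1) = x >= 0 ? one(x) : exp(x / α)

# Central finite difference, handy to check a hand-written derivative
numderiv(f, x; h=1e-6) = (f(x + h) - f(x - h)) / (2h)

analytic = dcelu_sketch(-0.5)
numeric  = numderiv(celu_sketch, -0.5)   # should agree with `analytic` up to rounding
```

Providing the derivative by hand, as in `dcelu_sketch`, is what lets a layer skip automatic differentiation.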

Utils: various additions/improvements

We added reverse scaling (in order to scale the labels/output values back), the BIC and AIC criteria, meanRelError, and the parameter ignoreLabels to the accuracy function, in order to account for classification tasks where the label itself doesn’t matter, just its distribution (e.g. in unsupervised learning/clustering).
In master you’ll also find PCA.
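For reference, here is a minimal sketch of two of these utilities in plain Julia: column-wise scaling with its reverse operation, and the standard AIC/BIC formulas. The `*_sketch` names and signatures are hypothetical and do not reproduce BetaML's own functions.

```julia
using Statistics

# Standardise each column and return the factors needed to undo the scaling
function scale_sketch(x)
    μ = mean(x, dims=1)
    σ = std(x, dims=1)
    return (x .- μ) ./ σ, μ, σ
end
unscale_sketch(z, μ, σ) = z .* σ .+ μ    # reverse scaling

# Information criteria (lower is better).
# lnL: maximised log-likelihood, k: number of free parameters, n: number of observations
aic_sketch(lnL, k)    = 2k - 2lnL
bic_sketch(lnL, k, n) = k * log(n) - 2lnL

xtoy = [1.0 10.0; 2.0 20.0; 3.0 30.0]
z, μ, σ = scale_sketch(xtoy)
xback   = unscale_sketch(z, μ, σ)        # recovers xtoy up to rounding
a = aic_sketch(-100.0, 5)                # 210.0
b = bic_sketch(-100.0, 5, 100)
```

With more than a handful of observations, BIC penalises each extra parameter more heavily than AIC, so it tends to select smaller mixtures.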

The documentation for v0.2 is here.


v0.2.2 is out

What’s new (compared to v0.2.0):

PCA analysis

You can now transform your data using PCA, specifying either the number of dimensions you want to keep or the maximum error (share of variance lost) you are willing to accept.
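The two ways of choosing the number of dimensions can be sketched as follows, using only the LinearAlgebra and Statistics standard libraries. The function `pca_sketch` and its keywords `k` and `max_error` are hypothetical names for this illustration, not BetaML's API.

```julia
using LinearAlgebra, Statistics

function pca_sketch(x; k=nothing, max_error=0.05)
    xc = x .- mean(x, dims=1)                       # centre each column
    F  = svd(xc)                                    # singular values are sorted, largest first
    explained = cumsum(F.S .^ 2) ./ sum(F.S .^ 2)   # cumulative share of variance kept
    if k === nothing
        # smallest number of components whose lost variance is within max_error
        k = something(findfirst(>=(1 - max_error), explained), length(explained))
    end
    return xc * F.V[:, 1:k], explained[k]           # projected data, share of variance kept
end

# Toy data lying almost on a line (made up for this example)
xtoy = [1.0 2.0; 2.0 4.1; 3.0 5.9; 4.0 8.0]
z, kept   = pca_sketch(xtoy)          # dimensions chosen from the accepted error
z2, kept2 = pca_sketch(xtoy, k=2)     # dimensions fixed by the user
```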

k-means init strategy for EM clustering

The expectation-maximisation algorithm for fitting a Generative Mixture Model and clustering data/imputing missing data can now be automatically initialised with the output of a k-means clustering (just pass the parameter initStrategy="kmeans").

ADAM optimisation algorithm for neural networks

In addition to the classical Stochastic Gradient Descent, we added ADAM, an efficient moment-based optimiser. The implementation is the same as in the paper where it was introduced, with the difference that the learning rate can be expressed as a (user-provided) function of the epoch rather than being a constant (but we kept t -> 0.001 as the default, as in the paper).
The solution we chose proved to be very flexible: adding an optimiser is just a matter of creating a struct that subtypes OptimisationAlgorithm and implementing singleUpdate!(θ,▽,optAlg::OptimisationAlgorithm;nEpoch,nBatch,nBatches,xbatch,ybatch) and, optionally, initOptAlg!(optAlg::OptimisationAlgorithm;θ,batchSize,x,y).
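As an illustration of the ADAM update with an epoch-dependent learning rate, here is a minimal standalone sketch. The struct `AdamSketch` and the function `adam_update!` are hypothetical and do not plug into BetaML's `singleUpdate!`/`initOptAlg!` interface; the default moment parameters (0.9, 0.999, 1e-8) are those of the ADAM paper.

```julia
# Hypothetical container for the optimiser state; names are illustrative only.
mutable struct AdamSketch
    η::Function          # learning rate as a function of the epoch
    β₁::Float64          # decay rate of the first-moment estimate
    β₂::Float64          # decay rate of the second-moment estimate
    ϵ::Float64           # small constant to avoid dividing by zero
    m::Vector{Float64}   # first-moment estimate
    v::Vector{Float64}   # second-moment estimate
    t::Int               # number of updates done so far
end
AdamSketch(nparams; η = epoch -> 0.001, β₁ = 0.9, β₂ = 0.999, ϵ = 1e-8) =
    AdamSketch(η, β₁, β₂, ϵ, zeros(nparams), zeros(nparams), 0)

function adam_update!(θ, grad, opt::AdamSketch; epoch = 1)
    opt.t += 1
    opt.m .= opt.β₁ .* opt.m .+ (1 - opt.β₁) .* grad
    opt.v .= opt.β₂ .* opt.v .+ (1 - opt.β₂) .* grad .^ 2
    mhat = opt.m ./ (1 - opt.β₁^opt.t)    # bias-corrected moments
    vhat = opt.v ./ (1 - opt.β₂^opt.t)
    θ .-= opt.η(epoch) .* mhat ./ (sqrt.(vhat) .+ opt.ϵ)
    return θ
end

# Toy problem: minimise (θ - 3)², whose gradient is 2(θ - 3)
θ   = [0.0]
opt = AdamSketch(1, η = epoch -> 0.1 / (1 + 0.01 * epoch))   # slowly decaying learning rate
for epoch in 1:500
    adam_update!(θ, 2 .* (θ .- 3), opt, epoch = epoch)
end
```

Keeping the learning rate as a function means a constant rate and a decay schedule are handled by the same code path.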