[ANN] BetaML.jl... yet another (simple) Machine Learning package


Dear all,

I would like to announce the availability of “BetaML”, the Beta Machine Learning toolkit, a package of Machine Learning algorithms and related utilities.

The toolkit is currently made of 4 modules. Perceptron includes the classical perceptron linear classifier, but also the non-linear kernel perceptron and the gradient-based Pegasos classifier. Nn implements easy-to-define Artificial Neural Networks (simple feed-forward only for the moment, but we plan to add support for convolutional, recurrent (RNN) and LSTM layers). Note that automatic differentiation with Zygote is optional: you can pass your own derivative of the activation function if you wish (common ones are provided). Clustering has algorithms such as kmeans, kmedoids and Expectation-Maximisation based on Gaussian Mixture Models (GMM). As the EM algorithm supports partially missing observations (observations with missing data only on some dimensions), it is used as the backbone algorithm for collaborative filtering (recommendation systems). Finally, Utils is a module implementing common functions such as scaling, one-hot encoding, various kernels and distance metrics.
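
For readers new to these algorithms, here is a minimal sketch of the classical perceptron update in plain Julia, independent of BetaML: whenever a record is misclassified, its features (multiplied by its label) are added to the weights. It assumes labels in {-1,+1} and an explicit bias term; the name `perceptron_sketch` is hypothetical and not part of BetaML's API.

```julia
using LinearAlgebra

# Classical perceptron: X is an n×d matrix of records, y a vector of ±1 labels.
# Returns the weights θ and the bias θ₀ (hypothetical helper, not BetaML's API).
function perceptron_sketch(X::AbstractMatrix, y::AbstractVector; epochs::Int = 100)
    n, d = size(X)
    θ  = zeros(d)
    θ₀ = 0.0
    for _ in 1:epochs
        nerrors = 0
        for i in 1:n
            # Update only when record i is misclassified (or lies on the boundary)
            if y[i] * (dot(θ, X[i, :]) + θ₀) <= 0
                θ  .+= y[i] .* X[i, :]
                θ₀  += y[i]
                nerrors += 1
            end
        end
        nerrors == 0 && break   # stop early once the training data is separated
    end
    return θ, θ₀
end

# Usage on a tiny linearly separable dataset
X = [1.0 2.0; 2.0 3.0; -1.0 -1.5; -2.0 -1.0]
y = [1, 1, -1, -1]
θ, θ₀ = perceptron_sketch(X, y)
ŷ = sign.(X * θ .+ θ₀)
```

Roughly speaking, the kernel perceptron replaces the dot product with a kernel evaluated against past mistakes, while Pegasos replaces this mistake-driven update with a regularised, learning-rate-based gradient step on the hinge loss.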

BetaML most likely has value only for didactic purposes, as the approaches are the “vanilla” ones, i.e. the simplest possible ones, and GPU is not supported. For “serious” machine learning work in Julia I would suggest using either Flux or Knet.

As the focus is mainly didactic, functions have longer but more explicit names than usual… for example, the Dense layer is a “DenseLayer”, the RBF kernel is “radialKernel”, etc.

That said, Julia is a relatively fast language and most of the heavy lifting is done in multithreaded functions or with matrix operations whose underlying libraries may be multithreaded, so it is reasonably fast for small exploratory tasks. It is also already very flexible. For example, one can implement one's own layer as a subtype of the abstract type Layer, one's own optimisation algorithm as a subtype of OptimisationAlgorithm, or even specify one's own distance metric in the Kmedoids algorithm…
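
To illustrate what "specifying your own distance metric" buys you, here is a minimal k-medoids sketch in plain Julia where the distance is just a function argument. It uses a simple alternating scheme (assign each record to its closest medoid, then move each medoid to the cluster member with the lowest total in-cluster distance), which is not necessarily the variant implemented in BetaML; all names are hypothetical.

```julia
using Random

# Two example metrics: any (a, b) -> Real function can be plugged in
l2_distance(a, b) = sqrt(sum((a .- b) .^ 2))
l1_distance(a, b) = sum(abs.(a .- b))

# Alternating k-medoids on the rows of X (hypothetical helper, not BetaML's kmedoids).
# Returns the cluster of each record and the row indices of the medoids.
function kmedoids_sketch(X::AbstractMatrix, K::Int; dist = l2_distance,
                         maxiter::Int = 100, rng = Random.default_rng())
    n       = size(X, 1)
    medoids = randperm(rng, n)[1:K]        # random initial medoids (row indices)
    assign  = zeros(Int, n)
    for _ in 1:maxiter
        # Assignment step: closest medoid under the user-provided metric
        for i in 1:n
            assign[i] = argmin([dist(X[i, :], X[m, :]) for m in medoids])
        end
        # Update step: the most central member of each cluster becomes its medoid
        newmedoids = copy(medoids)
        for k in 1:K
            members = findall(==(k), assign)
            isempty(members) && continue
            costs = [sum(dist(X[c, :], X[j, :]) for j in members) for c in members]
            newmedoids[k] = members[argmin(costs)]
        end
        newmedoids == medoids && break     # converged
        medoids = newmedoids
    end
    return assign, medoids
end

# Usage with the L1 (Manhattan) metric on two well-separated groups
X = [1.0 1.0; 1.2 0.8; 0.9 1.1; 8.0 8.0; 8.2 7.9; 7.8 8.1]
assign, medoids = kmedoids_sketch(X, 2; dist = l1_distance, rng = MersenneTwister(1))
```

Because medoids are always actual records, the metric never needs to support averaging, which is what makes arbitrary distances (or even non-numeric data with a suitable distance) possible.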

This repository started as an implementation in the Julia language of the concepts taught in the MITx 6.86x - Machine Learning with Python: from Linear Models to Deep Learning course, and theoretical notes describing most of these algorithms are available at the companion repository GitHub - sylvaticus/MITx_6.86x: Notes of MITx 6.86x - Machine Learning with Python: from Linear Models to Deep Learning.

Cheers,

Antonello Lobianco, Bureau d’Economie Théorique et Appliquée of Nancy & AgroParisTech


(Yep, the logo is inspired by a popular superhero… the hope is that whenever we have a numerical problem, the Beta Machine Learning toolkit can come to the rescue with its superpowers! :slight_smile: :slight_smile: :slight_smile:)

This is a full example of multi-class classification on the Iris dataset:

```julia
# Load modules
using BetaML.Nn, DelimitedFiles, Random, StatsPlots # Load the main module and auxiliary modules
Random.seed!(123); # Fix the random seed (to obtain reproducible results)

# Load the data
iris     = readdlm(joinpath(dirname(Base.find_package("BetaML")),"..","test","data","iris.csv"),',',skipstart=1)
iris     = iris[shuffle(axes(iris, 1)), :] # Shuffle the records, as they aren't by default
x        = convert(Array{Float64,2}, iris[:,1:4])
y        = map(x->Dict("setosa" => 1, "versicolor" => 2, "virginica" =>3)[x],iris[:, 5]) # Convert the target column to numbers
y_oh     = oneHotEncoder(y) # Convert to one-hot representation (e.g. 2 => [0 1 0], 3 => [0 0 1])

# Split the data in training/testing sets
ntrain    = Int64(round(size(x,1)*0.8))
xtrain    = x[1:ntrain,:]
ytrain    = y[1:ntrain]
ytrain_oh = y_oh[1:ntrain,:]
xtest     = x[ntrain+1:end,:]
ytest     = y[ntrain+1:end]

# Define the Artificial Neural Network model
l1   = DenseLayer(4,10,f=relu) # Activation function is ReLU
l2   = DenseLayer(10,3)        # Activation function is identity by default
l3   = VectorFunctionLayer(3,3,f=softMax) # Add a (parameterless) layer whose activation function (softMax in this case) is applied to all its nodes at once
mynn = buildNetwork([l1,l2,l3],squaredCost,name="Multinomial logistic regression Model Iris") # Build the NN and use the squared cost (aka MSE) as error function

# Train it (SGD by default)
res = train!(mynn,scale(xtrain),ytrain_oh,epochs=100,batchSize=6) # optAlg=SGD (Stochastic Gradient Descent) is used by default

# Test it
ŷtrain        = predict(mynn,scale(xtrain))   # Note the scaling function
ŷtest         = predict(mynn,scale(xtest))
trainAccuracy = accuracy(ŷtrain,ytrain,tol=1) # 0.983
testAccuracy  = accuracy(ŷtest,ytest,tol=1)   # 1.0

# Visualise results
testSize    = size(ŷtest,1)
ŷtestChosen = [argmax(ŷtest[i,:]) for i in 1:testSize]
groupedbar([ytest ŷtestChosen], label=["ytest" "ŷtest (est)"], title="True vs estimated categories") # All records correctly labelled!
plot(0:res.epochs,res.ϵ_epochs, xlabel="epochs",ylabel="error",legend=nothing,title="Avg. error per epoch on the Iris dataset")
```


PS: thanks to @kevbonham on topic 37198:

It turned out that writing tests and docs, setting up CI and registering the package took almost as much time as writing the library itself, but it was a very rewarding experience!


That sentence of motivation is now my largest contribution to Julia machine learning :joy:.

Great work!


I see you use ReLU (and you have some of the other usual suspects). But it's outdated, and the closest equally fast alternative seems to be PLU (feel free to copy my implementation there, and of the others):

https://github.com/onnx/onnx/pull/2575#issuecomment-645007473

Mish seems to me the best activation function (CELU and the others I link to there are also interesting).

Thank you, I added the celu function to master… I don’t feel the need to add too many activation functions as the user has the ability to choose whatever function she/he wants by just providing the f parameter in the layer constructor…
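
Since an activation is just a scalar Julia function passed as the `f` parameter, the two functions discussed above fit in a couple of lines. This sketch uses the usual textbook definitions, PLU(x) = max(α(x+c) − c, min(α(x−c) + c, x)) with α = 0.1, c = 1, and CELU(x) = max(0, x) + min(0, α(exp(x/α) − 1)) with α = 1; the defaults used in BetaML or in the linked ONNX discussion may differ, and the `_sketch` names are hypothetical.

```julia
# Piecewise Linear Unit: identity on [-c, c], slope α outside that interval
plu_sketch(x; α = 0.1, c = 1.0) = max(α * (x + c) - c, min(α * (x - c) + c, x))

# Continuously differentiable Exponential Linear Unit
celu_sketch(x; α = 1.0) = max(zero(x), x) + min(zero(x), α * (exp(x / α) - 1))

# Both are scalar functions, so they broadcast over a layer's pre-activations
z = [-3.0, -0.5, 0.0, 0.5, 3.0]
a_plu  = plu_sketch.(z)
a_celu = celu_sketch.(z)
```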

v0.2 is out.

What’s new:

Clustering: generic mixture support

Added generic, user-specified Mixture support to the EM algorithm, with {Spherical,Diagonal,Full} Gaussian mixtures already implemented.

The support for missing data allows the EM algorithm to be used for missing-value imputation or collaborative filtering/recommendation systems (using the function predictMissing).
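
A rough sketch of why a mixture model can impute missing values: with diagonal Gaussian components, each record's posterior responsibilities can be computed from its observed dimensions only, and each missing entry is then the responsibility-weighted average of the component means in that dimension, i.e. E[x_missing | x_observed]. Here the mixture parameters (weights, means, variances) are assumed to be already fitted, the names are hypothetical, and this is not BetaML's `predictMissing` code.

```julia
# Log-density of a diagonal Gaussian restricted to the observed dimensions `obs`
function logpdf_diag(x, μ, σ², obs)
    s = 0.0
    for d in obs
        s += -0.5 * (log(2π * σ²[d]) + (x[d] - μ[d])^2 / σ²[d])
    end
    return s
end

# X: n×D matrix with `missing` entries; w: K mixture weights;
# μs, σ²s: K vectors of component means / variances (diagonal Gaussians).
# Returns a Float64 copy of X with missing entries replaced by E[x_missing | x_observed].
function impute_sketch(X, w, μs, σ²s)
    n, D  = size(X)
    K     = length(w)
    Xfull = Matrix{Float64}(undef, n, D)
    for i in 1:n
        x   = X[i, :]
        obs = findall(!ismissing, x)
        # Posterior responsibilities from the observed dims only (log-sum-exp for stability)
        logp = [log(w[k]) + logpdf_diag(x, μs[k], σ²s[k], obs) for k in 1:K]
        r    = exp.(logp .- maximum(logp))
        r  ./= sum(r)
        for d in 1:D
            # With diagonal covariances, E[x_d | observed] is the weighted mean of the μ_k[d]
            Xfull[i, d] = ismissing(x[d]) ? sum(r[k] * μs[k][d] for k in 1:K) : x[d]
        end
    end
    return Xfull
end

# Usage with a hand-specified two-component mixture
w     = [0.5, 0.5]
μs    = [[0.0, 0.0], [10.0, 20.0]]
σ²s   = [[1.0, 1.0], [1.0, 1.0]]
X     = [0.1 missing; 9.8 missing; missing 19.5]
Xfull = impute_sketch(X, w, μs, σ²s)
```

In a full EM loop the same responsibilities are also used in the M-step to update the weights, means and variances, which is how partially observed records still contribute to fitting the model.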

Neural Networks: More default activation functions

Although users can provide their own activation function (and optionally its derivative to avoid using AD), we included the most recent activation functions (and their derivatives), namely relu, elu, celu, plu, sigmoid, softmax, softplus and mish (thanks to user @Palli).
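
As an example of supplying an activation together with its derivative (so AD is not needed), mish is a good case because its derivative is not obvious. This plain-Julia sketch uses mish(x) = x·tanh(softplus(x)), whose derivative is tanh(sp) + x·sech²(sp)·sigmoid(x) with sp = softplus(x); the `_sketch` names are hypothetical, and the analytic derivative is checked against a central finite difference.

```julia
softplus_sketch(x) = log1p(exp(x))        # naive form: overflows for very large x
sigmoid_sketch(x)  = 1 / (1 + exp(-x))

mish_sketch(x) = x * tanh(softplus_sketch(x))

# d/dx [x·tanh(sp(x))] = tanh(sp) + x·sech²(sp)·sp'(x), where sp'(x) = sigmoid(x)
function dmish_sketch(x)
    t = tanh(softplus_sketch(x))
    return t + x * (1 - t^2) * sigmoid_sketch(x)
end

# Sanity check of the analytic derivative against a central finite difference
h = 1e-6
for x in (-2.0, -0.3, 0.0, 0.7, 3.0)
    fd = (mish_sketch(x + h) - mish_sketch(x - h)) / (2h)
    @assert isapprox(dmish_sketch(x), fd; atol = 1e-6)
end
```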

Utils: Various additions/improvements

We added reverse scaling (in order to scale back the labels/output values), BIC and AIC criteria, meanRelError and the parameter ignoreLabels to the accuracy function in order to account for classification tasks where the label itself doesn’t matter, just its distribution (e.g. in unsupervised learning/clustering).
PCA is also available in master.
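
A minimal sketch of two of these utilities in plain Julia: standard (z-score) scaling that keeps the column means and standard deviations so the transformation can be reversed, and the usual information criteria AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L (k parameters, n observations, L the maximised likelihood). The names and signatures are hypothetical and do not reproduce BetaML's `scale`.

```julia
using Statistics

# Fit the scaling factors on (training) data: column means and standard deviations
scalefactors_sketch(X) = (mean(X, dims = 1), std(X, dims = 1))

# Forward and reverse z-score scaling with the same factors
scale_sketch(X, (μ, σ))   = (X .- μ) ./ σ
unscale_sketch(Z, (μ, σ)) = Z .* σ .+ μ

# Information criteria from a maximised log-likelihood lnL, k parameters, n observations
aic_sketch(lnL, k)    = 2k - 2lnL
bic_sketch(lnL, k, n) = k * log(n) - 2lnL

X  = [1.0 10.0; 2.0 20.0; 3.0 30.0; 4.0 40.0]
sf = scalefactors_sketch(X)
Z  = scale_sketch(X, sf)
X̂  = unscale_sketch(Z, sf)      # back to the original units
```

Keeping the factors separate from the data also lets you fit them on the training set and reuse them on the test set (or on predicted labels), rather than rescaling each set with its own statistics.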

The documentation for v0.2 is here.


v0.2.2 is out

What’s new (compared to v0.2.0):

PCA analysis

You can now transform your data using PCA, specifying either the number of dimensions you want to keep or the maximum error (variance) you are willing to accept.
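
For illustration, a minimal PCA sketch based on the eigen-decomposition of the covariance matrix, where the number of kept dimensions is either given directly or chosen as the smallest one whose discarded share of variance does not exceed `maxerr`. The function name and keywords are hypothetical, not BetaML's `pca` signature.

```julia
using LinearAlgebra, Statistics

# Project the n×D data X onto its principal components. If `outdims` is nothing,
# keep the fewest components whose discarded variance share is ≤ maxerr.
# Returns (projected data, kept dims, cumulative explained-variance share by dims).
function pca_sketch(X::AbstractMatrix; outdims = nothing, maxerr = 0.05)
    Xc = X .- mean(X, dims = 1)                   # centre the columns
    F  = eigen(Symmetric(cov(Xc)))                # eigenvalues in ascending order
    λ  = reverse(F.values)
    P  = F.vectors[:, end:-1:1]                   # columns by decreasing variance
    explVarByDim = cumsum(λ) ./ sum(λ)
    k  = outdims === nothing ? findfirst(>=(1 - maxerr), explVarByDim) : outdims
    return Xc * P[:, 1:k], k, explVarByDim
end

# Usage: the second column is roughly twice the first, the third is small noise
X = [1.0 2.0 0.1; 2.0 4.1 0.0; 3.0 5.9 0.2; 4.0 8.0 0.1; 5.0 10.1 0.0]
Z, k, ev = pca_sketch(X; maxerr = 0.05)
```

Returning the cumulative explained variance alongside the projection means the number of dimensions can be chosen after a single decomposition.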

kmeans init strategy for em clustering

The expectation-maximisation algorithm for fitting Generative Mixture Models and clustering data/imputing missing data can now be automatically initialised with the output of a kmeans clustering (just pass the parameter initStrategy="kmeans").

ADAM optimisation algorithm for neural networks

In addition to the classical Stochastic Gradient Descent, we added the efficient, moment-based ADAM optimiser. The implementation is the same as in the paper where it was introduced, with the difference that the learning rate can be expressed as a (user-provided) function of the epoch rather than being a constant (but we kept t → 0.001 as the default, as in the paper).
The solution we chose proved to be very flexible: adding an optimiser is just a matter of creating a struct that subtypes OptimisationAlgorithm and implementing singleUpdate!(θ,▽,optAlg::OptimisationAlgorithm;nEpoch,nBatch,nBatches,xbatch,ybatch) and, optionally, initOptAlg!(optAlg::OptimisationAlgorithm;θ,batchSize,x,y).


v0.3 is out

What’s new in v0.3 (compared to 0.2.2):

  • Decision Trees / Random Forest (BetaML.Trees)

    • Added Decision Trees and Random Forests algorithms for classification and regression tasks, with support for missing values and the following “impurity” measures: gini, entropy, variance
  • Neural Networks (BetaML.Nn)

    • Implemented Xavier initialisation as the default weight initialisation
  • Utilities (BetaML.Utils)

    • Added explVarByDim to the pca output, so that one can choose the number of dimensions to keep while running pca only once
    • Added crossEntropy and dCrossEntropy (to use as NN loss functions)
    • Added the “impurity measures” functions giniImpurity and entropy
    • Added the “utility” functions classCounts and meanDicts
  • Documentation

    • Added regression example on bike sharing demand using both Neural Networks and Decision Trees / Random Forest
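
For reference, the impurity measures used to choose the splits can be sketched in a few lines of plain Julia: for the class shares pₖ in a node, Gini = 1 − Σ pₖ² and entropy = −Σ pₖ log₂ pₖ. The helper names are hypothetical (they mirror, but are not, BetaML's `classCounts`, `giniImpurity` and `entropy`), and the base-2 logarithm is an assumption.

```julia
# Count the occurrences of each class label (works for any label type)
function class_counts_sketch(y)
    counts = Dict{eltype(y),Int}()
    for label in y
        counts[label] = get(counts, label, 0) + 1
    end
    return counts
end

# Gini impurity: probability of mislabelling a random record drawn from the node
function gini_sketch(y)
    n = length(y)
    return 1 - sum((c / n)^2 for c in values(class_counts_sketch(y)))
end

# Shannon entropy (in bits) of the class distribution in the node
function entropy_sketch(y)
    n = length(y)
    return -sum((c / n) * log2(c / n) for c in values(class_counts_sketch(y)))
end

y_node = ["a", "a", "b", "b"]
g = gini_sketch(y_node)       # 0.5, the maximum for two classes
e = entropy_sketch(y_node)    # 1.0 bit
```

A tree split is then chosen to maximise the decrease in (size-weighted) impurity between the parent node and its children.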

Btw, if someone would like to review the corresponding JOSS (Journal of Open Source Software) paper… The review is stuck in the “pre-review” status, as the editors can't find a reviewer…

https://github.com/openjournals/joss-reviews/issues/2512

https://github.com/xiaodaigh/awesome-ml-frameworks

I have added BetaML.jl to the above list.


BetaML v0.4.0 is out

What’s new in v0.4 (compared to 0.3):

  • Decision Trees / Random Forests (BetaML.Trees)

    • Added support for fully categorical features (i.e. not even sortable ones) to tree models. All tree models now accept almost any type as a feature: continuous, categorical, ordinal, missing data…
    • Added oobEstimation to Random Trees and support for tree weights in Random Forest models
  • Perceptron-like models (BetaML.Perceptron)

    • perceptron, kernelPerceptron and pegasos can now perform multiclass classification and report their outputs as “probabilities” (or better, “normalised scores”) for each class. Use their [name]Binary version for binary classification on {-1,+1} labels, and/or mode(y) to retrieve a single class prediction for each record.
  • Utilities (BetaML.Utils)

    • Added issortable(array) to check if an array is sortable, i.e. whether its elements can be sorted
    • Added partition() to partition (by rows) one or more matrices according to the predetermined shares, e.g. ((xtrain,xtest),(ytrain,ytest)) = partition([x,y],[0.7,0.3])
    • Added colsWithMissings to check which columns in a matrix actually have missing values
    • Expanded error() and accuracy() to work with any T categorical value, not just Int64
  • Clustering (BetaML.Clustering)

    • Renamed the em algorithm to gmm
  • MLJ API

    • Experimental initial integration with the MLJ API. For the time being, the following models have been made available to the MLJ framework: PerceptronClassifier, KernelPerceptronClassifier, PegasosClassifier, DecisionTreeClassifier, DecisionTreeRegressor, RandomForestClassifier, RandomForestRegressor.
  • Other

    • Moved Continuous Integration to GitHub actions
    • Renamed all rShuffle and sequential parameters in the various algorithms to shuffle
    • New package dependencies: CategoricalArrays and MLJModelInterface
    • Several bugfixes, optimisations and updated dependencies (see the commit log for details)
    • Updated documentation
    • Added an option to run partial testing, e.g.: `using Pkg; Pkg.test("BetaML", test_args=["Trees","Clustering","all"])`
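
The idea behind `partition` can be sketched as follows: shuffle the row indices once and cut them according to the shares, so that all the arrays passed are split consistently (the same records end up in the same part of `x` and `y`). This is a hypothetical stand-alone version (`partition_sketch`), not BetaML's code, and it assumes the shares sum to 1.

```julia
using Random

# Select rows `p` of a vector or of a matrix
rows_sketch(d::AbstractVector, p) = d[p]
rows_sketch(d::AbstractMatrix, p) = d[p, :]

# Split every array in `data` by rows according to `shares` (e.g. [0.7, 0.3]),
# using the same shuffled row order for all of them (hypothetical helper).
function partition_sketch(data::AbstractVector, shares::AbstractVector; rng = Random.default_rng())
    n = size(data[1], 1)
    all(size(d, 1) == n for d in data) || throw(ArgumentError("all arrays need the same number of rows"))
    sum(shares) ≈ 1 || throw(ArgumentError("shares should sum to 1"))
    idx   = shuffle(rng, 1:n)
    # Cumulative cut points; the last one is set to n so no row is lost to rounding
    cuts  = [0; round.(Int, cumsum(shares)[1:end-1] .* n); n]
    parts = [idx[cuts[j]+1:cuts[j+1]] for j in eachindex(shares)]
    return [Tuple(rows_sketch(d, p) for p in parts) for d in data]
end

x = reshape(collect(1.0:20.0), 10, 2)   # 10 records, 2 features
y = collect(1:10)
((xtrain, xtest), (ytrain, ytest)) = partition_sketch([x, y], [0.7, 0.3]; rng = MersenneTwister(123))
```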

BetaML v0.5.0 is out

What’s new in v0.5 (compared to 0.4.1):

  • Documentation

    • Extensive step-by-step tutorial on BetaML algorithms (and, in a certain sense, on ML and Julia in general), including comparisons with the Clustering.jl, GaussianMixtures.jl, Flux.jl and DecisionTree.jl packages;
    • Added option to “preview” the documentation without running the code in the tutorial (push!(ARGS,"preview"); include("make.jl"))
  • MLJ API

    • Integration with the MLJ API. The following models have been made available to the MLJ framework: PerceptronClassifier, KernelPerceptronClassifier, PegasosClassifier, DecisionTreeClassifier, DecisionTreeRegressor, RandomForestClassifier, RandomForestRegressor, KMeans, KMedoids, GMMClusterer, MissingImputator
  • Package reorganisation

    • All the functionality of the different sub-modules is now re-exported at the root level, so users just need `using BetaML` to access it
    • The Utils module has been split into different files
  • Stochasticity management

    • Added the parameter rng to all stochastic models to allow fine-tuning of the stochasticity/replicability trade-off
    • Added the function generateParallelRngs to allow repeatable results independently of the number of threads used
    • Extended Random.shuffle function to allow multiple matrices and specify the dimension over which to shuffle
  • Utilities (BetaML.Utils)

    • Added dims and copy parameters to partition
    • Added crossValidation, with a user defined function/do block and configurable sampler (SamplerWithData{T <: AbstractDataSampler})
    • Added ConfusionMatrix
    • Added the pool1d activation function
  • Other

    • Improved the grid initialisation for clusters
    • Updated the JOSS paper
    • New package dependencies: StableRNGs and ForceImport
    • Several bugfixes, optimisations and updated dependencies (see the commit log for details)
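
The `crossValidation` idea (a user-supplied function, possibly written as a `do` block, evaluated on each train/validation split) can be sketched with a plain k-fold splitter. The function below is hypothetical and much simpler than BetaML's sampler-based version: it returns the per-fold results and leaves their aggregation to the caller.

```julia
using Random, Statistics

# Shuffle the records, split them into k disjoint folds and call
# f((xtrain, ytrain), (xvalid, yvalid)) once per fold (hypothetical helper).
function kfold_sketch(f, x::AbstractMatrix, y::AbstractVector, k::Int; rng = Random.default_rng())
    n     = size(x, 1)
    idx   = shuffle(rng, 1:n)
    folds = [idx[i:k:n] for i in 1:k]     # k disjoint folds of (almost) equal size
    return map(folds) do v
        tr = setdiff(idx, v)              # training rows: everything not in fold v
        f((x[tr, :], y[tr]), (x[v, :], y[v]))
    end
end

# Usage with a do block: a trivial "model" that always predicts the training mean
x = reshape(collect(1.0:20.0), 10, 2)
y = 2 .* x[:, 1]
scores = kfold_sketch(x, y, 5; rng = MersenneTwister(1)) do trainset, validset
    (xtr, ytr), (xval, yval) = trainset, validset
    ŷ = fill(mean(ytr), length(yval))
    mean(abs.(ŷ .- yval))                 # mean absolute error on this fold
end
avgscore = mean(scores)
```

Passing the model logic as a function keeps the splitter agnostic of the model, so the same routine can evaluate a tree, a network or any custom pipeline.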

BetaML v0.10.2 is out

New features include the GroupedLayer and ReplicatorLayer, which allow modelling multi-branch deep neural networks, like the architecture discussed in this tutorial.


v0.10.4 is out.

Main changes compared to 0.10.2:

  • (v0.10.3) a general UniversalImputer to impute (with repetitions) missing values using any supervised model (not necessarily from BetaML) that can be wrapped in an m=Model(hp); fit!(m,x,y); yest = predict(m,x) interface (specific imputers, like RFImputer, were already available in the Imputation module)
  • (v0.10.4) a simple-to-use AutoEncoder (and AutoEncoderMLJ) model that follows the API m=AutoEncoder(hp); fit!(m,x); x_latent = predict(m,x); x̂ = inverse_predict(m,x_latent). Users can optionally specify the number of dimensions to shrink the data to (outdims), the number of neurons of the inner layers (innerdims), or the full details of the encoding and decoding layers and all the underlying NN options.

I searched the net extensively, and I believe this is the easiest way to apply an AutoEncoder to reduce the dimensionality of some data, as the user doesn't really need to deal with the underlying neural network.

(Note: the release will need a few minutes to reach the Julia General registry. The MLJ wrapper model will, I believe, need manual approval from the MLJ team.)

Examples

  • UniversalImputer:
```julia
julia> using BetaML
julia> import DecisionTree
julia> X = [1.4 2.5 "a"; missing 20.5 "b"; 0.6 18 missing; 0.7 22.8 "b"; 0.4 missing "b"; 1.6 3.7 "a"]
6×3 Matrix{Any}:
 1.4        2.5       "a"
  missing  20.5       "b"
 0.6       18         missing
 0.7       22.8       "b"
 0.4         missing  "b"
 1.6        3.7       "a"
julia> mod = UniversalImputer(estimator=[DecisionTree.DecisionTreeRegressor(),DecisionTree.DecisionTreeRegressor(),DecisionTree.DecisionTreeClassifier()], fit_function = DecisionTree.fit!, predict_function=DecisionTree.predict, recursive_passages=2)
UniversalImputer - A imputer based on an arbitrary regressor/classifier(unfitted)
julia> X_full = fit!(mod,X)
** Processing imputation 1
6×3 Matrix{Any}:
 1.4    2.5  "a"
 0.94  20.5  "b"
 0.6   18    "b"
 0.7   22.8  "b"
 0.4   13.5  "b"
 1.6    3.7  "a"
```
  • AutoEncoder:
```julia
julia> using BetaML

julia> x = [0.12 0.31 0.29 3.21 0.21;
            0.22 0.61 0.58 6.43 0.42;
            0.51 1.47 1.46 16.12 0.99;
            0.35 0.93 0.91 10.04 0.71;
            0.44 1.21 1.18 13.54 0.85];

julia> m    = AutoEncoder(outdims=1,epochs=400)
A AutoEncoder BetaMLModel (unfitted)

julia> x_reduced = fit!(m,x)
***
*** Training  for 400 epochs with algorithm ADAM.
Training..       avg loss on epoch 1 (1):        60.27802763757111
Training..       avg loss on epoch 200 (200):    0.08970099870421573
Training..       avg loss on epoch 400 (400):    0.013138484118673664
Training of 400 epoch completed. Final epoch error: 0.013138484118673664.
5×1 Matrix{Float64}:
  -3.5483740608901186
  -6.90396890458868
 -17.06296512222304
 -10.688936344498398
 -14.35734756603212

julia> x̂ = inverse_predict(m,x_reduced)
5×5 Matrix{Float64}:
 0.0982406  0.110294  0.264047   3.35501  0.327228
 0.205628   0.470884  0.558655   6.51042  0.487416
 0.529785   1.56431   1.45762   16.067    0.971123
 0.3264     0.878264  0.893584  10.0709   0.667632
 0.443453   1.2731    1.2182    13.5218   0.842298

julia> info(m)["rme"]
0.020858783340281222

julia> hcat(x,x̂)
5×10 Matrix{Float64}:
 0.12  0.31  0.29   3.21  0.21  0.0982406  0.110294  0.264047   3.35501  0.327228
 0.22  0.61  0.58   6.43  0.42  0.205628   0.470884  0.558655   6.51042  0.487416
 0.51  1.47  1.46  16.12  0.99  0.529785   1.56431   1.45762   16.067    0.971123
 0.35  0.93  0.91  10.04  0.71  0.3264     0.878264  0.893584  10.0709   0.667632
 0.44  1.21  1.18  13.54  0.85  0.443453   1.2731    1.2182    13.5218   0.842298
```

BetaML v0.11 is out

Release notes:

Attention: many breaking changes in this version!

  • Experimental new ConvLayer and PoolLayer for convolutional networks. BetaML neural networks work only on CPU, and even there the convolutional layers (but not the dense ones) are 2-3 times slower than Flux. Still, they have some fairly unique characteristics, like working with any number of dimensions or not requiring AD in most cases, so they may be useful in some corner cases. And if you want to help port them to GPU… :wink: (I just got my first machine with a usable GPU)
  • Isolated MLJ interface models into their own Bmlj submodule
  • Renamed many models in a consistent way
  • Shortened the names of the hyper-parameter and learnable-parameter structs
  • Corrected many doc bugs
  • Several bugfixes
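
For readers curious about what such a layer computes, here is a sketch of the forward pass of a single-channel 1-D convolution ("valid" padding, stride 1), written as the cross-correlation that most deep-learning libraries call convolution. It is plain Julia with a hypothetical name, not BetaML's `ConvLayer`, which is more general (e.g. it works with any number of dimensions, as noted above).

```julia
# y[i] = Σ_j w[j] * x[i + j - 1] + b, for i = 1, …, length(x) - length(w) + 1
function conv1d_sketch(x::AbstractVector, w::AbstractVector, b::Real = 0.0)
    nout = length(x) - length(w) + 1
    nout >= 1 || throw(ArgumentError("kernel longer than input"))
    return [sum(w[j] * x[i + j - 1] for j in eachindex(w)) + b for i in 1:nout]
end

x = [1.0, 2.0, 3.0, 4.0, 5.0]
w = [1.0, 0.0, -1.0]           # a simple difference filter
yconv = conv1d_sketch(x, w)
```

Because the weights `w` are shared across all positions, the gradient with respect to them is itself a sum of such products, which is one reason a convolutional layer can provide its derivatives without relying on AD.
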

Just curious, why don’t you use basic functionality such as conv layers or activations from NNlib.jl?

I think the first answer is because it is fun :wink:

I can also easily add what I am interested in, relying on what I have already written. For example, once the EM algorithm is there, I could write a regressor, not only a clusterer; with the RF I can implement an imputer based on it; with NN I can implement an AutoEncoder…

While I agree that having many independent but interconnected specialised packages is often preferable, I think exploring the other route, having everything in one manageable package, could also be productive. Fun, it certainly is :slight_smile:
