v0.10.4 is out.
Main stuff compared to 0.10.2:
- (v0.10.3) A general `UniversalImputer` to impute (with repetitions) missing values using any supervised model (not necessarily from BetaML) that can be wrapped in a `m=Model(hp); fit!(m,x,y); yest = predict(m,x)` interface (specific imputers, like `RFImputer`, were already available in the `Imputation` module)
- (v0.10.4) A simple-to-use `AutoEncoder` (and `AutoEncoderMLJ`) model that follows the API `m=AutoEncoder(hp); fit!(m,x); x_latent = predict(m,x); x̂ = inverse_predict(m,x_latent)`. Users can optionally specify the number of dimensions to which to shrink the data (`outdims`), the number of neurons of the inner layers (`innerdims`), or the full details of the encoding and decoding layers and all the underlying NN options, but all of this remains optional (a minimal sketch follows this list).
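For example, based only on the parameter names above, getting a 2-dimensional embedding with wider inner layers should look like this (a minimal sketch with placeholder random data, not taken from the release notes):

```julia
using BetaML

x        = rand(100, 8)                         # placeholder data: 100 records, 8 dimensions
m        = AutoEncoder(outdims=2, innerdims=10) # 2 latent dimensions, 10 neurons in the inner layers
x_latent = fit!(m, x)                           # fit and return the 2-dimensional representation
x̂        = inverse_predict(m, x_latent)         # map back to the original 8 dimensions
```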
I have looked around the net quite a lot, and I believe this is the easiest way to apply an autoencoder to reduce the dimensionality of some data, as the user doesn't really need to deal with the underlying neural network.
(Note: the release will need a few minutes to reach the Julia General registry. The MLJ wrapper model will, I believe, need manual approval from the MLJ team.)
Examples
- UniversalImputer:
```julia
julia> using BetaML

julia> import DecisionTree

julia> X = [1.4 2.5 "a"; missing 20.5 "b"; 0.6 18 missing; 0.7 22.8 "b"; 0.4 missing "b"; 1.6 3.7 "a"]
6×3 Matrix{Any}:
 1.4        2.5       "a"
  missing  20.5       "b"
 0.6       18         missing
 0.7       22.8       "b"
 0.4         missing  "b"
 1.6        3.7       "a"

julia> mod = UniversalImputer(estimator=[DecisionTree.DecisionTreeRegressor(),DecisionTree.DecisionTreeRegressor(),DecisionTree.DecisionTreeClassifier()], fit_function = DecisionTree.fit!, predict_function=DecisionTree.predict, recursive_passages=2)
UniversalImputer - A imputer based on an arbitrary regressor/classifier(unfitted)

julia> X_full = fit!(mod,X)
** Processing imputation 1
6×3 Matrix{Any}:
 1.4    2.5  "a"
 0.94  20.5  "b"
 0.6   18    "b"
 0.7   22.8  "b"
 0.4   13.5  "b"
 1.6    3.7  "a"
```
- AutoEncoder:
```julia
julia> using BetaML

julia> x = [0.12 0.31 0.29  3.21 0.21;
            0.22 0.61 0.58  6.43 0.42;
            0.51 1.47 1.46 16.12 0.99;
            0.35 0.93 0.91 10.04 0.71;
            0.44 1.21 1.18 13.54 0.85];

julia> m = AutoEncoder(outdims=1,epochs=400)
A AutoEncoder BetaMLModel (unfitted)

julia> x_reduced = fit!(m,x)
***
*** Training for 400 epochs with algorithm ADAM.
Training.. avg loss on epoch 1 (1): 60.27802763757111
Training.. avg loss on epoch 200 (200): 0.08970099870421573
Training.. avg loss on epoch 400 (400): 0.013138484118673664
Training of 400 epoch completed. Final epoch error: 0.013138484118673664.
5×1 Matrix{Float64}:
  -3.5483740608901186
  -6.90396890458868
 -17.06296512222304
 -10.688936344498398
 -14.35734756603212

julia> x̂ = inverse_predict(m,x_reduced)
5×5 Matrix{Float64}:
 0.0982406  0.110294  0.264047   3.35501  0.327228
 0.205628   0.470884  0.558655   6.51042  0.487416
 0.529785   1.56431   1.45762   16.067    0.971123
 0.3264     0.878264  0.893584  10.0709   0.667632
 0.443453   1.2731    1.2182    13.5218   0.842298

julia> info(m)["rme"]
0.020858783340281222

julia> hcat(x,x̂)
5×10 Matrix{Float64}:
 0.12  0.31  0.29   3.21  0.21  0.0982406  0.110294  0.264047   3.35501  0.327228
 0.22  0.61  0.58   6.43  0.42  0.205628   0.470884  0.558655   6.51042  0.487416
 0.51  1.47  1.46  16.12  0.99  0.529785   1.56431   1.45762   16.067    0.971123
 0.35  0.93  0.91  10.04  0.71  0.3264     0.878264  0.893584  10.0709   0.667632
 0.44  1.21  1.18  13.54  0.85  0.443453   1.2731    1.2182    13.5218   0.842298
```
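Once fitted, the model should also work on records it has not seen, following the same `predict`/`inverse_predict` API described above (a sketch; `x_new` is a hypothetical record with the same five columns):

```julia
x_new        = [0.3 0.8 0.8 8.6 0.6]             # hypothetical new record, 5 columns like x
x_new_latent = predict(m, x_new)                 # encode to the 1-dimensional latent space
x_new_rec    = inverse_predict(m, x_new_latent)  # decode back to the original 5 dimensions
```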