[ANN] BetaML.jl.. yet another (simple) Machine Learning Package

v0.10.4 is out.

Main changes compared to v0.10.2:

  • (v0.10.3) a general UniversalImputer to impute (with repetitions) missing values using any supervised model (not necessarily from BetaML) that can be wrapped in a m=Model(hp); fit!(m,x,y); yest = predict(m,x) interface (specific imputers, like RFImputer, were already available in the Imputation module; a toy sketch of this interface follows the list)
  • (v0.10.4) a simple-to-use AutoEncoder (and its MLJ wrapper AutoEncoderMLJ) model that follows the API m=AutoEncoder(hp); fit!(m,x); x_latent = predict(m,x); x̂ = inverse_predict(m,x_latent). Users can optionally specify the number of dimensions to shrink the data to (outdims), the number of neurons of the inner layers (innerdims), or the full details of the encoding and decoding layers together with all the underlying NN options, but all of this remains optional (a further sketch with these options follows the examples below)
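
To make the expected estimator interface concrete, here is a toy sketch with a hypothetical MeanEstimator (purely illustrative, not part of BetaML or any other package): anything for which you can supply a fit function taking (model,x,y) and a predict function taking (model,x) should work:

julia> using BetaML
julia> mutable struct MeanEstimator μ::Float64 end       # toy "model": just stores a mean
julia> mfit!(m,x,y) = (m.μ = sum(y)/length(y); m)        # "training": record the mean of the target column
julia> mpredict(m,x) = fill(m.μ, size(x,1))              # "prediction": impute every cell with that mean
julia> X = [1.0 10.0; missing 20.0; 3.0 missing];
julia> mod = UniversalImputer(estimator=[MeanEstimator(0.0),MeanEstimator(0.0)], fit_function=mfit!, predict_function=mpredict);
julia> X_full = fit!(mod,X);                             # returns X with the missing cells imputed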

I have looked around the net quite a lot, and I believe this is the easiest way to apply an AutoEncoder to reduce the dimensionality of some data, as the user doesn't really need to deal with the underlying neural network.

(note: the release will need a few minutes to reach the Julia General registry. The MLJ wrapper model will, I believe, need manual approval from the MLJ team)
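
For MLJ users, once the wrapper is registered I expect it to plug into the usual MLJ workflow for unsupervised models, along these lines (an untested sketch: the hyperparameter names are assumed to mirror the BetaML API, the transform behaviour is assumed from MLJ conventions, and x is the matrix from the AutoEncoder example below):

julia> using MLJ
julia> import BetaML
julia> mach = machine(BetaML.AutoEncoderMLJ(outdims=1), MLJ.table(x));
julia> fit!(mach);                                # train the machine
julia> x_latent = transform(mach, MLJ.table(x));  # encoded (reduced) data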

Examples

  • UniversalImputer:
julia> using BetaML
julia> import DecisionTree
julia> X = [1.4 2.5 "a"; missing 20.5 "b"; 0.6 18 missing; 0.7 22.8 "b"; 0.4 missing "b"; 1.6 3.7 "a"]
6×3 Matrix{Any}:
 1.4        2.5       "a"
  missing  20.5       "b"
 0.6       18         missing
 0.7       22.8       "b"
 0.4         missing  "b"
 1.6        3.7       "a"
julia> mod = UniversalImputer(estimator=[DecisionTree.DecisionTreeRegressor(),DecisionTree.DecisionTreeRegressor(),DecisionTree.DecisionTreeClassifier()], fit_function = DecisionTree.fit!, predict_function=DecisionTree.predict, recursive_passages=2)
UniversalImputer - An imputer based on an arbitrary regressor/classifier (unfitted)
julia> X_full = fit!(mod,X)
** Processing imputation 1
6×3 Matrix{Any}:
 1.4    2.5  "a"
 0.94  20.5  "b"
 0.6   18    "b"
 0.7   22.8  "b"
 0.4   13.5  "b"
 1.6    3.7  "a"
  • AutoEncoder:
julia> using BetaML

julia> x = [0.12 0.31 0.29 3.21 0.21;
            0.22 0.61 0.58 6.43 0.42;
            0.51 1.47 1.46 16.12 0.99;
            0.35 0.93 0.91 10.04 0.71;
            0.44 1.21 1.18 13.54 0.85];

julia> m    = AutoEncoder(outdims=1,epochs=400)
An AutoEncoder BetaMLModel (unfitted)

julia> x_reduced = fit!(m,x)
***
*** Training  for 400 epochs with algorithm ADAM.
Training..       avg loss on epoch 1 (1):        60.27802763757111
Training..       avg loss on epoch 200 (200):    0.08970099870421573
Training..       avg loss on epoch 400 (400):    0.013138484118673664
Training of 400 epoch completed. Final epoch error: 0.013138484118673664.
5×1 Matrix{Float64}:
  -3.5483740608901186
  -6.90396890458868
 -17.06296512222304
 -10.688936344498398
 -14.35734756603212

julia> x̂ = inverse_predict(m,x_reduced)
5×5 Matrix{Float64}:
 0.0982406  0.110294  0.264047   3.35501  0.327228
 0.205628   0.470884  0.558655   6.51042  0.487416
 0.529785   1.56431   1.45762   16.067    0.971123
 0.3264     0.878264  0.893584  10.0709   0.667632
 0.443453   1.2731    1.2182    13.5218   0.842298

julia> info(m)["rme"]
0.020858783340281222

julia> hcat(x,x̂)
5×10 Matrix{Float64}:
 0.12  0.31  0.29   3.21  0.21  0.0982406  0.110294  0.264047   3.35501  0.327228
 0.22  0.61  0.58   6.43  0.42  0.205628   0.470884  0.558655   6.51042  0.487416
 0.51  1.47  1.46  16.12  0.99  0.529785   1.56431   1.45762   16.067    0.971123
 0.35  0.93  0.91  10.04  0.71  0.3264     0.878264  0.893584  10.0709   0.667632
 0.44  1.21  1.18  13.54  0.85  0.443453   1.2731    1.2182    13.5218   0.842298
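
The example above set only outdims; the other options mentioned in the release notes are just further hyperparameters, and the fitted model can be reused on new observations with the same predict/inverse_predict calls. A quick sketch (illustrative values; x2 is a hypothetical matrix of new data with the same 5 columns):

julia> m2 = AutoEncoder(outdims=1, innerdims=10, epochs=600);  # also tune the size of the inner layers
julia> fit!(m2, x);
julia> x2_latent = predict(m2, x2);                       # encode new data with the already fitted model
julia> x2_reconstructed = inverse_predict(m2, x2_latent); # and decode it back to the original space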