v0.10.4 is out.
Main stuff compared to 0.10.2:
- (v0.10.3) A general `UniversalImputer` to impute (with repetitions) missing values using any supervised model (not necessarily from BetaML) that can be wrapped in a `m=Model(hp); fit!(m,x,y); yest = predict(m,x)` interface (specific imputers, like `RFImputer`, were already available in the `Imputation` module)
- (v0.10.4) A simple-to-use `AutoEncoder` (and `AutoEncoderMLJ`) model that follows the API `m=AutoEncoder(hp); fit!(m,x); x_latent = predict(m,x); x̂ = inverse_predict(m,x_latent)`. Users can optionally specify the number of dimensions to which to shrink the data (`outdims`), the number of neurons of the inner layers (`innerdims`), or the full details of the encoding and decoding layers and all the underlying NN options, but all of this remains optional (a minimal sketch follows this list).
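For example, based only on the parameter names above, getting a 2-dimensional embedding with wider inner layers should look like this (a minimal sketch with placeholder random data, not taken from the release notes):

```julia
using BetaML

x        = rand(100, 8)                         # placeholder data: 100 records, 8 dimensions
m        = AutoEncoder(outdims=2, innerdims=10) # 2 latent dimensions, 10 neurons in the inner layers
x_latent = fit!(m, x)                           # fit and return the 2-dimensional representation
x̂        = inverse_predict(m, x_latent)         # map back to the original 8 dimensions
```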
I have looked around the net quite a lot, and I believe this is the easiest way to apply an autoencoder to reduce the dimensionality of some data, as the user doesn't really need to deal with the underlying neural network.
(Note: the release will need a few minutes to reach the Julia General registry. The MLJ wrapper model will, I believe, need manual approval from the MLJ team.)
Examples
- UniversalImputer:
```julia
julia> using BetaML

julia> import DecisionTree

julia> X = [1.4 2.5 "a"; missing 20.5 "b"; 0.6 18 missing; 0.7 22.8 "b"; 0.4 missing "b"; 1.6 3.7 "a"]
6×3 Matrix{Any}:
 1.4        2.5       "a"
  missing  20.5       "b"
 0.6       18         missing
 0.7       22.8       "b"
 0.4         missing  "b"
 1.6        3.7       "a"

julia> mod = UniversalImputer(estimator=[DecisionTree.DecisionTreeRegressor(),DecisionTree.DecisionTreeRegressor(),DecisionTree.DecisionTreeClassifier()], fit_function = DecisionTree.fit!, predict_function=DecisionTree.predict, recursive_passages=2)
UniversalImputer - A imputer based on an arbitrary regressor/classifier(unfitted)

julia> X_full = fit!(mod,X)
** Processing imputation 1
6×3 Matrix{Any}:
 1.4    2.5  "a"
 0.94  20.5  "b"
 0.6   18    "b"
 0.7   22.8  "b"
 0.4   13.5  "b"
 1.6    3.7  "a"
```
- AutoEncoder:
```julia
julia> using BetaML

julia> x = [0.12 0.31 0.29  3.21 0.21;
            0.22 0.61 0.58  6.43 0.42;
            0.51 1.47 1.46 16.12 0.99;
            0.35 0.93 0.91 10.04 0.71;
            0.44 1.21 1.18 13.54 0.85];

julia> m = AutoEncoder(outdims=1,epochs=400)
A AutoEncoder BetaMLModel (unfitted)

julia> x_reduced = fit!(m,x)
***
*** Training for 400 epochs with algorithm ADAM.
Training.. avg loss on epoch 1 (1): 60.27802763757111
Training.. avg loss on epoch 200 (200): 0.08970099870421573
Training.. avg loss on epoch 400 (400): 0.013138484118673664
Training of 400 epoch completed. Final epoch error: 0.013138484118673664.
5×1 Matrix{Float64}:
  -3.5483740608901186
  -6.90396890458868
 -17.06296512222304
 -10.688936344498398
 -14.35734756603212

julia> x̂ = inverse_predict(m,x_reduced)
5×5 Matrix{Float64}:
 0.0982406  0.110294  0.264047   3.35501  0.327228
 0.205628   0.470884  0.558655   6.51042  0.487416
 0.529785   1.56431   1.45762   16.067    0.971123
 0.3264     0.878264  0.893584  10.0709   0.667632
 0.443453   1.2731    1.2182    13.5218   0.842298

julia> info(m)["rme"]
0.020858783340281222

julia> hcat(x,x̂)
5×10 Matrix{Float64}:
 0.12  0.31  0.29   3.21  0.21  0.0982406  0.110294  0.264047   3.35501  0.327228
 0.22  0.61  0.58   6.43  0.42  0.205628   0.470884  0.558655   6.51042  0.487416
 0.51  1.47  1.46  16.12  0.99  0.529785   1.56431   1.45762   16.067    0.971123
 0.35  0.93  0.91  10.04  0.71  0.3264     0.878264  0.893584  10.0709   0.667632
 0.44  1.21  1.18  13.54  0.85  0.443453   1.2731    1.2182    13.5218   0.842298
```
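Once fitted, the model should also work on records it has not seen, following the same `predict`/`inverse_predict` API described above (a sketch; `x_new` is a hypothetical record with the same five columns):

```julia
x_new        = [0.3 0.8 0.8 8.6 0.6]             # hypothetical new record, 5 columns like x
x_new_latent = predict(m, x_new)                 # encode to the 1-dimensional latent space
x_new_rec    = inverse_predict(m, x_new_latent)  # decode back to the original 5 dimensions
```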