[ANN] MLJ: Outlier Detection, Text Analysis, Improved Pipelines and Serialization

ablaom · April 11, 2022, 1:40am

There have been quite a few releases (two breaking) since our last announcement. Here are some highlights, available in MLJ 0.18.

Tutorial: MLJ for Data Scientists in Two Hours

We have added a new tutorial focused on the practicing data scientist transitioning from another platform, such as scikit-learn or caret.

Pipelines without macros

For improved robustness, pipelines are no longer created with a macro. Every pipeline is just an instance of a single parameterized type Pipeline, with a constructor of the same name. However, unless you have special options to specify, you just combine models using the |> syntax:

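using MLJ

# Load model types; add=true installs the providing package if it is missing
LinearRegressor = @load LinearRegressor pkg=MLJLinearModels add=true
PCA = @load PCA pkg=MultivariateStats add=true

# Plain functions can be components too: MLJBase.table wraps a matrix as a table
pipe1 = MLJBase.table |> ContinuousEncoder |> Standardizer
pipe2 = PCA |> LinearRegressor
# Pipelines compose with other pipelines
pipe3 = pipe1 |> pipe2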
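
For anyone wanting explicit component names, here is a minimal sketch of the Pipeline constructor form, using only transformers that ship with MLJ so nothing extra needs installing. The toy table and the component names encoder and standardizer are made up for illustration, and the keyword-argument form reflects my reading of the Pipeline docstring, so check it against your installed version.

```julia
using MLJ

# Toy table; the column names are hypothetical.
X = (height = [1.85, 1.67, 1.50, 1.73],
     grade  = coerce(["A", "B", "A", "C"], Multiclass))

# Same idea as `ContinuousEncoder |> Standardizer`, but with names we choose.
pipe = Pipeline(encoder = ContinuousEncoder(), standardizer = Standardizer())

# Components are ordinary fields, so nested hyperparameters can be changed directly.
pipe.standardizer.features = [:height]   # standardize only this column

# An unsupervised pipeline is bound to data and fitted like any other transformer.
mach = machine(pipe, X)
fit!(mach, verbosity=0)
W = transform(mach, X)

# Inspect the output columns and their scientific types.
println(schema(W))
```

Because the components are ordinary fields, nested hyperparameters such as pipe.standardizer.features can also be targeted in tuning ranges.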

Target transformers

Transformation of the target (with post-prediction inversion) is no longer available in pipelines; it is provided instead by the TransformedTargetModel(model, ...) model wrapper.
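
As a rough illustration of the wrapper, here is a sketch that learns on a log-transformed target and inverts the predictions. In the MLJ documentation I am familiar with, the wrapper is exported as TransformedTargetModel and takes transformer and inverse keywords. The data is invented, and DeterministicConstantRegressor (which predicts the mean of the training target) is used only because it ships with MLJ.

```julia
using MLJ

# Invented data; a constant regressor ignores the features entirely.
X = (x = [1.0, 2.0, 3.0],)
y = [1.0, 10.0, 100.0]

model = DeterministicConstantRegressor()

# Train on log.(y), then map predictions back with exp.
tmodel = TransformedTargetModel(model;
                                transformer = y -> log.(y),
                                inverse = z -> exp.(z))

mach = machine(tmodel, X, y)
fit!(mach, verbosity=0)

# The wrapped model predicts mean(log.(y)), so after inversion each
# prediction is the geometric mean of y rather than the arithmetic mean.
yhat = predict(mach, X)
println(yhat)
```

The transformer can also be an unsupervised model (for example a standardizer), in which case the inverse is learned rather than supplied.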

Serialization (Olivier Labayle @olivierlabayle)

Serialization in MLJ has changed. The previous MLJ.save(filename, mach) method still works, but it now saves only in Julia’s native serialization (JLS) format, and the new format is not backwards compatible. However, a new workflow allows for serialization using any generic serializer. Serialization also plays nicely with model composition and with model wrappers such as TunedModel and EnsembleModel (even for non-Julia atomic models), and training data will not be inadvertently serialized.
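
Both routes can be sketched as follows. This assumes the serializable and restore! functions described in the MLJ manual's section on saving machines, and uses Julia's Serialization standard library as a stand-in for "any generic serializer". The data and the temporary file name are illustrative only.

```julia
using MLJ
import Serialization

# Invented data and a model that ships with MLJ.
X = (x = [1.0, 2.0, 3.0, 4.0],)
y = [2.0, 4.0, 6.0, 8.0]
mach = machine(DeterministicConstantRegressor(), X, y)
fit!(mach, verbosity=0)

# Route 1: the built-in save/restore pair (native Julia serialization).
file = tempname() * ".jls"
MLJ.save(file, mach)
mach2 = machine(file)              # restored machine, ready to predict
println(predict(mach2, X))

# Route 2: the generic workflow. `serializable` returns a data-free copy
# of the machine that can be handed to a serializer of your choice.
smach = serializable(mach)
io = IOBuffer()
Serialization.serialize(io, smach)
seekstart(io)
mach3 = Serialization.deserialize(io)
restore!(mach3)                    # required before use; matters most for non-Julia models
println(predict(mach3, X))
```

The second route is the one to reach for when a different on-disk format is needed, since only the serialize and deserialize calls change.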

Outlier detection (David Muhr @davnn)

MLJ now wraps a large number of outlier detection models from OutlierDetection.jl. In MLJ, run models("Detector") to list them. See the OutlierDetection.jl documentation for usage.
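
A small sketch of the registry query, printing each match with the package that provides it. It assumes registry entries expose name and package_name fields. The exact list depends on the version of the model registry you have installed, so no output is shown.

```julia
using MLJ

# Search the model registry for entries matching "Detector".
detectors = models("Detector")

# Print each model name alongside its providing package.
for m in detectors
    println(rpad(m.name, 28), m.package_name)
end
```

Any listed model can then be loaded with @load in the usual way, given the providing package.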

Text analysis (Chris Alexander @pazzo83)

MLJ now provides some text analysis tools. See the MLJText.jl readme for details. Also new is the TSVDTransformer model for truncated singular value decomposition, an interface to TSVD.jl.
