[ANN] MLJ: Outlier Detection, Text Analysis, Improved Pipelines and Serialization

There have been quite a few releases (two breaking) since our last announcement. Here are some highlights, available in MLJ 0.18.

Tutorial: MLJ for Data Scientists in Two Hours

We have added a new tutorial aimed at practicing data scientists transitioning from another platform, such as scikit-learn or caret.

Pipelines without macros

For improved robustness, pipelines are no longer created with a macro. Every pipeline is now an instance of a single parameterized type, Pipeline, which has a constructor of the same name. Unless you need to specify special options, you can simply combine models using the |> syntax:

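using MLJ

# Load model types (add=true adds the interface package to the environment if missing)
LinearRegressor = @load LinearRegressor pkg=MLJLinearModels add=true
PCA = @load PCA pkg=MultivariateStats add=true

# Ordinary functions (here MLJ.table, matrix -> table) can appear as static steps
pipe1 = MLJ.table |> ContinuousEncoder |> Standardizer
pipe2 = PCA |> LinearRegressor
pipe3 = pipe1 |> pipe2  # pipelines compose with other pipelines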
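When options are needed, the Pipeline constructor can be called directly. The sketch below builds the equivalent of Standardizer() |> LinearRegressor() explicitly and fits it on synthetic data; it assumes MLJLinearModels is installed, and the cache=false option is shown only as an illustration of passing a keyword option.

```julia
using MLJ

LinearRegressor = @load LinearRegressor pkg=MLJLinearModels verbosity=0

# Synthetic data: X is a table with 3 continuous columns, y a vector
X, y = make_regression(100, 3)

# Same as Standardizer() |> LinearRegressor(), but the constructor
# form lets us pass options (assumed here: cache=false)
pipe = Pipeline(Standardizer(), LinearRegressor(); cache=false)

mach = machine(pipe, X, y)
fit!(mach, verbosity=0)
yhat = predict(mach, X)
```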

Target transformers

Transformation of the target (with post-prediction inversion) is no longer available in pipelines, but provided instead by the TransformedTarget(model, ...) model wrapper.
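As a rough sketch of the wrapper in use, the example below standardizes the target before training a regressor and relies on TransformedTarget to map predictions back to the original scale. It assumes MLJLinearModels is installed and that the target keyword accepts a transformer such as Standardizer().

```julia
using MLJ

LinearRegressor = @load LinearRegressor pkg=MLJLinearModels verbosity=0

X, y = make_regression(100, 3)

# The target y is standardized during training; predictions are
# inverse-transformed back to the original scale of y
model = TransformedTarget(LinearRegressor(), target=Standardizer())

mach = machine(model, X, y)
fit!(mach, verbosity=0)
yhat = predict(mach, X)
```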

Serialization (Olivier Labayle @olivierlabayle)

Serialization in MLJ has changed. The previous MLJ.save(mach) method still works, but it can only save in Julia's native JLS format (and the new format is not backwards compatible). In addition, a new workflow allows serialization with any generic serializer. Serialization now plays nicely with model composition and with model wrappers such as TunedModel and EnsembleModel (even when the atomic models are not written in Julia), and training data is no longer serialized inadvertently.
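A minimal sketch of both workflows, using the Serialization standard library as a stand-in for "any generic serializer". It assumes the functions serializable (which returns a copy of the machine without its training data) and restore! (which makes a deserialized machine usable again), and that machine(path) reloads a file written by MLJ.save; file paths are temporary and purely illustrative.

```julia
using MLJ
using Serialization  # stdlib serializer standing in for any generic serializer

X, _ = make_regression(100, 3)
mach = machine(Standardizer(), X)
fit!(mach, verbosity=0)

# Generic workflow: strip training data, serialize, then restore on load
path = tempname() * ".jls"
serialize(path, serializable(mach))
loaded = deserialize(path)
restore!(loaded)
Xt = transform(loaded, X)

# Native JLS workflow via MLJ.save and machine(path)
jls_path = tempname() * ".jls"
MLJ.save(jls_path, mach)
mach2 = machine(jls_path)
```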

Outlier detection (David Muhr @davnn)

MLJ now wraps a large number of outlier detection models from OutlierDetection.jl. In MLJ, do models("Detector") to list these. See here for usage.
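For example, the registry query can be turned into a short listing of detector names and the packages that provide them. This sketch assumes each entry returned by models exposes name and package_name fields.

```julia
using MLJ

# Query the model registry for outlier detection models
detectors = models("Detector")

for m in detectors
    println(m.name, " (", m.package_name, ")")
end
```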

Text analysis (Chris Alexander @pazzo83)

MLJ now provides some text analysis tools. See the MLJText.jl readme for details. Also new is the TSVDTransformer model for truncated singular value decomposition, an interface to TSVD.jl.
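As a hedged illustration, the sketch below computes TF-IDF weights for two tiny documents with MLJText's TfidfTransformer. It assumes MLJText is installed and that the transformer accepts a vector of tokenized documents (each a Vector{String}); the naive whitespace tokenization is for illustration only, and the MLJText readme remains the reference for input formats.

```julia
using MLJ
using MLJText  # assumed installed

docs = ["the cat sat on the mat", "the dog chased the cat"]

# Naive whitespace tokenization: each document becomes a Vector{String}
tokens = [String.(split(lowercase(doc))) for doc in docs]

mach = machine(TfidfTransformer(), tokens)
fit!(mach, verbosity=0)

# Matrix of TF-IDF weights relating documents and terms
tfidf_mat = transform(mach, tokens)
```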