Transform several columns of an MLJ model using one transformer

Is there an easy way to transform several columns of a table using the same MLJ transformer? So if I have a transformer like UnivariateBoxCoxTransformer, I’d like to apply it only to columns 2, 3, and 4.

Alternatively, is there some way to avoid applying a transformer (like Standardizer) to some columns?

If you can’t find a solution with MLJ.jl, take a look at TableTransforms.jl.

Thanks, but unfortunately, I’m looking for an answer to make a PR to MLJ.jl, and I’d rather not add a dependency :sweat_smile:

I’ve long thought we should have this, but we don’t. An issue was opened some time ago:

Please! Right now it is very cumbersome, like this:

# 1. Load packages
using DataFrames, TableTransforms
using Random

# 2. Load data into a DataFrame

# 3. Transformation

Transforming columns [1,3,5,6,8,9,10,14] should be a one-liner.

Damn. So TIL about Functional, which is a huge quality-of-life improvement already. Actually, as I’m reading more about TableTransforms.jl, I’m thinking this is a great consolidation opportunity; most of the built-in transformations in MLJModels.jl might fit better in TableTransforms.

TableTransforms.jl has very flexible column selection features:

Functional([1,3,5,6,8,9,10,14] => log)

You can use lists of symbols, strings, integers, regex, …
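
To make the idea of selector-based column transformation concrete without relying on either package, here is a minimal plain-Julia sketch. The helpers `select_names` and `transform_columns` are hypothetical names invented for this illustration; they are not part of TableTransforms.jl or MLJ, and the real selectors in TableTransforms.jl may behave differently. The table is assumed to be a NamedTuple of vectors.

```julia
# Hypothetical helpers for illustration; not part of MLJ or TableTransforms.jl.

# Resolve a selector (indices, symbols, strings, or a regex) to column names.
select_names(names, sel::AbstractVector{<:Integer}) = [names[i] for i in sel]
select_names(names, sel::AbstractVector{Symbol}) = [n for n in names if n in sel]
select_names(names, sel::AbstractVector{<:AbstractString}) =
    select_names(names, Symbol.(sel))
select_names(names, sel::Regex) = [n for n in names if occursin(sel, String(n))]

# Apply `f` elementwise to the selected columns of a column table
# (a NamedTuple of vectors); all other columns pass through unchanged.
function transform_columns(f, table::NamedTuple, sel)
    names = collect(keys(table))
    chosen = select_names(names, sel)
    cols = [n in chosen ? f.(table[n]) : table[n] for n in names]
    return NamedTuple{Tuple(names)}(Tuple(cols))
end

table = (a = [1.0, 2.0], b = [10.0, 20.0], c = [100.0, 200.0], d = ["x", "y"])

by_index = transform_columns(log10, table, [2, 3])      # select by position
by_regex = transform_columns(log10, table, r"^[bc]$")   # select by name pattern
```

Both calls transform only columns `b` and `c`, which is the one-line selection the thread is asking for.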

Ooh, neat! Perhaps this could be more clearly documented?

All transforms should have a clear docstring explaining these options.

This method doesn’t work right now.

You need to be more explicit about which method doesn’t work. Can you please share an MWE?

Is there a way to use TableTransforms.jl together with MLJ transforms?

Not currently. The main issues are:

  1. (abstract type roadblock) MLJModelInterface requires new algorithms to subtype an abstract type owned by MLJModelInterface (Unsupervised or Static), but TableTransforms.jl, as I understand it, aims for a purely functional interface that does not depend on externally owned types.

  2. (limitation on functionality) The MLJTuning.jl API for tuning models is based on mutation of the hyperparameter struct, and so is not suited to TableTransforms.jl transformer structs, which are immutable. This currently rules out optimization of transformer hyperparameters in MLJ pipelines.
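
The second point can be illustrated in plain Julia. This is only a sketch: `ImmutableScaler`, `MutableScaler`, and `try_values!` are hypothetical stand-ins and do not reproduce the actual MLJTuning.jl or TableTransforms.jl code. It shows why a tuner that updates a hyperparameter in place cannot work with an immutable struct.

```julia
# Stand-in types for illustration only; neither is defined by MLJ or TableTransforms.jl.
struct ImmutableScaler        # immutable, like the transformer structs described above
    factor::Float64
end

mutable struct MutableScaler  # mutable, as mutation-based tuning assumes
    factor::Float64
end

# A toy "tuner" that tries each candidate value by mutating the model in place.
function try_values!(model, field::Symbol, candidates)
    for value in candidates
        setproperty!(model, field, value)
    end
    return model
end

# Works: the mutable struct ends up holding the last candidate.
tuned = try_values!(MutableScaler(1.0), :factor, [0.5, 2.0])

# Fails: fields of an immutable struct cannot be reassigned.
immutable_failed = try
    try_values!(ImmutableScaler(1.0), :factor, [0.5, 2.0])
    false
catch err
    err isa ErrorException
end
```

An immutable design would instead construct a new struct for each candidate value, which is a different tuning protocol from the mutation-based one described above.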

One day MLJ may rid itself of its abstract model type hierarchy (for efforts in this direction, see this announcement). However, it is substantially embedded in the ecosystem and unlikely to disappear in the near future.

A simple, but unattractive, solution to 1. would be for MLJModels.jl or TableTransforms.jl to provide a wrapper. The only way I can think of to avoid the wrapper in the status quo would require metaprogramming hacks that would likely be brittle.
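
For illustration, the shape of such a wrapper might look like the following plain-Julia sketch. All names here (`StaticModel`, `LogColumns`, `apply_transform`, `WrappedTransform`, `run_wrapped`) are hypothetical stand-ins invented for this example; the real wrapper would have to subtype the actual MLJModelInterface type and implement its methods, which this sketch does not attempt.

```julia
# Everything here is a hypothetical stand-in, not MLJ or TableTransforms.jl code.

abstract type StaticModel end   # plays the role of an externally owned abstract type

struct LogColumns               # plays the role of an immutable functional transform
    columns::Vector{Symbol}
end

# Apply log elementwise to the listed columns of a NamedTuple-of-vectors table.
function apply_transform(t::LogColumns, table::NamedTuple)
    logged = NamedTuple{Tuple(t.columns)}(Tuple(log.(table[c]) for c in t.columns))
    return merge(table, logged)  # column order is kept, selected columns are replaced
end

# The wrapper subtypes the abstract type, so the wrapped transform does not have to.
# It is mutable, so the wrapped transform can be swapped for another of the same type.
mutable struct WrappedTransform{T} <: StaticModel
    transform::T
end

run_wrapped(w::WrappedTransform, table) = apply_transform(w.transform, table)

wrapped = WrappedTransform(LogColumns([:a]))
out = run_wrapped(wrapped, (a = [1.0, exp(1.0)], b = [3.0, 4.0]))

# Swap in a differently configured transform without touching the immutable struct.
wrapped.transform = LogColumns([:b])
```

The cost of this design is the extra layer: users would construct `WrappedTransform(...)` rather than the transform itself, which is what makes the solution unattractive.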

Sounds good to me!