Is it possible to apply a pipeline by group, whether it's in MLJ, ScikitLearn or FeatureTransforms?

I have asked about this for scikit-learn too, but I couldn't find a good solution.

For example, if I want to apply PCA or a StandardScaler, I might want to fit it separately within each group.
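To spell out what "within each group" means for standardization: with $g(i)$ the group of row $i$, $n_g$ the number of rows in group $g$ and $j$ a feature (notation introduced here for illustration, not taken from any package), each value is centred and scaled by its own group's statistics, assuming the usual $n_g - 1$ sample standard deviation:

$$
z_{ij} = \frac{x_{ij} - \mu_{g(i),j}}{\sigma_{g(i),j}}, \qquad
\mu_{g,j} = \frac{1}{n_g} \sum_{i:\, g(i) = g} x_{ij}, \qquad
\sigma_{g,j} = \sqrt{\frac{1}{n_g - 1} \sum_{i:\, g(i) = g} \left(x_{ij} - \mu_{g,j}\right)^2}
$$

Group-wise PCA works the same way, with the components computed from each group's own centred rows. The fitted $\mu_{g,j}$ and $\sigma_{g,j}$ are exactly the per-group state a pipeline would have to remember, since a row from a group not seen during fitting has nothing to be scaled by.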

But there is no easy way to do that in a pipeline, e.g. one that remembers which groups were present during fitting.

I can do it with `.` broadcasting, but I feel like something is missing. It feels like there should be a GroupBy operator in the pipeline.

Something like:

```julia
@pipeline GroupBy(StandardScaler(), PCA(), groupby = [:grp1, :grp2])
```

But this is the best I have come up with so far:

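```julia
using DataFrames
using MLJ: Standardizer
import MLJ

# Nest each group into one row; the unnamed column `x1` holds the group's SubDataFrame
nest(df, by) = combine(groupby(df, by), sdf -> [sdf])
gdf = sort!(nest(phone_info_df5, :year), :year)

# One Standardizer machine per group, fitted and applied via broadcasting
l = nrow(gdf)
stds = [Standardizer() for _ in 1:l]
machs = MLJ.machine.(stds, gdf.x1)
MLJ.fit!.(machs)
MLJ.transform.(machs, gdf.x1)
```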
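A minimal sketch of what the `GroupBy` operator wished for above could do, written in plain Julia with only the standard library so it runs without MLJ. Everything here is hypothetical and for illustration only: `Step`, `standardize`, `pca`, `GroupBy`, `fit_states` and `transform_grouped` are not part of MLJ, ScikitLearn.jl or FeatureTransforms, and the PCA step is a bare SVD projection rather than MLJ's `PCA` model. Each group gets its own fitted chain of steps, and the fitted states are kept in a `Dict` keyed by group, so transforming later uses the same groups and fails loudly on a group that was not seen during fitting.

```julia
using LinearAlgebra, Statistics

# Hypothetical building block (not an MLJ/ScikitLearn.jl API): a step is a pair
# of plain functions, `fit(X) -> state` and `apply(state, X) -> transformed X`.
struct Step
    fit::Function
    apply::Function
end

# Column-wise standardization using the mean/std learned at fit time.
function standardize()
    fit(X) = (vec(mean(X; dims = 1)), vec(std(X; dims = 1)))
    apply(st, X) = (X .- st[1]') ./ st[2]'
    return Step(fit, apply)
end

# Projection onto the top-k principal components via the SVD of the centred data.
function pca(k::Int)
    function fit(X)
        mu = vec(mean(X; dims = 1))
        V = svd(X .- mu').V[:, 1:k]  # top-k right singular vectors
        return (mu, V)
    end
    apply(st, X) = (X .- st[1]') * st[2]
    return Step(fit, apply)
end

# Hypothetical GroupBy wrapper: the whole chain of steps is fitted per group.
struct GroupBy
    steps::Vector{Step}
end
GroupBy(steps::Step...) = GroupBy(collect(steps))

# Fit every step on each group's rows and remember the fitted states by group key.
function fit_states(gb::GroupBy, X::AbstractMatrix, groups::AbstractVector)
    states = Dict{eltype(groups),Vector{Any}}()
    for g in unique(groups)
        Xg = X[groups .== g, :]
        fitted = Any[]
        for s in gb.steps
            st = s.fit(Xg)
            push!(fitted, st)
            Xg = s.apply(st, Xg)  # the next step is fitted on this step's output
        end
        states[g] = fitted
    end
    return states
end

# Transform each group with its own fitted chain; an unseen group is an error.
function transform_grouped(gb::GroupBy, states, X::AbstractMatrix, groups::AbstractVector)
    parts = Dict{eltype(groups),Matrix{Float64}}()
    for g in unique(groups)
        haskey(states, g) || error("group $g was not seen during fitting")
        Xg = X[groups .== g, :]
        for (s, st) in zip(gb.steps, states[g])
            Xg = s.apply(st, Xg)
        end
        parts[g] = Xg
    end
    return parts
end

# Toy data: two groups of three rows with three features each.
X = [ 1.0 2.0 0.5
      2.0 4.1 0.4
      3.0 6.2 0.9
     10.0 1.0 5.0
     20.0 0.0 3.0
     30.0 2.0 4.0]
groups = [:a, :a, :a, :b, :b, :b]

gb = GroupBy(standardize(), pca(2))
states = fit_states(gb, X, groups)
parts = transform_grouped(gb, states, X, groups)  # one 3×2 matrix per group
```

Mapping this onto MLJ would presumably mean storing one fitted machine per group in a `Dict` keyed by the group value, rather than in a positional vector as in the broadcast approach above.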
