Is it possible to apply a pipeline by group, whether it's MLJ, Scikitlearn or FeatureTransform?

xiaodai · July 13, 2021, 4:23am

I have asked about this for scikitlearn before, but I couldn't find a good solution.

E.g. if I want to apply PCA or a StandardScaler, I might want to do it within each group.

But there is no easy way to do that in a pipeline, i.e. with a step that remembers which groups were present when it was fitted.

I can do it with `.` broadcasting, as in the code further down, but I feel like it's missing something. It feels like there should be a GroupBy operator in the pipeline.

Something like @pipeline GroupBy(StandardScaler(), PCA(), groupby = [:grp1, :grp2])

But this is the best I have come up with so far:

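using DataFrames
using MLJ: Standardizer
import MLJ

# Collapse each group into one row; the `x1` column holds that group's SubDataFrame.
nest(df, by) = combine(groupby(df, by), sdf -> [sdf])

# One row per year, sorted so the group order is reproducible.
gdf = sort!(nest(phone_info_df5, :year), :year)

# One Standardizer per group, each bound to its own group's data.
stds = [Standardizer() for _ in 1:nrow(gdf)]
machs = MLJ.machine.(stds, gdf.x1)
MLJ.fit!.(machs)

# Transform each group with the machine that was fitted on that group.
MLJ.transform.(machs, gdf.x1)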
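
For comparison, here is a rough sketch of what a step that "remembers the groups" might look like. It is not an MLJ or ScikitLearn API: it uses only DataFrames and Statistics, and the names `GroupStandardizer`, `fit_groups` and `transform_groups` are made up for illustration. It assumes the feature columns are numeric and that every group has at least two rows (otherwise the standard deviation is NaN).

```julia
using DataFrames, Statistics

# Hypothetical container: per-group (mean, std) for each feature column,
# keyed by the tuple of grouping values, e.g. (2020,).
struct GroupStandardizer
    by::Vector{Symbol}
    features::Vector{Symbol}
    params::Dict{Any,Dict{Symbol,Tuple{Float64,Float64}}}
end

# Learn one mean/std pair per group and feature.
function fit_groups(df::AbstractDataFrame, by::Vector{Symbol}, features::Vector{Symbol})
    params = Dict{Any,Dict{Symbol,Tuple{Float64,Float64}}}()
    for (key, sdf) in pairs(groupby(df, by))
        stats = Dict{Symbol,Tuple{Float64,Float64}}()
        for f in features
            stats[f] = (Float64(mean(sdf[!, f])), Float64(std(sdf[!, f])))
        end
        params[Tuple(key)] = stats
    end
    return GroupStandardizer(by, features, params)
end

# Apply the stored parameters; error on groups that were not seen in fitting.
function transform_groups(gs::GroupStandardizer, df::AbstractDataFrame)
    out = copy(df)
    for f in gs.features
        out[!, f] = Float64.(out[!, f])   # make sure results fit in the column
    end
    for (key, sdf) in pairs(groupby(out, gs.by))
        k = Tuple(key)
        haskey(gs.params, k) || error("group $k was not seen during fitting")
        rows = parentindices(sdf)[1]      # row numbers of this group in `out`
        for f in gs.features
            μ, σ = gs.params[k][f]
            out[rows, f] = (out[rows, f] .- μ) ./ σ
        end
    end
    return out
end

# Toy data, invented for this example.
df = DataFrame(year = [2020, 2020, 2021, 2021], x = [1.0, 3.0, 10.0, 30.0])
gs = fit_groups(df, [:year], [:x])
out = transform_groups(gs, df)
```

Because the fitted parameters live in a dictionary keyed by group, the same `gs` can be applied to new data later, which is the part that plain broadcasting over a vector of machines does not give you for free.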
