I have a use case where I use functors as pre-trained feature transformations. In that context, defining those structs as subtypes of `Function` doesn't feel like a natural design choice.
Here's a functor that applies a learned normalization:

```julia
using DataFrames
using Statistics: mean, std

struct Normalizer
    μ
    σ
end

# Fit the normalizer on data
Normalizer(x::AbstractVector) = Normalizer(mean(x), std(x))

# Apply to a scalar
function (m::Normalizer)(x::Real)
    return (x - m.μ) / m.σ
end

# Apply to a vector
function (m::Normalizer)(x::AbstractVector)
    return (x .- m.μ) ./ m.σ
end
```
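As a quick sanity check, fitting on `[1.0, 2.0, 3.0]` gives `μ = 2.0` and `σ = 1.0`, and calling the instance directly behaves as expected (definitions repeated in compact form so the snippet is self-contained):

```julia
using Statistics: mean, std

struct Normalizer
    μ
    σ
end
Normalizer(x::AbstractVector) = Normalizer(mean(x), std(x))
(m::Normalizer)(x::Real) = (x - m.μ) / m.σ
(m::Normalizer)(x::AbstractVector) = (x .- m.μ) ./ m.σ

n = Normalizer([1.0, 2.0, 3.0])  # μ = 2.0, σ = 1.0
n(2.0)          # → 0.0
n([1.0, 3.0])   # → [-1.0, 1.0]
```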
```julia
df = DataFrame(:v1 => rand(5), :v2 => rand(5))
feat_names = names(df)
norms = map(feat -> Normalizer(df[:, feat]), feat_names)
```
As discussed earlier, the following doesn't work:

```julia
transform(df, feat_names .=> norms .=> feat_names)
```

```
ERROR: LoadError: ArgumentError: Unrecognized column selector: "v1" => (Normalizer(0.5407170762469404, 0.1599492895436335) => "v1")
```
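If I read the DataFrames source correctly, the `src => fun => dest` mini-language dispatches on `Base.Callable` (i.e. `Union{Function, Type}`), and a plain functor instance is neither, even though it is perfectly callable. A minimal illustration (`Scale` is a made-up functor, just for this point):

```julia
# A made-up functor: callable, but not a Function or a Type.
struct Scale
    a::Float64
end
(s::Scale)(x) = s.a * x

s = Scale(2.0)
s(3.0)               # → 6.0 : the instance is callable
s isa Base.Callable  # → false : so `src => s => dest` is not recognized
```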
However, somewhat surprisingly, using `ByRow` does work:

```julia
transform(df, feat_names .=> ByRow.(norms) .=> feat_names)
```
```
5×2 DataFrame
 Row │ v1          v2
     │ Float64     Float64
─────┼────────────────────────
   1 │  0.0386826   0.479449
   2 │  0.919179   -1.61432
   3 │  1.05579     0.584841
   4 │ -0.930937    0.854153
   5 │ -1.08272    -0.304124
```
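This makes some sense once you note that `ByRow` is itself a plain wrapper struct storing an arbitrary callable, so the wrapped object never has to be a `Function` subtype. A rough sketch of the idea (`MyByRow` is a hypothetical stand-in, not DataFrames' actual implementation):

```julia
# Hypothetical stand-in for ByRow: stores any callable, applies it elementwise.
struct MyByRow{T}
    fun::T
end
(r::MyByRow)(x) = r.fun.(x)

struct Scale  # a made-up functor, not a Function subtype
    a::Float64
end
(s::Scale)(x) = s.a * x

MyByRow(Scale(2.0))([1.0, 2.0])  # → [2.0, 4.0]
```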
So to use the vectorized form, it seems the functors must be wrapped into genuine `Function`s, e.g. with anonymous functions:

```julia
norms_f = map(f -> (x -> f(x)), norms)
transform(df, feat_names .=> norms_f .=> feat_names)
```
```
5×2 DataFrame
 Row │ v1          v2
     │ Float64     Float64
─────┼────────────────────────
   1 │  0.0386826   0.479449
   2 │  0.919179   -1.61432
   3 │  1.05579     0.584841
   4 │ -0.930937    0.854153
   5 │ -1.08272    -0.304124
```
I can see that there's a not-too-complicated way to circumvent the functor limitation through that remapping. Still, isn't it counterintuitive that the functor works inside `ByRow` but not in the vectorized case? In my opinion, recognizing functors as functions in `transform` would be their most natural handling.
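For completeness, declaring the struct a subtype of `Function` (the design the first paragraph finds unnatural) should make the vectorized form work, since the instance then satisfies the `Base.Callable` check. A self-contained sketch with a hypothetical `FNormalizer` variant (the `transform` line is shown as a comment, assuming the `df` / `feat_names` setup above):

```julia
using Statistics: mean, std

# Variant of Normalizer declared as a Function subtype,
# so instances satisfy `isa Base.Callable`.
struct FNormalizer <: Function
    μ::Float64
    σ::Float64
end
FNormalizer(x::AbstractVector) = FNormalizer(mean(x), std(x))
(m::FNormalizer)(x::Real) = (x - m.μ) / m.σ
(m::FNormalizer)(x::AbstractVector) = (x .- m.μ) ./ m.σ

f = FNormalizer([1.0, 2.0, 3.0])
f isa Base.Callable   # → true
f([1.0, 2.0, 3.0])    # → [-1.0, 0.0, 1.0]

# fnorms = map(feat -> FNormalizer(df[:, feat]), feat_names)
# transform(df, feat_names .=> fnorms .=> feat_names)  # no wrapping needed
```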