# Apply interpolation functions on columns of a dataframe

Hello,

I’m a beginner in Julia.
I have:

1. a dataframe with 4 columns and 10 rows
2. three functions u1, u2, u3
3. a vector of 3 weights

I want to write a function that takes the dataframe, the 3 functions, and the weights as arguments and returns a vector built as follows:

• we apply the function u1 on values of column 2, the function u2 on values of column 3, the function u3 on values of column 4
• then we do, for each row, a weighted sum with vector w
• so, we have at the end a vector with 10 values

I’ve tried using `transform` (with help from the community), but it doesn’t work with functions obtained by linear interpolation:

``````julia
using Interpolations
x1=sort(vf[1][:,2])
y1=reverse(vf[1][:,1])
f1=LinearInterpolation(x1,y1)
``````

where `vf[1]` is a 3×2 matrix.

Do you have any ideas?

What does the error say?

Can you please provide a minimum working example?

`LinearInterpolation` does not produce a `Function` but a functor (a callable object). To turn `f1` into a function, use an anonymous function wrapper `x -> f1(x)` or the composition `identity∘f1`.
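The distinction can be checked in plain Julia. The sketch below uses a hypothetical callable struct `Scale` as a stand-in for an interpolation object (it is not part of Interpolations.jl) and shows that both wrappers mentioned above yield something that is a `Function`:

```julia
# Hypothetical stand-in for an interpolation object: a callable struct (functor).
struct Scale
    factor::Float64
end

# Make instances of Scale callable.
(s::Scale)(x) = s.factor * x

f = Scale(2.0)

f(3.0)                    # callable like a function
f isa Function            # false: the struct does not subtype Function
f isa Base.Callable       # false: Base.Callable is Union{Function, Type}

wrapped  = x -> f(x)      # anonymous function wrapper
composed = identity ∘ f   # composition of identity with the functor

wrapped isa Function      # true
composed isa Function     # true
```

Both wrappers return the same values as `f` itself, so they only change the type, not the computation.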

Two things could be done:

1. Allow functors in DataFrames.jl (I am hesitant, as it would make it even harder for users to reason about the transformation minilanguage).
2. Ask the maintainers of `LinearInterpolation` to make it a `Function` (I do not know the details of why it is not one).

This is interesting. I will see if there is something I can do in DataFramesMeta that helps this.

In `@transform` you would do something along the lines of

``````julia
@transform df :y = identity(f1(...))
``````

It doesn’t work with `identity`:

``````julia
using DataFrames, DataFramesMeta, Interpolations

df = DataFrame(u1 = rand(10), u2 = rand(10), u3 = rand(10))

vf = [[0 25000; 0.5 10000; 1 8000],
[0 32; 0.5 29; 1 26],
[0 45; 0.5 37; 1 30],
[0 0; 0.5 2; 1 4],
[0 0; 0.5 3; 1 4]];

x1=sort(vf[1][:,2]);
y1=reverse(vf[1][:,1]);
f1=LinearInterpolation(x1,y1);

x2=sort(vf[2][:,2]);
y2=reverse(vf[2][:,1]);
f2=LinearInterpolation(x2,y2);

x3=sort(vf[3][:,2]);
y3=reverse(vf[3][:,1]);
f3=LinearInterpolation(x3,y3);

w = [.5, .2, .3];

@transform df :z = w[1] * identity(f1(:u1)) + w[2] * identity(f2(:u2)) + w[3] * identity(f3(:u3))
``````

Here is the error

BoundsError: attempt to access 3-element extrapolate(interpolate((::Vector{Float64},), ::Vector{Float64}, Gridded(Linear())), Throw()) with element type Float64 at index [0.9017848022215782]

Sorry, the `Functor` issue was not the problem. (We couldn’t tell because you did not provide an MWE at first).

It seems there is something about `LinearInterpolation` which you don’t understand, and I don’t either. The error has nothing to do with DataFrames.

``````julia
julia> f1(df.u1)
ERROR: BoundsError: attempt to access 3-element extrapolate(interpolate((::Vector{Float64},), ::Vector{Float64}, Gridded(Linear())), Throw()) with element type Float64 at index [0.2256458588860888]
``````

Maybe someone with better knowledge of linear interpolations can help.

You’re just evaluating the interpolant outside the grid provided:

``````julia
julia> x1
3-element Vector{Float64}:
8000.0
10000.0
25000.0

julia> y1
3-element Vector{Float64}:
1.0
0.5
0.0

julia> f1=LinearInterpolation(x1,y1);

julia> f1(8_000)
1.0

julia> f1(10_000)
0.5

julia> f1(9_000)
0.75

julia> f1(1)
ERROR: BoundsError: attempt to access 3-element extrapolate(interpolate((::Vector{Float64},), ::Vector{Float64}, Gridded(Linear())), Throw()) with element type Float64 at index [1]
``````

If you want to extrapolate, you need to be explicit about how:

``````julia
julia> f2 = LinearInterpolation(x1, y1, extrapolation_bc = Line());

julia> f2(1)
2.99975
``````
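The two values can be checked by hand from the grid points $(8000, 1.0)$ and $(10000, 0.5)$, assuming that `Line()` simply continues the slope of the outermost segment beyond the grid. Here $f_1$ is the plain interpolant and $f_2$ is the extrapolating one defined just above:

$$
f_1(9000) = 1.0 + \frac{9000 - 8000}{10000 - 8000}\,(0.5 - 1.0) = 1.0 - 0.25 = 0.75
$$

$$
f_2(1) = 1.0 + \frac{1 - 8000}{10000 - 8000}\,(0.5 - 1.0) = 1.0 + 1.99975 = 2.99975
$$

Both agree with the REPL output. The extrapolated value is well outside the range $[0, 1]$ of the original `y1`, which is worth keeping in mind before choosing `Line()`.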

(this is all covered in the first example of the docs here)


You’re right!
Thanks!

Why doesn’t it work inside a function?

``````julia
function values(data, w)
    data_temp = deepcopy(data)
    @transform data_temp :value=w[1]*f1(:u1)+w[2]*f2(:u2)+w[3]*f3(:u3)
    return data_temp
end
``````

There is no error but nothing is returned.

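You don’t need the `deepcopy`: `@transform` already returns a new data frame instead of modifying its argument, which is why `data_temp` was unchanged in your version. You probably want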

``````julia
function values(data, w)
    data_temp = @transform data :value=w[1]*f1(:u1)+w[2]*f2(:u2)+w[3]*f3(:u3)
    return data_temp
end
``````
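For comparison, here is a minimal sketch of the original request (columns, callables, and weights all passed as arguments, a plain vector returned) written with broadcasting only. The name `weighted_values` and the stand-in functions are hypothetical; plain vectors are used in place of data frame columns so the example runs without any packages:

```julia
# Hypothetical helper: `cols` holds plain vectors standing in for the data frame
# columns, `fs` the callables (functions or functors), `w` the weights.
function weighted_values(cols, fs, w)
    # Broadcast each callable over its column, scale by its weight, and add up.
    return sum(w[i] .* fs[i].(cols[i]) for i in eachindex(w))
end

u1 = [1.0, 2.0, 3.0]
u2 = [4.0, 5.0, 6.0]
u3 = [7.0, 8.0, 9.0]

# Simple stand-ins for the interpolants f1, f2, f3.
fs = (x -> 2x, x -> x + 1, sqrt)
w = [0.5, 0.2, 0.3]

z = weighted_values((u1, u2, u3), fs, w)   # one value per row
```

With a data frame, the same helper could presumably be called as `weighted_values((df.u1, df.u2, df.u3), (f1, f2, f3), w)`. Broadcasting with `fs[i].(...)` calls each object element by element, so it does not matter whether it is a `Function` or a functor.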

On SO I have explained the functor vs. function issue I mentioned above, with an MWE:

Objects such as `li` are called functors in Julia and sometimes their authors opt-out of making them a subtype of `Function`

Why do you say that authors opt-out of subtyping a function? As I understand, it’s exactly the opposite, i.e. opt-in: one has to explicitly write `struct F <: Function`, this subtyping is not automatic whenever `function (f::F)(args)` is defined.

Functors behave the same as equivalent functions in the vast majority of places in julia, so there are typically few reasons to subtype `Function`. There are exceptions of course, also see a recent discussion of subtyping and possible drawbacks: Consider subtyping `Function` · Issue #37 · JuliaObjects/Accessors.jl · GitHub.

You are right. I should have said “do not opt-in which is required by DataFrames.jl”

Your comments are valid, but the question was specifically in the DataFrames.jl context. In this context functions like `transform` use dispatch to determine their behavior.
In particular as explained in julia - LinearInterpolation not working with transform in DataFrames.jl - Stack Overflow only `Base.Callable` objects are considered to be transformation functions. We cannot change this rule as using dispatch is the only way to decide how an arbitrary object passed to function like `transform` should be handled.

An alternative would be to have a set of traits like e.g. potentially having `iscallable`, but there is no such thing currently.
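The dispatch argument can be illustrated with a toy example. This is only a sketch of the idea, not DataFrames.jl’s actual implementation; `describe`, `PlainFunctor`, and `OptInFunctor` are hypothetical names:

```julia
# Toy dispatcher: one method for callables, one fallback for everything else.
describe(f::Base.Callable) = "callable"
describe(x) = "something else"

# A functor that does not subtype Function.
struct PlainFunctor end
(::PlainFunctor)(x) = x + 1

# A functor that explicitly opts in to being a Function.
struct OptInFunctor <: Function end
(::OptInFunctor)(x) = x + 1

describe(sin)                      # "callable"
describe(PlainFunctor())           # "something else"
describe(OptInFunctor())           # "callable"
describe(x -> PlainFunctor()(x))   # "callable"
```

Both functors compute the same thing when called; only the declared supertype decides which method is chosen.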

Also related to:

the compiler de-optimizes a higher-order function that does not call the argument

This is exactly what we want in DataFrames.jl (but it is a secondary consideration; the primary one is that we need dispatch to decide the behavior). The reason we want it is that despecialization reduces compilation latency, and since DataFrames.jl is a package that is used interactively in the majority of cases, people want despecialization (technically: we despecialize everything except the functions that do heavy computations, which are specialized).


Thanks !
I will know it now

I have a use case where I use Functors as pre-trained features transformations. In such context, defining those structs as sub-types of `Function` doesn’t seem a natural choice as a system.

Here’s a functor that applies learned normalization:

``````julia
using DataFrames
using Statistics: mean, std

struct Normalizer
    μ
    σ
end

Normalizer(x::AbstractVector) = Normalizer(mean(x), std(x))

function (m::Normalizer)(x::Real)
    return (x - m.μ) / m.σ
end

function (m::Normalizer)(x::AbstractVector)
    return (x .- m.μ) ./ m.σ
end

df = DataFrame(:v1 => rand(5), :v2 => rand(5))
feat_names = names(df)
norms = map((feat) -> Normalizer(df[:, feat]), feat_names)
``````

As discussed earlier, the following doesn’t work:

``````julia
transform(df, feat_names .=> norms .=> feat_names)
ERROR: LoadError: ArgumentError: Unrecognized column selector: "v1" => (Normalizer(0.5407170762469404, 0.1599492895436335) => "v1")
``````

However, somewhat surprisingly, using `ByRow` does work:

``````julia
transform(df, feat_names .=> ByRow.(norms) .=> feat_names)
5×2 DataFrame
 Row │ v1          v2
     │ Float64     Float64
─────┼───────────────────────
   1 │  0.0386826   0.479449
   2 │  0.919179   -1.61432
   3 │  1.05579     0.584841
   4 │ -0.930937    0.854153
   5 │ -1.08272    -0.304124
``````

So to use the vectorized form, it seems like a mapping of the Functors into Functions is required:

``````julia
norms_f = map(f -> (x) -> f(x), norms)
transform(df, feat_names .=> norms_f .=> feat_names)
5×2 DataFrame
 Row │ v1          v2
     │ Float64     Float64
─────┼───────────────────────
   1 │  0.0386826   0.479449
   2 │  0.919179   -1.61432
   3 │  1.05579     0.584841
   4 │ -0.930937    0.854153
   5 │ -1.08272    -0.304124
``````

I can see that there’s a not too complicated way to circumvent the functor limitation through that remapping. Yet, isn’t it counterintuitive to see the Functor works in the `ByRow` but not in the vectorized case? Having the opportunity to recognize Functors as Functions in the `transform` would be their most natural handling in my opinion.

This is a completely different dispatch path internally.

Can you open an issue for this, and we can discuss what can be done about it?
