Apply interpolation functions on columns of a dataframe

corny85 · November 10, 2021, 1:28pm

Hello,

I’m a beginner in Julia…
I have:

a dataframe with 4 columns and 10 rows
three functions u1, u2, u3
a vector of 3 weights

I want to create a function (arguments: the dataframe, the 3 functions, and the weights). This function must return a vector built as follows:

we apply the function u1 on values of column 2, the function u2 on values of column 3, the function u3 on values of column 4
then we do, for each row, a weighted sum with vector w
so, we have at the end a vector with 10 values
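
The steps described above can be sketched in plain Julia without transform. This is only a minimal illustration: the helper name weighted_scores, the column names, and the stand-in functions sqrt, abs2 and identity are assumptions made for the example, not part of the original question.

```julia
using DataFrames

# Hypothetical helper: apply fs[i] to column i + 1, then take the
# weighted sum across the transformed columns for each row.
function weighted_scores(df::AbstractDataFrame, fs, w)
    length(fs) == length(w) == ncol(df) - 1 ||
        throw(ArgumentError("need one function and one weight per column after the first"))
    # Broadcasting with f.(col) works for ordinary functions and for
    # callable objects (functors) alike.
    return sum(w[i] .* fs[i].(df[!, i + 1]) for i in eachindex(w))
end

# Illustrative data: 4 columns, 10 rows.
df_demo = DataFrame(id = 1:10, a = rand(10), b = rand(10), c = rand(10))
w_demo = [0.5, 0.2, 0.3]

scores = weighted_scores(df_demo, (sqrt, abs2, identity), w_demo)
```

The result is a plain vector with one value per row, so it can be attached to the dataframe afterwards if needed.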

I’ve tried (with help of the community) by using transform but it doesn’t work with functions obtained by linear interpolations.

using Interpolations
x1=sort(vf[1][:,2])
y1=reverse(vf[1][:,1])
f1=LinearInterpolation(x1,y1)

where vf[1] is a 3×2 matrix

Do you have any ideas?
Thanks for your help.

pdeffebach · November 10, 2021, 2:07pm

What does the error say?

Can you please provide a minimum working example?

bkamins · November 10, 2021, 2:20pm

LinearInterpolation does not produce a Function but a functor. To turn f1 into a function, use an anonymous function wrapper x -> f1(x) or the composition identity∘f1.
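
A small sketch of the distinction, assuming Interpolations.jl and DataFrames.jl are installed. The grid values and the column names u and y are made up for illustration. Newer releases of Interpolations.jl spell the constructor linear_interpolation; the older name is used here to match the thread.

```julia
using DataFrames, Interpolations

# Illustrative interpolant through (1, 10), (2, 20), (3, 30).
itp = LinearInterpolation([1.0, 2.0, 3.0], [10.0, 20.0, 30.0])

is_function = itp isa Function   # false: callable, but not a subtype of Function

# Either wrapper produces something that is a Function.
wrapped = x -> itp(x)
composed = identity ∘ itp

df_itp = DataFrame(u = [1.5, 2.5])

# Whole-column form: the anonymous function receives the column vector.
res_col = transform(df_itp, :u => (x -> itp.(x)) => :y)

# Row-wise form: ByRow is itself a Function subtype.
res_row = transform(df_itp, :u => ByRow(wrapped) => :y)
```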

Two things could be done:

Allow functors in DataFrames.jl (I am hesitant, as it would make it even harder for users to reason about the transformation minilanguage).
Ask the maintainers of LinearInterpolation to make it a Function (I do not know the details of why it is not a function).

pdeffebach · November 10, 2021, 2:34pm

This is interesting. I will see if there is something I can do in DataFramesMeta that helps this.

In @transform you would do something along the lines of

@transform df :y = identity(f1(...))

corny85 · November 10, 2021, 3:02pm

It doesn’t work with identity

using DataFrames, DataFramesMeta, Interpolations

df = DataFrame(u1 = rand(10), u2 = rand(10), u3 = rand(10))

vf = [[0 25000; 0.5 10000; 1 8000],
    [0 32; 0.5 29; 1 26],
    [0 45; 0.5 37; 1 30],
    [0 0; 0.5 2; 1 4],
    [0 0; 0.5 3; 1 4]];

x1=sort(vf[1][:,2]);
y1=reverse(vf[1][:,1]);
f1=LinearInterpolation(x1,y1);

x2=sort(vf[2][:,2]);
y2=reverse(vf[2][:,1]);
f2=LinearInterpolation(x2,y2);

x3=sort(vf[3][:,2]);
y3=reverse(vf[3][:,1]);
f3=LinearInterpolation(x3,y3);

w = [.5, .2, .3];

@transform df :z = w[1] * identity(f1(:u1)) + w[2] * identity(f2(:u2)) + w[3] * identity(f3(:u3))

Here is the error

BoundsError: attempt to access 3-element extrapolate(interpolate((::Vector{Float64},), ::Vector{Float64}, Gridded(Linear())), Throw()) with element type Float64 at index [0.9017848022215782]

Thanks in advance.

pdeffebach · November 10, 2021, 3:11pm

Sorry, the Functor issue was not the problem. (We couldn’t tell because you did not provide an MWE at first).

It seems there is something about LinearInterpolation which you don’t understand, and I don’t either. The error has nothing to do with DataFrames.

julia> f1(df.u1)
ERROR: BoundsError: attempt to access 3-element extrapolate(interpolate((::Vector{Float64},), ::Vector{Float64}, Gridded(Linear())), Throw()) with element type Float64 at index [0.2256458588860888]

Maybe someone with better knowledge of linear interpolations can help.

nilshg · November 10, 2021, 3:30pm

You’re just evaluating the interpolant outside the grid provided:

julia> x1
3-element Vector{Float64}:
  8000.0
 10000.0
 25000.0

julia> y1
3-element Vector{Float64}:
 1.0
 0.5
 0.0

julia> f1=LinearInterpolation(x1,y1);

julia> f1(8_000)
1.0

julia> f1(10_000)
0.5

julia> f1(9_000)
0.75

julia> f1(1)
ERROR: BoundsError: attempt to access 3-element extrapolate(interpolate((::Vector{Float64},), ::Vector{Float64}, Gridded(Linear())), Throw()) with element type Float64 at index [1]

If you want to extrapolate, you need to be explicit about how:

julia> f2 = LinearInterpolation(x1, y1, extrapolation_bc = Line());

julia> f2(1)
2.99975

(this is all covered in the first example of the docs here)
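
For comparison, here is a hedged sketch of two extrapolation choices on the same grid: Line() continues the boundary slope, while Flat() holds the boundary value. The variable names are illustrative, and which behavior is appropriate depends on what the data mean.

```julia
using Interpolations

# Same grid as in the session above.
xs = [8000.0, 10000.0, 25000.0]
ys = [1.0, 0.5, 0.0]

# Line(): extend the first/last segment linearly beyond the grid.
f_line = LinearInterpolation(xs, ys, extrapolation_bc = Line())

# Flat(): clamp to the value at the nearest grid boundary.
f_flat = LinearInterpolation(xs, ys, extrapolation_bc = Flat())

below_line = f_line(1)       # continues the slope of the first segment
below_flat = f_flat(1)       # same as the value at 8000
above_flat = f_flat(30_000)  # same as the value at 25000

# Checking whether a point lies inside the grid before evaluating:
in_grid(v) = first(xs) <= v <= last(xs)
```

With a dataframe column of rand(10) values, as in the MWE, every input lies far below the grid that starts at 8000, which is why the default Throw() behavior raised a BoundsError.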

corny85 · November 10, 2021, 3:43pm

You’re right !
Thanks !

corny85 · November 10, 2021, 4:35pm

Why doesn’t it work inside a function?

function values(data, w)
    data_temp = deepcopy(data)
    @transform data_temp :value=w[1]*f1(:u1)+w[2]*f2(:u2)+w[3]*f3(:u3)
    return data_temp
end

There is no error but nothing is returned.
Thanks for your help.

pdeffebach · November 10, 2021, 5:40pm

You don’t need the deepcopy. @transform already makes a copy and returns a new data frame rather than modifying its input, so the result has to be assigned. You probably want

function values(data, w)
    data_temp = @transform data :value=w[1]*f1(:u1)+w[2]*f2(:u2)+w[3]*f3(:u3)
    return data_temp
end
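
To see why the first version appeared to do nothing, here is a small sketch with made-up data, assuming DataFramesMeta.jl is installed. It contrasts discarding the result of @transform, assigning it, and using the in-place variant @transform!.

```julia
using DataFrames, DataFramesMeta

df_t = DataFrame(u1 = [1.0, 2.0, 3.0])

# The result is discarded here, so df_t is left untouched.
@transform df_t :value = 2 .* :u1
has_value_before = "value" in names(df_t)

# Assigning the result keeps the new column in a new data frame.
df_new = @transform df_t :value = 2 .* :u1

# @transform! modifies df_t in place instead.
@transform! df_t :value = 2 .* :u1
has_value_after = "value" in names(df_t)
```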

corny85 · November 10, 2021, 6:01pm

Thanks a lot for your patience and your help !

bkamins · November 11, 2021, 9:28am

On SO I have explained the issue of functor vs function I have mentioned above with an MWE:
https://stackoverflow.com/questions/69925933/linearinterpolation-not-working-with-transform-in-dataframes-jl

aplavin · November 11, 2021, 11:19am

Objects such as li are called functors in Julia and sometimes their authors opt-out of making them a subtype of Function

Why do you say that authors opt out of subtyping Function? As I understand it, it’s exactly the opposite, i.e. opt-in: one has to explicitly write struct F <: Function; this subtyping is not automatic whenever function (f::F)(args) is defined.
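
A minimal sketch of the opt-in point, using two hypothetical types (Scale and ScaleFn are names invented for this example). Both are callable, but only the one that declares <: Function is a Function.

```julia
# Callable type that does not opt in to being a Function.
struct Scale
    k::Float64
end
(s::Scale)(x) = s.k * x

# Same behavior, but explicitly declared as a subtype of Function.
struct ScaleFn <: Function
    k::Float64
end
(s::ScaleFn)(x) = s.k * x

s_plain, s_fn = Scale(2.0), ScaleFn(2.0)

same_result = s_plain(3.0) == s_fn(3.0)   # both can be called the same way
plain_is_function = s_plain isa Function  # false
fn_is_function = s_fn isa Function        # true
```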

Functors behave the same as equivalent functions in the vast majority of places in julia, so there are typically few reasons to subtype Function. There are exceptions of course, also see a recent discussion of subtyping and possible drawbacks: Consider subtyping `Function` · Issue #37 · JuliaObjects/Accessors.jl · GitHub.

bkamins · November 11, 2021, 12:11pm

You are right. I should have said “do not opt-in which is required by DataFrames.jl”

Your comments are valid, but the question was specifically in the DataFrames.jl context. In this context functions like transform use dispatch to determine their behavior.
In particular, as explained in the Stack Overflow answer “LinearInterpolation not working with transform in DataFrames.jl”, only Base.Callable objects are considered to be transformation functions. We cannot change this rule, as using dispatch is the only way to decide how an arbitrary object passed to a function like transform should be handled.

An alternative would be to have a set of traits like e.g. potentially having iscallable, but there is no such thing currently.
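
The dispatch rule can be illustrated with a toy stand-in. The function kind and the type Affine below are hypothetical and are not DataFrames.jl internals; the sketch only shows how a method on Base.Callable (which is Union{Function, Type}) separates functions from other callable objects.

```julia
# Toy dispatcher: one method for Base.Callable, a fallback for everything else.
kind(::Base.Callable) = "transformation function"
kind(::Any) = "not recognized as a function"

# Hypothetical functor: callable, but neither a Function nor a Type.
struct Affine
    a::Float64
    b::Float64
end
(f::Affine)(x) = f.a * x + f.b

aff = Affine(2.0, 1.0)

k_sqrt = kind(sqrt)          # ordinary function
k_functor = kind(aff)        # functor falls through to the fallback
k_wrapped = kind(x -> aff(x)) # anonymous wrapper is a Function
k_type = kind(Float64)       # types are Base.Callable too
```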

Also related to:

the compiler de-optimizes a higher-order function that does not call the argument

This is exactly what we want in DataFrames.jl (but it is a secondary consideration; the primary one is that we need dispatch to decide the behavior). The reason we want it is that despecialization reduces compilation latency, and since DataFrames.jl is a package that is used interactively in the majority of cases, people want despecialization (technically: we despecialize everything except the functions that do heavy computations, which are specialized).

corny85 · November 11, 2021, 12:20pm

Thanks !
I will know it now

jeremiedb · January 7, 2022, 4:39pm

I have a use case where I use Functors as pre-trained features transformations. In such context, defining those structs as sub-types of Function doesn’t seem a natural choice as a system.

Here’s a functor that applies learned normalization:

using DataFrames
using Statistics: mean, std

struct Normalizer
    μ
    σ
end

Normalizer(x::AbstractVector) = Normalizer(mean(x), std(x))

function (m::Normalizer)(x::Real)
    return (x - m.μ) / m.σ
end

function (m::Normalizer)(x::AbstractVector)
    return (x .- m.μ) ./ m.σ
end

df = DataFrame(:v1 => rand(5), :v2 => rand(5))
feat_names = names(df)
norms = map((feat) -> Normalizer(df[:, feat]), feat_names)

As discussed earlier, the following doesn’t work:

transform(df, feat_names .=> norms .=> feat_names)
ERROR: LoadError: ArgumentError: Unrecognized column selector: "v1" => (Normalizer(0.5407170762469404, 0.1599492895436335) => "v1")

However, somewhat surprisingly, using ByRow does work:

transform(df, feat_names .=> ByRow.(norms) .=> feat_names)
5×2 DataFrame
 Row │ v1          v2        
     │ Float64     Float64
─────┼───────────────────────
   1 │  0.0386826   0.479449
   2 │  0.919179   -1.61432
   3 │  1.05579     0.584841
   4 │ -0.930937    0.854153
   5 │ -1.08272    -0.304124

So to use the vectorized form, it seems like a mapping of the Functors into Functions is required:

norms_f = map(f -> (x) -> f(x), norms)
transform(df, feat_names .=> norms_f .=> feat_names)
5×2 DataFrame
 Row │ v1          v2        
     │ Float64     Float64
─────┼───────────────────────
   1 │  0.0386826   0.479449
   2 │  0.919179   -1.61432
   3 │  1.05579     0.584841
   4 │ -0.930937    0.854153
   5 │ -1.08272    -0.304124

I can see that there’s a not too complicated way to circumvent the functor limitation through that remapping. Yet, isn’t it counterintuitive to see the Functor works in the ByRow but not in the vectorized case? Having the opportunity to recognize Functors as Functions in the transform would be their most natural handling in my opinion.

bkamins · January 7, 2022, 4:59pm

This is a completely different dispatch path internally.

Can you open an issue for this, and we can discuss what can be done about it?

jeremiedb · January 7, 2022, 5:10pm

Issue opened: https://github.com/JuliaData/DataFrames.jl/issues/2984
