Interpolation between two dataframe columns

Hi,

I think my code is highly sub-optimised, is there a more monolithic way (and preferably faster) to interpolate, e.g. without using eachrow, push etc.

Code:

using DataFrames
using Interpolations

# Dates, then values
data = [2000 2001 2002 2003; 1 20 30 4; 2 30 40 5; 3 40 50 6; 4 50 60 7]
df = DataFrame(Tables.table(data))

# Collect the dates
A_x = collect(df[1,:])
nodes = (A_x,)

# Find the column interpolation for this date (decimal year)
date = 2001.5

# Where to store the result
interpolated_column = Float64[]

# Interpolation for each row containing values
for row in eachrow(df[2:end,:])
    A = collect(row)
    itp = interpolate(nodes, A, Gridded(Linear()))
    res = itp(date)
    push!(interpolated_column, res)
end

# Result
interpolated_column

Execution:

julia> using DataFrames
julia> using Interpolations

julia> data = [2000 2001 2002 2003; 1 20 30 4; 2 30 40 5; 3 40 50 6; 4 50 60 7]
5×4 Matrix{Int64}:
 2000  2001  2002  2003
    1    20    30     4
    2    30    40     5
    3    40    50     6
    4    50    60     7

julia> df = DataFrame(Tables.table(data))
5×4 DataFrame
 Row │ Column1  Column2  Column3  Column4 
     │ Int64    Int64    Int64    Int64   
─────┼────────────────────────────────────
   1 │    2000     2001     2002     2003
   2 │       1       20       30        4
   3 │       2       30       40        5
   4 │       3       40       50        6
   5 │       4       50       60        7

julia> A_x = collect(df[1,:])
4-element Vector{Int64}:
 2000
 2001
 2002
 2003

julia> nodes = (A_x,)
([2000, 2001, 2002, 2003],)

julia> date = 2001.5
2001.5

julia> interpolated_column = Float64[]
Float64[]

julia> for row in eachrow(df[2:end,:])
           A = collect(row)
           itp = interpolate(nodes, A, Gridded(Linear()))
           res = itp(date)
           push!(interpolated_column, res)
       end

julia> interpolated_column
4-element Vector{Float64}:
 25.0
 35.0
 45.0
 55.0

Any suggestion welcome.

My recommendation would be to

  1. flip your X and Y axes. DataFrame columns are parameters and rows are observations. So
df = DataFrame(:date => [2000, 2001, 2002, 2003], 
:A => [1,2,3,4],
:B => [2,3,4,5], 
...
  1. follow this thread: Interpolate DataFrame columns - #2 by nilshg
1 Like

Thanks for your answer.
I changed a bit my first post example to avoid some ambiguity.

Now I try:

df = DataFrame(:date => [2000, 2001, 2002, 2003],
:A => [1,20,30,4],
:B => [2,30,40,5],
:C => [3,40,50,6],
:D => [4,50,60,7])

# Collect the dates
A_x = collect(df[:,1])
nodes = (A_x,)

# Isolate data
A = df[:,2:end]

intlin = LinearInterpolation(nodes, A, Gridded(Linear()))

date = 2001.5
y = intlin(date)

But it fails with a no method matching error:

Closest candidates are:
  linear_interpolation(::NTuple{N, AbstractVector}, ::AbstractArray{T, N}; extrapolation_bc) where {N, T}

I tried to collect the data, or to convert to an array (anyway all I get is a matrix) but I still got an error.

Try this:

# Get data columns and dates
data_cols = names(df)[2:end]
dates = df.date
num_cols = length(data_cols)

# Create interpolators directly using  column references
for (i, col) in enumerate(data_cols)
    intlin[i] = interpolate((dates,), df[!, col], Gridded(Linear()))
end

date = 2001.5

# Use map! to fill the output array in-place
y = map!(itp -> itp(date), y, intlin)
1 Like

Thanks.
Transposing is indeed way faster (14 minutes vs 01:30 hours with my data).