Add column and column names of variable lags to dataframe

Hi everyone,

I have a dataset and I want to add a new column for every lag, with the column name being variable_(lag number). So for example,

using DataFrames
using ShiftedArrays

test = DataFrame(a=1:10, b=11:20)

I want to add 5 columns with column a_1 being just the original column A lagged once, column a_2 being lagged twice, etc.

Doing this for just a few lags is fine with something like

test.a2 = lag(test.a,2)

However, I don’t know how to do this in a loop/scale this up, which I need since I need to create about 48 new variables.

Thanks for your time!

You could try this:

vars = "a" .* string.(1:48)
hcat(test, DataFrame( vars .=> lag.(Ref(test.a), 1:48)))

See the following:

julia> using DataFrames, ShiftedArrays;

julia> df = DataFrame(a = 1:10, b = 11:20, c = 21:30);

julia> cols_to_add_lags = ["a", "b", "c"];

julia> for c in cols_to_add_lags
           for i in 1:5
               df[:, "$(c)_$(i)"] = lag(df[:, c], i)
           end
       end;

julia> df
10Γ—18 DataFrame
 Row β”‚ a      b      c      a_1      a_2      a_3      a_4      a_5       β‹―
     β”‚ Int64  Int64  Int64  Int64?   Int64?   Int64?   Int64?   Int64?    β‹―
─────┼─────────────────────────────────────────────────────────────────────
   1 β”‚     1     11     21  missing  missing  missing  missing  missing   β‹―
   2 β”‚     2     12     22        1  missing  missing  missing  missing 
   3 β”‚     3     13     23        2        1  missing  missing  missing 
   4 β”‚     4     14     24        3        2        1  missing  missing 
   5 β”‚     5     15     25        4        3        2        1  missing   β‹―
   6 β”‚     6     16     26        5        4        3        2        1
   7 β”‚     7     17     27        6        5        4        3        2
   8 β”‚     8     18     28        7        6        5        4        3
   9 β”‚     9     19     29        8        7        6        5        4   β‹―
  10 β”‚    10     20     30        9        8        7        6        5
                                                         10 columns omitted
1 Like

or

julia> transform!(test, [:a => (x -> lag(x, i)) => "a_$i" for i in 1:5])
10Γ—7 DataFrame
 Row β”‚ a      b      a_1      a_2      a_3      a_4      a_5
     β”‚ Int64  Int64  Int64?   Int64?   Int64?   Int64?   Int64?
─────┼───────────────────────────────────────────────────────────
   1 β”‚     1     11  missing  missing  missing  missing  missing
   2 β”‚     2     12        1  missing  missing  missing  missing
   3 β”‚     3     13        2        1  missing  missing  missing
   4 β”‚     4     14        3        2        1  missing  missing
   5 β”‚     5     15        4        3        2        1  missing
   6 β”‚     6     16        5        4        3        2        1
   7 β”‚     7     17        6        5        4        3        2
   8 β”‚     8     18        7        6        5        4        3
   9 β”‚     9     19        8        7        6        5        4
  10 β”‚    10     20        9        8        7        6        5
3 Likes

Worked like a charm, thanks so much everyone!

julia> transform(df, :a=>(x->hcat(lag.([x], 1:5)...))=>string.("a_",1:5))
10Γ—7 DataFrame
 Row β”‚ a      b      a_1      a_2      a_3      a_4      a_5     
     β”‚ Int64  Int64  Int64?   Int64?   Int64?   Int64?   Int64?
─────┼───────────────────────────────────────────────────────────
   1 β”‚     1     11  missing  missing  missing  missing  missing
   2 β”‚     2     12        1  missing  missing  missing  missing
   3 β”‚     3     13        2        1  missing  missing  missing
   4 β”‚     4     14        3        2        1  missing  missing
   5 β”‚     5     15        4        3        2        1  missing
   6 β”‚     6     16        5        4        3        2        1
   7 β”‚     7     17        6        5        4        3        2
   8 β”‚     8     18        7        6        5        4        3
   9 β”‚     9     19        8        7        6        5        4
  10 β”‚    10     20        9        8        7        6        5
julia> transform(df, :a=>(x->(;zip(Symbol.("a_",1:5),lag.([x], 1:5))...))=>AsTable)
10Γ—7 DataFrame
 Row β”‚ a      b      a_1      a_2      a_3      a_4      a_5     
     β”‚ Int64  Int64  Int64?   Int64?   Int64?   Int64?   Int64?
─────┼───────────────────────────────────────────────────────────
   1 β”‚     1     11  missing  missing  missing  missing  missing
   2 β”‚     2     12        1  missing  missing  missing  missing
   3 β”‚     3     13        2        1  missing  missing  missing
   4 β”‚     4     14        3        2        1  missing  missing
   5 β”‚     5     15        4        3        2        1  missing
   6 β”‚     6     16        5        4        3        2        1
   7 β”‚     7     17        6        5        4        3        2
   8 β”‚     8     18        7        6        5        4        3
   9 β”‚     9     19        8        7        6        5        4
  10 β”‚    10     20        9        8        7        6        5

julia> transform(df, :a=>(x->NamedTuple(zip(Symbol.("a_",1:5),lag.([x], 1:5))))=>AsTable)
10Γ—7 DataFrame
 Row β”‚ a      b      a_1      a_2      a_3      a_4      a_5     
     β”‚ Int64  Int64  Int64?   Int64?   Int64?   Int64?   Int64?
─────┼───────────────────────────────────────────────────────────
   1 β”‚     1     11  missing  missing  missing  missing  missing
   2 β”‚     2     12        1  missing  missing  missing  missing
   3 β”‚     3     13        2        1  missing  missing  missing
   4 β”‚     4     14        3        2        1  missing  missing
   5 β”‚     5     15        4        3        2        1  missing
   6 β”‚     6     16        5        4        3        2        1
   7 β”‚     7     17        6        5        4        3        2
   8 β”‚     8     18        7        6        5        4        3
   9 β”‚     9     19        8        7        6        5        4
  10 β”‚    10     20        9        8        7        6        5