Hi everyone,
I have a dataset and I want to add a new column for every lag, with the column name being variable_(lag number). So for example,
using DataFrames
using ShiftedArrays
test = DataFrame(a=1:10, b=11:20)
I want to add 5 columns with column a_1 being just the original column A lagged once, column a_2 being lagged twice, etc.
Doing this for just a few lags is fine with something like
test.a2 = lag(test.a,2)
However, I donβt know how to do this in a loop/scale this up, which I need since I need to create about 48 new variables.
Thanks for your time!
You could try this:
vars = "a" .* string.(1:48)
hcat(test, DataFrame( vars .=> lag.(Ref(test.a), 1:48)))
See the following:
julia> using DataFrames, ShiftedArrays;
julia> df = DataFrame(a = 1:10, b = 11:20, c = 21:30);
julia> cols_to_add_lags = ["a", "b", "c"];
julia> for c in cols_to_add_lags
for i in 1:5
df[:, "$(c)_$(i)"] = lag(df[:, c], i)
end
end;
julia> df
10Γ18 DataFrame
Row β a b c a_1 a_2 a_3 a_4 a_5 β―
β Int64 Int64 Int64 Int64? Int64? Int64? Int64? Int64? β―
ββββββΌβββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
1 β 1 11 21 missing missing missing missing missing β―
2 β 2 12 22 1 missing missing missing missing
3 β 3 13 23 2 1 missing missing missing
4 β 4 14 24 3 2 1 missing missing
5 β 5 15 25 4 3 2 1 missing β―
6 β 6 16 26 5 4 3 2 1
7 β 7 17 27 6 5 4 3 2
8 β 8 18 28 7 6 5 4 3
9 β 9 19 29 8 7 6 5 4 β―
10 β 10 20 30 9 8 7 6 5
10 columns omitted
1 Like
or
julia> transform!(test, [:a => (x -> lag(x, i)) => "a_$i" for i in 1:5])
10Γ7 DataFrame
Row β a b a_1 a_2 a_3 a_4 a_5
β Int64 Int64 Int64? Int64? Int64? Int64? Int64?
ββββββΌβββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
1 β 1 11 missing missing missing missing missing
2 β 2 12 1 missing missing missing missing
3 β 3 13 2 1 missing missing missing
4 β 4 14 3 2 1 missing missing
5 β 5 15 4 3 2 1 missing
6 β 6 16 5 4 3 2 1
7 β 7 17 6 5 4 3 2
8 β 8 18 7 6 5 4 3
9 β 9 19 8 7 6 5 4
10 β 10 20 9 8 7 6 5
3 Likes
Worked like a charm, thanks so much everyone!
julia> transform(df, :a=>(x->hcat(lag.([x], 1:5)...))=>string.("a_",1:5))
10Γ7 DataFrame
Row β a b a_1 a_2 a_3 a_4 a_5
β Int64 Int64 Int64? Int64? Int64? Int64? Int64?
ββββββΌβββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
1 β 1 11 missing missing missing missing missing
2 β 2 12 1 missing missing missing missing
3 β 3 13 2 1 missing missing missing
4 β 4 14 3 2 1 missing missing
5 β 5 15 4 3 2 1 missing
6 β 6 16 5 4 3 2 1
7 β 7 17 6 5 4 3 2
8 β 8 18 7 6 5 4 3
9 β 9 19 8 7 6 5 4
10 β 10 20 9 8 7 6 5
julia> transform(df, :a=>(x->(;zip(Symbol.("a_",1:5),lag.([x], 1:5))...))=>AsTable)
10Γ7 DataFrame
Row β a b a_1 a_2 a_3 a_4 a_5
β Int64 Int64 Int64? Int64? Int64? Int64? Int64?
ββββββΌβββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
1 β 1 11 missing missing missing missing missing
2 β 2 12 1 missing missing missing missing
3 β 3 13 2 1 missing missing missing
4 β 4 14 3 2 1 missing missing
5 β 5 15 4 3 2 1 missing
6 β 6 16 5 4 3 2 1
7 β 7 17 6 5 4 3 2
8 β 8 18 7 6 5 4 3
9 β 9 19 8 7 6 5 4
10 β 10 20 9 8 7 6 5
julia> transform(df, :a=>(x->NamedTuple(zip(Symbol.("a_",1:5),lag.([x], 1:5))))=>AsTable)
10Γ7 DataFrame
Row β a b a_1 a_2 a_3 a_4 a_5
β Int64 Int64 Int64? Int64? Int64? Int64? Int64?
ββββββΌβββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
1 β 1 11 missing missing missing missing missing
2 β 2 12 1 missing missing missing missing
3 β 3 13 2 1 missing missing missing
4 β 4 14 3 2 1 missing missing
5 β 5 15 4 3 2 1 missing
6 β 6 16 5 4 3 2 1
7 β 7 17 6 5 4 3 2
8 β 8 18 7 6 5 4 3
9 β 9 19 8 7 6 5 4
10 β 10 20 9 8 7 6 5