Create lead and lag variable in DataFrame

I agree with the conceptual distinction between working in Julia (arrays are naturally fast, and a DataFrame is a collection of arrays of the same length) and working in pandas (to do something efficiently you often need a specialized function). In this case, however, there is a different concern: it is easy for newcomers to make off-by-one mistakes. Here, for example, you get a column that is too short for the DataFrame. You should instead do:

[i==1 ? missing : v[i] - v[i-1] for i ∈ 1:length(v)]

or, for the second difference asked about in the question (the last term must be v[i-2], not v[i]):

[i <= 2 ? missing : v[i] - 2*v[i-1] + v[i-2] for i ∈ 1:length(v)]
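A quick check of both comprehensions on a small made-up vector `v` (the data is purely illustrative): the padded results have the same length as `v`, while `diff(v)` is one element short. The leading `missing` entries mark the positions where there is no complete window of earlier values.

```julia
# Illustrative data; any numeric vector works.
v = [1, 4, 9, 16, 25]

# First difference, padded with `missing` so its length matches `v`.
d1 = [i == 1 ? missing : v[i] - v[i-1] for i in 1:length(v)]

# Second difference v[i] - 2v[i-1] + v[i-2]; the first two positions lack a full window.
d2 = [i <= 2 ? missing : v[i] - 2 * v[i-1] + v[i-2] for i in 1:length(v)]

println(length(diff(v)))  # 4: one element shorter than v
println(length(d1))       # 5: same length as v
```

Both `d1` and `d2` come out with element type `Union{Missing, Int64}`, which is what a DataFrame column containing `missing` needs.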

ShiftedArrays simply automates this by returning missing whenever the shifted index falls out of bounds (which is reasonably easy to implement in Julia thanks to the wonderful AbstractArray interface).
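To illustrate the idea, here is a minimal sketch of lag and lead helpers that return `missing` out of bounds. The names `mylag` and `mylead` are hypothetical and this is not the ShiftedArrays API; it uses only Base functions (`checkbounds(Bool, …)` and `eachindex`), so the bounds check replaces the hand-written `i == 1` / `i <= 2` conditions.

```julia
# Hypothetical helpers (not ShiftedArrays): shift `v` by `n` positions and
# return `missing` wherever the shifted index falls outside `v`.
function mylag(v::AbstractVector, n::Integer = 1)
    return [checkbounds(Bool, v, i - n) ? v[i - n] : missing for i in eachindex(v)]
end

function mylead(v::AbstractVector, n::Integer = 1)
    return [checkbounds(Bool, v, i + n) ? v[i + n] : missing for i in eachindex(v)]
end

v = [1, 4, 9, 16, 25]
d1 = v .- mylag(v)                      # first difference; missing propagates
d2 = v .- 2 .* mylag(v) .+ mylag(v, 2)  # second difference
```

Because arithmetic with `missing` returns `missing`, the out-of-bounds positions propagate through the broadcasted expressions automatically. One design difference worth noting: as far as I know, ShiftedArrays wraps the parent array in a lazy `AbstractArray` rather than copying it, whereas this sketch allocates a new vector on every call.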
