The implementation I wrote is more general as it is meant to be used with panels and handle the nuance of steps (date-time, gaps, implied minimum step, etc.). I would recommend using diff if the case is simple enough.
Thanks for the reply. In my case (not the example I gave here), I need to calculate price[i+1] - 2*price[i] + price[i-1], which equals (price[i+1] - price[i]) - (price[i] - price[i-1]), i.e. a double difference, so I can use your function. However, I would still like to know whether window functions like lead and lag are available for DataFrames in Julia, as in dplyr in R, because that approach is more flexible.
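For what it's worth, the identity is easy to check with nothing but Base. This is only a sketch on made-up integer prices (the `price` values are an assumption for illustration), comparing the centered formula with applying `diff` twice:

```julia
# Made-up integer prices, so the comparison is exact (no floating-point rounding).
price = [10, 12, 15, 14, 18]

# Centered second difference, defined only for interior points i = 2:n-1.
centered = [price[i+1] - 2 * price[i] + price[i-1] for i in 2:length(price)-1]

# Differencing twice with Base's diff.
twice = diff(diff(price))

centered == twice  # true
```

Note that both results have two fewer elements than `price`, so neither can be assigned directly as a DataFrame column.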
v = lead(price) .- 2 .* price .+ lag(price)
which avoids unnecessary allocations. I think it works best on Julia 0.7 due to recent improvements to missing data handling by @nalimilan, but if you don’t think you’ll encounter performance issues it should be usable already.
To integrate this functionality in DataFrames, I guess the easiest is to add ShiftedArrays as a DataFrames dependency (ShiftedArrays is a small pure Julia package) and reexport lead and lag.
This probably isn’t the advice you’re looking for in this particular instance, but, for what it’s worth, I thought I should point out that a huge advantage that Julia has over Python and R in cases like this is that pure Julia code is actually efficient. So, if you were to do, for example, [v[i] - v[i-1] for i ∈ 2:length(v)] what you get will more or less run like C code. Furthermore, since DataFrames columns are just AbstractArrays, you almost never need to worry about them having some sort of unorthodox behavior. In Julia the need for specialized or fancy code for data manipulation is much mitigated compared to Python and R. (This also means that, as much as I love DataFramesMeta.jl, it often simply isn’t necessary.)
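As a rough sketch of what "just write the loop" could look like: `firstdiff` below is a hypothetical helper name (not from any package), and it assumes 1-based indexing and an element type that is closed under subtraction (e.g. `Int` or `Float64`).

```julia
# Hypothetical helper: first differences with an explicit loop.
# Assumes 1-based indexing and that v[i] - v[i-1] has the same type as eltype(v).
function firstdiff(v::AbstractVector{<:Number})
    n = length(v)
    out = similar(v, max(n - 1, 0))   # one element fewer than the input
    for i in 2:n
        out[i-1] = v[i] - v[i-1]
    end
    return out
end

v = [10, 12, 15, 14, 18]
firstdiff(v) == diff(v)  # true
```

On this input it gives the same result as Base's `diff`.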
I agree on the conceptual distinction between working with Julia (arrays are naturally fast, a DataFrame is a collection of arrays of the same length) vs Pandas (to do something efficiently you often need a specialized function). However, in this case there is a different concern: it is annoying for newcomers to do this without off-by-one mistakes. Here, for example, you get a column that is too short for a DataFrame. You should instead do:
[i==1 ? missing : v[i] - v[i-1] for i ∈ 1:length(v)]
or in the case of the question asked:
[i <= 2 ? missing : v[i] - 2*v[i-1] + v[i-2] for i ∈ 1:length(v)]
ShiftedArrays simply automates this procedure by returning missing whenever you index out of bounds (reasonably easy to implement in Julia thanks to the wonderful AbstractArray interface).
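To give an idea of how little code such a lazy shifted view needs, here is a minimal sketch. To be clear, this is not the ShiftedArrays implementation: `LazyLag` and `lazylag` are hypothetical names made up for illustration, and the sketch assumes a 1-based parent vector.

```julia
# Hypothetical illustration only, NOT the ShiftedArrays source.
# A lazy view of `parent` shifted by `n` positions (n > 0 lags, n < 0 leads).
struct LazyLag{T, V<:AbstractVector{T}} <: AbstractVector{Union{T, Missing}}
    parent::V
    n::Int
end

# Convenience constructor (hypothetical name); the default shift is a lag of 1.
lazylag(v::AbstractVector{T}, n::Integer = 1) where {T} = LazyLag{T, typeof(v)}(v, n)

# The two methods the AbstractArray interface requires for a read-only array.
Base.size(s::LazyLag) = size(s.parent)
Base.IndexStyle(::Type{<:LazyLag}) = IndexLinear()

function Base.getindex(s::LazyLag, i::Int)
    @boundscheck checkbounds(s, i)
    j = i - s.n
    # Return missing when the shifted index falls outside the parent.
    return checkbounds(Bool, s.parent, j) ? s.parent[j] : missing
end

price = [10, 12, 15, 14, 18]
d1 = price .- lazylag(price)                                # first difference
d2 = lazylag(price, -1) .- 2 .* price .+ lazylag(price)     # centered second difference
```

Both `d1` and `d2` have the same length as `price`, with `missing` at the positions where a neighbour does not exist, so they can be stored as columns next to the original data.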
Like I said, this is not necessarily the very best use case, but when doing these sorts of things myself I have found it very useful to keep in mind that you don’t have to do everything with specialized code. It’s a very liberating feeling.
julia> Pkg.add("ShiftedArrays") # run only once to install the package (on Julia 0.7 and later, run `using Pkg` first)
julia> using ShiftedArrays
julia> price = rand(10);
julia> v = price .- lag(price)
What error do you get?
To use it with DataFrames, make sure you have at least version 0.11, as it is the first one to support missing.