Using previous row values to create values for a new column

I have this DataFrame:

using DataFrames
df = DataFrame(a = rand(Float64, 10))

I would like to add a second Integer column, b, whose value is 1 for the first row and, for the other rows, takes the result of a function that requires inputs from the previous row, as well as the current row.

Pseudo code:
df.b = if first row then 1 else (if df.a < 0.5 then previous row's df.b else previous row's df.b + 1)

It looks like you have an order dependency, so the task is not columnar. You can do it in a loop:

julia> b = similar(df.a, Int); b[1] = 1;

julia> for i in 2:length(b)
           # carry the previous value forward, adding 1 when a[i] >= 0.5
           b[i] = b[i-1] + (df.a[i] >= 0.5)
       end

julia> df.b = b
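
For this particular rule the loop can also be written as a running sum, because each row adds either 0 or 1 to the previous value. The following is only a sketch of that alternative, not a replacement for the loop above; the vector `a` holds made-up sample values standing in for `df.a`.

```julia
# Hypothetical sample values standing in for df.a
a = [0.2, 0.7, 0.4, 0.9, 0.5]

# One Bool per row: true where the running count should go up by 1
increments = a .>= 0.5

# The first row is fixed at 1, so it never contributes an increment
increments[1] = false

# cumsum over Bool values accumulates as Int
b = 1 .+ cumsum(increments)
```

With a DataFrame, the result would then be assigned with `df.b = b`, as in the loop version. This only works because the update is a plain addition; a rule that uses the previous `b` in some other way still needs the loop.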

Thank you.
I was also thinking about cloning the a column and shifting it by one row, but that would be inefficient, and avoiding the loop wouldn't make my code any more elegant anyway.

While the direct method is no doubt efficient, if you are coming from the SQL world you will probably want to use something like ShiftedArrays.jl, which has lag and lead functions. See the announcement thread: ANN: ShiftedArrays and support for ShiftedArrays in GroupedErrors

If I create a ShiftedArray for a DataFrame column, would the shifted array be a copy of the data frame column?
I guess this question is equivalent to asking whether data frame columns are backed by arrays as their storage representation.
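
One way to check this is to compare the array wrapped by the shifted array with the original vector. The sketch below assumes ShiftedArrays.jl is installed and that, as its README describes, a ShiftedArray is a lazy view over its parent array; the vector `a` is a made-up stand-in for `df.a`. The function is called as `ShiftedArrays.lag` because some versions of the package may not export `lag`.

```julia
using ShiftedArrays  # third-party package, assumed to be installed

# Hypothetical sample values standing in for df.a
a = [0.2, 0.7, 0.4, 0.9, 0.5]

# prev[i] reads a[i-1]; the first element has no predecessor and is missing
prev = ShiftedArrays.lag(a)

# true if the wrapper holds the original vector rather than a copy
shares_storage = parent(prev) === a

# A later change to `a` should then be visible through `prev`
a[1] = 0.3
value_seen = prev[2]
```

Since `df.a` returns the vector stored in the data frame without copying it, applying the same check to `ShiftedArrays.lag(df.a)` should show whether the column itself is being wrapped.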