Using previous row values to create values for a new column

alessio · March 18, 2021, 3:25pm

I have this DataFrame:

using DataFrames
df = DataFrame(a = rand(Float64, 10))

I would like to add a second, integer column, b, whose value is 1 in the first row and, in every other row, is the result of a function that takes inputs from both the previous row and the current row.

Pseudo code:
df.b[i] = if i == 1 then 1 else (if df.a[i] < 0.5 then df.b[i-1] else df.b[i-1] + 1)

jling · March 18, 2021, 3:33pm

It looks like you have an order dependency, so the task is not columnar. You can do it in a loop:

julia> b = similar(df.a, Int); b[1] = 1;

julia> for i in 2:length(b)
           b[i] = b[i-1] + (df.a[i] >= 0.5)  # add 1 when a >= 0.5, otherwise carry b forward
       end

julia> df.b = b
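
For this particular rule, b is a running count of the rows where a >= 0.5 (ignoring the first row), so the same column can probably also be built with cumsum from Base. The sketch below runs on a plain vector standing in for df.a. The names a, b_loop, increments, and b_cumsum and the fixed values are made up for illustration.

```julia
# Stand-in for df.a; fixed values so the result is easy to check by hand.
a = [0.2, 0.7, 0.4, 0.9, 0.5, 0.1]

# Loop version, as above: row 1 is 1, later rows add 1 when a[i] >= 0.5.
b_loop = similar(a, Int)
b_loop[1] = 1
for i in 2:length(a)
    b_loop[i] = b_loop[i-1] + (a[i] >= 0.5)
end

# Same rule as a running count: mask out row 1, then take the cumulative sum.
increments = a .>= 0.5
increments[1] = false
b_cumsum = 1 .+ cumsum(increments)
```

This only works because the update is a plain running sum. For a rule that depends on the previous b in a more complicated way, the loop is still the straightforward choice.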

alessio · March 18, 2021, 4:02pm

Thank you.
I was also thinking about cloning the a column and shifting it by one row, but that would be inefficient, and avoiding the loop wouldn't make my code any more elegant anyway.

Skoffer · March 18, 2021, 4:31pm

While the direct method is no doubt efficient, if you are coming from the SQL world you may want to use something like ShiftedArrays.jl, which has lag and lead functions. See: ANN: ShiftedArrays and support for ShiftedArrays in GroupedErrors

alessio · March 20, 2021, 3:18pm

If I create a ShiftedArray for a DataFrame column, would the shifted array be a copy of the data frame column?
I guess this question is equivalent to asking whether data frame columns are backed by arrays as their storage representation.
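
On the copy question: as far as I know, df.a hands back the stored column vector itself, and ShiftedArrays.jl describes a ShiftedArray as a lazy view of its parent array, so no copy should be made in either step. The sketch below only illustrates the view-versus-copy distinction, using Base's view on a plain vector. It does not use ShiftedArrays or DataFrames, and the names a, prev_view, and prev_copy are hypothetical.

```julia
a = [0.1, 0.6, 0.3, 0.8]

# "Previous row" values for rows 2:end, as a view that shares memory with a...
prev_view = view(a, 1:length(a)-1)
# ...and as an explicit copy that is independent of a.
prev_copy = a[1:end-1]

a[1] = 0.9  # mutate the original vector

prev_view[1]  # the view sees the new value
prev_copy[1]  # the copy keeps the old value
```

If the shifted array behaves like the view here, changes to the data frame column would show through it, and the extra memory cost would be negligible.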
