How to compute a "cumulative" in a dataframe (without a for loop)

I’m not sure I understand what you’re trying to do, but if it’s “in every row, put the sum of the value column for the two rows above, except when there aren’t two above, in which case put missing”, you can combine RollingFunctions with ShiftedArrays like this:

julia> using DataFrames, ShiftedArrays, RollingFunctions

julia> df = DataFrame(region = ["US","US","US","US","EU","EU","EU","EU"],
                      year   = [2010,2011,2012,2013,2010,2011,2012,2013],
                                     value  = [3,3,2,2,2,2,1,1]);

julia> transform(groupby(df, :region),
           :value => (x -> running(sum, lag(x, 1), 2)) => :cumval)
8×4 DataFrame
 Row │ region  year   value  cumval    
     │ String  Int64  Int64  Float64?  
─────┼─────────────────────────────────
   1 │ US       2010      3  missing   
   2 │ US       2011      3  missing   
   3 │ US       2012      2        6.0
   4 │ US       2013      2        5.0
   5 │ EU       2010      2  missing   
   6 │ EU       2011      2  missing   
   7 │ EU       2012      1        4.0
   8 │ EU       2013      1        3.0

So EU in 2013 is 3 (sum of 2012 and 2011), in 2012 is 4 (sum of 2010 and 2011) while in 2011 it’s missing as there’s no 2009 observation.

2 Likes