I’ve got some sports data in a dataframe I’m playing with that has data on teams throughout a season and I have a number of performance measures I’d like to create cumulative sums for. I found some great suggestions for how to do this here, but I’m struggling with how to conveniently iterate through the columns I want to create because 1) the colon in the name indexing of columns makes things tricky and 2) I’m not sure how to iterate over a list of column names and use those column names to dynamically create new column names for the cumulative sums. Lastly - I would love to do these cumulative sums so that they were lagged by one time period (i.e. the value for the current row is not included in the running total). Here’s what I’ve got so far:
#To Pull the Data using HTTP, CSV, DataFrames url = "https://projects.fivethirtyeight.com/nfl-api/nfl_elo.csv" f38 = CSV.read(HTTP.get(url).body)
For creating just one column of cumulative sums, this works great:
f38[:, :cumul_elo1_pre] .= 0.0 for subdf in groupby(f38,[:team1, :season]) subdf[:, :cumul_elo1_pre] .= cumsum(subdf.elo1_pre) end
But if I’d like to do the same thing for all these columns, how can I do it without writing everything out, but iterating?
cumul_vars = ["elo1_pre", "elo2_pre", "qbelo1_pre", "qbelo2_pre", "qb1_value_pre", "qb2_value_pre", "qb1_adj", "qb2_adj", "score1", "score2"];