```julia
using DataFrames
data = DataFrame(time = rand(1:100, 1_000_000), val = rand(1_000_000))
```
Consider the data frame above, and say I wish to compute the average of `val` within a rolling window and normalize each `val` using the rolling mean and standard deviation. It's almost like a moving average, but notice that each time period has multiple values.
For example, say I am looking at the row (time = 7, val = 0.551) and I wish to normalize `val` using a 12-period mean and standard deviation. I would need to compute the mean and standard deviation over all rows with `time` in 2:13 (because the 12-period window covers the 5 periods before, the current period, and the 6 periods after), then standardize the row as (val - mean) / std, and do the same for every row.
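To make the window definition concrete, here is a minimal sketch for a single row using plain vectors and the `Statistics` stdlib. The helper name `window_zscore` and its `back`/`forward` keywords are hypothetical, chosen for illustration; it simply filters every row whose time lies in `t - 5 .. t + 6` and standardizes the chosen row against them.

```julia
using Statistics

# Hypothetical helper: z-score of row i, using all rows whose time lies in
# the window t-back .. t+forward (by default 5 periods back, the current
# period, and 6 periods forward, i.e. 12 periods in total).
function window_zscore(time::AbstractVector{<:Integer}, val::AbstractVector{<:Real},
                       i::Integer; back::Integer = 5, forward::Integer = 6)
    t = time[i]
    inwin = (time .>= t - back) .& (time .<= t + forward)  # rows inside the window
    w = val[inwin]
    return (val[i] - mean(w)) / std(w)  # std uses the n - 1 (sample) denominator
end
```

Calling this once per row scans the whole column each time, so it only illustrates the definition; it is not meant for a million rows.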
The real problem is slightly more complicated, and I would need to do the same for more columns, but the basic idea is the same.
The only way I can think of right now is to compute the window statistics separately for each month (time period), which means repeating the computation about 100 times.
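One way to avoid the repeated passes, sketched here under assumptions rather than as a library function, is to aggregate once per period (row count, sum of `val`, sum of `val^2`), take cumulative sums of those per-period totals, and read each window's mean and standard deviation off differences of the cumulative sums. The name `rolling_normalise` and the `back`/`forward` keywords are hypothetical. It assumes `time` holds integers starting at 1, and it truncates windows at the edges (periods outside `1:maximum(time)` are simply absent), which is an assumption because the question does not say how edges should be handled.

```julia
# Sketch: normalize each value by the mean and sample std of all values whose
# period lies in t-back .. t+forward. Assumes integer periods in 1:maximum(time).
function rolling_normalise(time::AbstractVector{<:Integer}, val::AbstractVector{<:Real};
                           back::Integer = 5, forward::Integer = 6)
    T = maximum(time)
    n  = zeros(Int, T)       # number of rows in each period
    s1 = zeros(Float64, T)   # sum of val in each period
    s2 = zeros(Float64, T)   # sum of val^2 in each period
    for (t, v) in zip(time, val)
        n[t]  += 1
        s1[t] += v
        s2[t] += v^2
    end
    # Prefix sums with a leading zero: the total over periods lo..hi is c[hi+1] - c[lo]
    cn  = [0; cumsum(n)]
    cs1 = [0.0; cumsum(s1)]
    cs2 = [0.0; cumsum(s2)]
    μ = fill(NaN, T)
    σ = fill(NaN, T)
    for t in 1:T
        lo, hi = max(t - back, 1), min(t + forward, T)  # window truncated at the edges
        m  = cn[hi + 1] - cn[lo]
        S1 = cs1[hi + 1] - cs1[lo]
        S2 = cs2[hi + 1] - cs2[lo]
        m > 0 && (μ[t] = S1 / m)
        # Sample variance (n - 1 denominator), matching Statistics.std
        m > 1 && (σ[t] = sqrt(max(S2 - m * μ[t]^2, 0.0) / (m - 1)))
    end
    return [(v - μ[t]) / σ[t] for (t, v) in zip(time, val)]
end
```

With the data frame from the question this could be used as `data.val_norm = rolling_normalise(data.time, data.val)`, and repeated for other columns. The work is one pass over the rows plus O(number of periods), instead of one full scan per period. The sum-of-squares formula can lose precision when values are large relative to their spread; for `rand()` data in [0, 1) this should not matter, but a per-period Welford-style accumulation would be more robust.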
Is there a library that has this implemented already?
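```julia
using DataFrames, Chain, DataFrameMacros, Statistics

data = DataFrame(time = rand(1:100, 1_000_000), val = rand(1_000_000))

# Mean and standard deviation of `val` over all rows with time in t-5 .. t+6
function summarise(data, t)
    tmp = @chain data begin
        @subset t - 5 <= :time <= t + 6
        @combine(:meanval = mean(:val), :stdval = std(:val))
    end
    # @combine returns a one-row data frame; extract the scalars
    only(tmp.meanval), only(tmp.stdval)
end

# One summary row per time period (filters the full data 100 times)
stats = [summarise(data, t) for t in 1:100]
df_summ = DataFrame(time = 1:100, meanval = first.(stats), stdval = last.(stats))

data_fnl = @chain data begin
    leftjoin(df_summ, on = :time)
    @transform :val_normalised = (:val - :meanval) / :stdval
end
```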