using DataFrames
data = DataFrame(time = rand(1:100, 1_000_000), val=rand(1_000_000))
Consider the data frame above. I want to compute the mean and standard deviation of val within a rolling window of time periods and normalize each val using them. It is almost like a moving average, except that each time period has multiple values.
For example, say I am looking at the row (time = 7, val = 0.551) and I want to normalize val using a 12-period mean and standard deviation. I would need to compute the mean and standard deviation over all rows with time in 2:13 (it is a 12-month window: the current period, the 5 periods before it and the 6 periods after it), then standardize val as (val - mean) / std. The same would be done for every row.
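To make the window concrete, here is a minimal sketch of the calculation for that single row. The toy `time` and `val` vectors are made-up values for illustration (only the row with time = 7 and val = 0.551 comes from the example above), and `std` is the sample standard deviation from the `Statistics` standard library.

```julia
using Statistics

# Toy data (hypothetical values): several rows can share a time period.
time = [1, 2, 2, 5, 7, 7, 10, 13, 13, 14]
val  = [0.20, 0.40, 0.60, 0.30, 0.551, 0.90, 0.10, 0.70, 0.50, 0.80]

t = 7
window = (t - 5):(t + 6)             # 2:13, i.e. 12 time periods
in_window = in.(time, Ref(window))   # true for rows whose time falls in the window

m = mean(val[in_window])             # rolling mean for period 7
s = std(val[in_window])              # rolling sample standard deviation (n - 1 denominator)

z = (0.551 - m) / s                  # standardized value for the row (time = 7, val = 0.551)
```

Every row with time = 7 shares the same `m` and `s`, so the statistics only need to be computed once per time period, not once per row.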
The real problem is slightly more complicated, and I would need to do the same for more columns, but the basic idea is the same.
The only way I can think of right now is to do a groupby for each month, which would mean running the computation almost 100 times.
Is there a library that already implements this?
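using DataFrames, Chain, DataFrameMacros, Statistics

data = DataFrame(time = rand(1:100, 1_000_000), val = rand(1_000_000))

# Mean and standard deviation of val over the window t-5:t+6
function summarise(data, t)
    tmp = @chain data begin
        @subset t - 5 <= :time <= t + 6
        @combine(:meanval = mean(:val), :stdval = std(:val))
    end
    tmp.meanval[1], tmp.stdval[1]
end

# One (mean, std) tuple per time period
df_summ = DataFrame(time = 1:100, mean_std = [summarise(data, t) for t in 1:100])

# Attach each period's statistics to every row and standardise val.
# DataFrameMacros' @transform works row by row, so :mean_std is a single tuple here.
data_fnl = @chain data begin
    leftjoin(df_summ, on = [:time])
    @transform :val_normalised = (:val - :mean_std[1]) / :mean_std[2]
end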
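
The code above filters the full data set once per time period. If that turns out to be too slow, one possible alternative (a sketch, not a library function) is to make a single pass over the rows to collect the count, sum and sum of squares for each period, and then combine those small per-period totals over each window. The names `rolling_stats` and `rolling_normalize` and the keyword arguments `back` and `fwd` are hypothetical helpers made up for this sketch. It assumes integer time periods, and the sum-of-squares formula for the variance can lose precision when values are large relative to their spread, so treat it as a starting point.

```julia
# Hypothetical helper: rolling mean and sample std per time period, built from
# per-period counts, sums and sums of squares.
function rolling_stats(time::AbstractVector{<:Integer}, val::AbstractVector{<:Real};
                       back::Integer = 5, fwd::Integer = 6)
    tmin, tmax = extrema(time)
    nt = tmax - tmin + 1
    n  = zeros(Int, nt)       # number of rows in each period
    s  = zeros(Float64, nt)   # sum of val in each period
    ss = zeros(Float64, nt)   # sum of val^2 in each period

    # Single pass over the rows.
    for (t, v) in zip(time, val)
        i = t - tmin + 1
        n[i]  += 1
        s[i]  += v
        ss[i] += v^2
    end

    means = fill(NaN, nt)
    stds  = fill(NaN, nt)
    for i in 1:nt
        # Window of periods, truncated at the edges of the observed range.
        lo, hi = max(1, i - back), min(nt, i + fwd)
        N = sum(@view n[lo:hi])
        N == 0 && continue
        S  = sum(@view s[lo:hi])
        SS = sum(@view ss[lo:hi])
        means[i] = S / N
        # Sample variance from the totals; clamp tiny negative rounding errors.
        N > 1 && (stds[i] = sqrt(max(SS - S^2 / N, 0.0) / (N - 1)))
    end
    return tmin, means, stds
end

# Hypothetical helper: standardize every row with its period's rolling statistics.
function rolling_normalize(time, val; kwargs...)
    tmin, means, stds = rolling_stats(time, val; kwargs...)
    idx = time .- tmin .+ 1
    return (val .- means[idx]) ./ stds[idx]
end
```

With the data frame from above this could be used as `data.val_normalised = rolling_normalize(data.time, data.val)`, and repeated for other columns by passing a different value vector. Like the `@subset` version, the window is simply shorter for periods near the start and end of the range.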