How do I go about applying a running function to a dataframe column with variable window length?
Consider for example:
using DataFrames, RollingFunctions, Statistics, Chain
df = DataFrame(:a => rand(100), :b => repeat([ x for x in 1:10 if isodd(x) ], 20))
Now say I want to use df.b to determine the windowspan argument to running, but I am in a long @chain. Here is my intuition, but it throws a MethodError because I'm passing a vector to running as the windowspan:
@chain df begin
    # a lot of code
    # I know there is `runstd` but my real use has a bit more to it
    transform([:a, :b] => (a, b) -> running(x -> std(x), a, b))
end
I am not fully clear what you want. Can you write what you want without using transform, but just assuming a and b are just variables? Then I can help you translate this to operation specification syntax.
OK - so you want to apply different window spans for different elements of a and the spans should come from b? Then I think RollingFunctions.jl does not have it implemented currently. @JeffreySarnoff probably can confirm this.
Currently, the simplest thing is probably to either:
write custom code (this will be most efficient)
compute several vectors with different fixed rolling windows and then for each element of the output pick the value from the correct vector (this would work if you have only a few values of window size - but it is a workaround)
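As a rough illustration of the first option, here is a minimal sketch of a hand-written variable-window function. The name `variable_running` is hypothetical (it is not part of RollingFunctions.jl), and the sketch assumes ordinary 1-based vectors, trailing windows, and that a span larger than the number of observations seen so far is simply shortened to what is available.

```julia
using Statistics

# Hypothetical helper, not part of RollingFunctions.jl.
# Applies `f` to a trailing window of `data` ending at each position i,
# where the window length for position i is taken from `spans[i]`.
# Assumes 1-based vectors of equal length.
function variable_running(f, data::AbstractVector, spans::AbstractVector{<:Integer})
    length(data) == length(spans) ||
        throw(ArgumentError("data and spans must have the same length"))
    return map(eachindex(data, spans)) do i
        # Clamp the span to at least 1 and at most the i observations seen so far.
        w = clamp(Int(spans[i]), 1, i)
        f(@view data[i-w+1:i])
    end
end
```

For example, `variable_running(sum, [1.0, 2.0, 3.0, 4.0], [1, 2, 3, 2])` returns `[1.0, 3.0, 6.0, 7.0]`. Inside the chain it could be called as `transform([:a, :b] => ((a, b) -> variable_running(std, a, b)) => :a_std)`, where `:a_std` is just an illustrative output column name. Note that `std` of a one-element window is `NaN`.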
Actually, after digging into RollingFunctions.jl, you can get the desired effect. Try the following:
ragged_run(f, a, b) =
    running((d1, d2) -> (w = min(length(d1), Int64(d2[end]));
                         f(@view d1[end-w+1:end])),
            a, float.(b), maximum(b))
Now ragged_run(std, a, b) should work, and it is actually pretty efficient (though not as efficient as a bespoke function, which should take about the same amount of code).
This is possible because of some extra features quietly lurking in RollingFunctions.jl I discovered while checking the package just now.
In R, rollapply(data = a, width = b, FUN = sd) accepts a vector for b, in which case it performs the rolling operation with a variable window width, specified by b individually for each observation. In RollingFunctions.jl the window span is currently fixed for all observations. Thank you!