RollingFunctions with a variable window width

How do I apply a running function to a DataFrame column with a variable window length?
Consider, for example:

using DataFrames, RollingFunctions, Chain, Statistics

df = DataFrame(:a => rand(100), :b => repeat([ x for x in 1:10 if isodd(x) ], 20))

Now say I want to use df.b to set the windowspan argument of running, but I am in the middle of a long @chain. Here is what my intuition says, but it throws a MethodError because I'm passing a vector to running as the windowspan:

@chain df begin
    # a lot of code
    # I know there is `runstd` but my real use has a bit more to it
    transform([:a, :b] => (a, b) -> running(x -> std(x), a, b))  # MethodError: b is a vector, not a single windowspan
end

Use RollingFunctions.jl (JeffreySarnoff/RollingFunctions.jl on GitHub: "Roll a window over data; apply a function over the window.")

Sorry, I prematurely hit enter on my post; my first post has been edited to include the rest of the content.

I am not entirely clear on what you want. Can you write it without using transform, treating a and b as plain variables? Then I can help you translate it into operation specification syntax.

Sure thing:

using RollingFunctions, Statistics

a = rand(100)
b = repeat([x for x in 1:10 if isodd(x)], 20)

running(x -> std(x), a, b)  # This doesn't work: MethodError, b is a vector rather than a single window span

For example, if b is 5 for a given row, then for that row I want the result of running(x -> std(x), a, 5).

But in your case b is a vector. Do you mean you want:

running.(std, Ref(a), b)

i.e., a running std of a, computed separately for each window span in the vector b?

"But my problem is that b is a vector": this is exactly the problem I am running into in the DataFrames context. For example, going back to this:

df = DataFrame(:a => rand(100), :b => repeat([ x for x in 1:10 if isodd(x) ], 20))

Then what I really want is to say, row by row, “if b is 3, use running(x -> std(x), a, 3); if b is 5, use running(x -> std(x), a, 5); etc.” Is it a problem in general that I’ve laid out my data like this?
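A minimal sketch of these per-row semantics in plain Julia (only the Statistics stdlib), useful as a reference to check faster versions against. `ragged_run_naive` is a hypothetical name, not part of RollingFunctions.jl. Near the start of the series it simply truncates the window to the rows available, which is assumed to be close to how `running` tapers its first windows. It recomputes every window from scratch, so it costs roughly sum(b) operations rather than O(n).

```julia
using Statistics

# Hypothetical helper (not part of RollingFunctions.jl): for each row i,
# apply f to the trailing window of length b[i] that ends at row i.
# Where fewer than b[i] values exist, the window is truncated to a[1:i].
function ragged_run_naive(f, a::AbstractVector, b::AbstractVector{<:Integer})
    length(a) == length(b) || throw(DimensionMismatch("a and b must have the same length"))
    return [f(view(a, max(firstindex(a), i - b[i] + 1):i)) for i in eachindex(a, b)]
end

a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [1, 2, 3, 2, 1]
ragged_run_naive(mean, a, b)  # [1.0, 1.5, 2.0, 3.5, 5.0]
```

Inside the `@chain`, the call would presumably become `transform([:a, :b] => ((a, b) -> ragged_run_naive(std, a, b)) => :a_std)`.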

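To put it another way, if I were writing this in R with dplyr (and zoo's rollapply), I would write:

library(dplyr)
library(zoo)
df %>%
        mutate(foo = rollapply(a, width = b, FUN = sd, align = "right", fill = NA))  # trailing windows, like running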

OK, so you want to apply different window spans to different elements of a, with the spans coming from b? Then I don't think RollingFunctions.jl currently implements this. @JeffreySarnoff can probably confirm.

For now, the simplest options are probably to:

  1. write custom code (this will be the most efficient)
  2. compute several vectors with different fixed rolling windows and then, for each element of the output, pick the value from the vector with the matching window size (this works if you have only a few distinct window sizes, but it is a workaround)
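Option 2 might look roughly like the sketch below. To keep it runnable without packages, `fixed_run` is a hypothetical stand-in for a fixed-width `running(f, a, w)` (trailing window, truncated near the start); with RollingFunctions.jl loaded you would presumably call `running` there instead. Both helper names are illustrative only. It does one full pass per distinct width, so it pays off only when b has few distinct values (five in the example above: 1, 3, 5, 7, 9).

```julia
using Statistics

# Hypothetical stand-in for a fixed-width running function:
# trailing window of width w, truncated to a[1:i] near the start.
fixed_run(f, a::AbstractVector, w::Integer) =
    [f(view(a, max(firstindex(a), i - w + 1):i)) for i in eachindex(a)]

# Option 2: one full running vector per distinct width in b, then for
# each row i pick element i from the vector computed with width b[i].
function ragged_by_lookup(f, a::AbstractVector, b::AbstractVector{<:Integer})
    length(a) == length(b) || throw(DimensionMismatch("a and b must have the same length"))
    by_width = Dict(w => fixed_run(f, a, w) for w in unique(b))
    return [by_width[b[i]][i] for i in eachindex(a, b)]
end

# Usage with data shaped like the original example.
ragged_by_lookup(std, rand(100), repeat([1, 3, 5, 7, 9], 20))
```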

Thanks for this. I’ll make a PR to RollingFunctions if I write anything of value.

Actually, digging into RollingFunctions.jl, you can get the desired effect. Try the following:

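using RollingFunctions, Statistics

# Two-series form running(fn, data1, data2, windowspan): fn gets the current window of a (d1)
# and of b (d2). windowspan = maximum(b) lets every requested width fit; d2[end] is b at this row.
ragged_run(f, a, b) =
    running((d1, d2) -> begin
                w = min(length(d1), Int64(d2[end]))  # requested width, capped by the values available
                f(@view d1[end-w+1:end])             # apply f to the last w values only
            end,
            a, float.(b), maximum(b))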

Now ragged_run(std, a, b) should work, and it is fairly efficient (though not as efficient as a bespoke function, which would take about the same amount of code).
This is possible because of some extra features quietly lurking in RollingFunctions.jl that I discovered while checking the package just now.
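For comparison, here is a sketch of the kind of bespoke function mentioned above, specialised to the sample standard deviation. Prefix sums of a and a.^2 give each window's sum and sum of squares in O(1), so the whole pass is O(n) whatever the widths are. `ragged_std` is a hypothetical name; it assumes ordinary 1-based vectors, and the sum-of-squares formula can lose precision when the values are large relative to their spread (a Welford-style update would be more robust).

```julia
using Statistics

# Hypothetical bespoke routine: trailing-window sample std where row i
# uses the window a[max(1, i - b[i] + 1):i]. O(n) via prefix sums.
# Assumes 1-based vectors.
function ragged_std(a::AbstractVector{<:Real}, b::AbstractVector{<:Integer})
    n = length(a)
    length(b) == n || throw(DimensionMismatch("a and b must have the same length"))
    s1 = zeros(Float64, n + 1)  # s1[k+1] = a[1] + ... + a[k]
    s2 = zeros(Float64, n + 1)  # s2[k+1] = a[1]^2 + ... + a[k]^2
    for k in 1:n
        s1[k+1] = s1[k] + a[k]
        s2[k+1] = s2[k] + a[k]^2
    end
    out = Vector{Float64}(undef, n)
    for i in 1:n
        lo = max(1, i - b[i] + 1)
        w = i - lo + 1
        if w < 2
            out[i] = NaN  # like std of a single value (n - 1 == 0)
        else
            sx = s1[i+1] - s1[lo]
            sxx = s2[i+1] - s2[lo]
            v = (sxx - sx^2 / w) / (w - 1)
            out[i] = sqrt(max(v, 0.0))  # clamp tiny negative rounding error
        end
    end
    return out
end

a = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
b = [1, 2, 3, 4, 8, 2, 3, 8]
r = ragged_std(a, b)
# Cross-check against the direct definition (row 1 is NaN in both).
naive = [std(a[max(1, i - b[i] + 1):i]) for i in eachindex(a)]
all(isapprox.(r[2:end], naive[2:end]))
```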

Excellent. As noted - @JeffreySarnoff is probably the best person to discuss adding features/improving documentation of the package.

Present. What may I do?

In R, zoo's rollapply(data = a, width = b, FUN = sd) accepts a vector b, in which case it performs the rolling operation with a variable window width, specified by b individually for each observation. In RollingFunctions.jl the window width is currently fixed for all observations. Thank you!

Do you intend that
rollapply(data = [1,2,3,4,5,6,7,8], width = [2,3,3], FUN = mean)
should return [mean([1,2]), mean([3,4,5]), mean([6,7,8])], or something different?
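To make the reading in this question concrete, here is a small sketch in which the widths partition the data into consecutive, non-overlapping chunks, one result per chunk. `partition_apply` is a hypothetical helper, not an existing RollingFunctions.jl function, and it assumes 1-based vectors.

```julia
using Statistics

# Hypothetical helper for the "partition" reading: consecutive,
# non-overlapping chunks whose lengths are given by `widths`.
# The widths must add up to length(data).
function partition_apply(f, data::AbstractVector, widths::AbstractVector{<:Integer})
    sum(widths) == length(data) ||
        throw(ArgumentError("widths must sum to length(data)"))
    stops = cumsum(widths)
    starts = stops .- widths .+ 1
    return [f(view(data, s:e)) for (s, e) in zip(starts, stops)]
end

partition_apply(mean, 1:8, [2, 3, 3])  # [1.5, 4.0, 7.0]
```

The per-observation reading requested earlier in the thread is different: it needs one width per observation (length(b) == length(a)) and returns one value per observation, each computed over an overlapping window ending at that observation, as with `running`.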