Rolling/running functions with complex output on multiple variables

Dear Julia community,

I want to run a rolling function with multiple outputs on multiple columns of a DataFrame. RollingFunctions.jl seems to allow only scalar output, and ImageFiltering.mapwindow accepts only a single array as input.

I came up with (for a simple slope and intercept calc of a bivariate regression):

using Chain, DataFrames, DataFrameMacros, GLM, ImageFiltering

## Function to calc slope, intercept of y ~ x
lmBivar(x,y) = @chain lm([fill(1, length(x)) x], y) coef() (;inter=_[1], slope=_[2])

## Adapt mapwindow function to take two inputs
mapwindow2(fun, x,y, win)= mapwindow(x-> fun(getindex.(x,1), getindex.(x,2)), zip(x,y)|>collect, win)

## Apply to toy data
x = randn(1000)
y = 2x .+ 10 .+ randn(1000)

df=DataFrame(x=x,y=y)

dfs = @chain df begin
    @transform :lmres = @bycol mapwindow2(lmBivar, :x, :y, 49)
    transform(:lmres => AsTable)
    select(Not(:lmres))
end

It does what I intended, but please educate me on whether this is OK or whether there are better ways (in terms of elegance, generality, performance). Learning…

@JeffreySarnoff is an expert on rolling functions, so maybe he can help :).


This is a univariate regression. The parameters can be computed separately: the slope is cov(x,y)/var(x), and the intercept is mean(y) - b * mean(x), where b is the slope. Obviously you would need to use the rolling versions of these functions.
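As a rough illustration of the closed-form route, here is a minimal sketch that uses only the Statistics standard library. The name `rolling_ols` is hypothetical, and it assumes trailing windows with no padding, so it returns `length(x) - win + 1` results rather than the same-length, centered output that mapwindow produces.

```julia
using Statistics

# Hypothetical helper: slope and intercept of y ~ x over trailing windows.
# Assumption: no border padding, so the output is shorter than the input.
function rolling_ols(x::AbstractVector, y::AbstractVector, win::Integer)
    length(x) == length(y) || throw(ArgumentError("x and y must have equal length"))
    2 <= win <= length(x) || throw(ArgumentError("win must be between 2 and length(x)"))
    map(win:length(x)) do i
        xw = @view x[i-win+1:i]   # views avoid copying each window
        yw = @view y[i-win+1:i]
        slope = cov(xw, yw) / var(xw)
        inter = mean(yw) - slope * mean(xw)
        (; inter, slope)
    end
end

x = collect(1.0:10.0)
y = 2 .* x .+ 10
res = rolling_ols(x, y, 5)
```

Each element of `res` is a NamedTuple with fields `inter` and `slope`, so the result can be expanded into columns the same way as in the original post.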

Also,

This is going to allocate a lot of memory. Better to do (if you really want to use GLM):

@chain begin
DataFrame(x=x,y=y;copycols=false)
lm(@formula(y ~ 1 + x),df)
coef
(inter=_[1], slope=_[2]) # semicolon not needed here
end

Sure - it was just an MWE… For instance, I want to use quantreg from RobustModels.jl or anything else.

I know this, but this is for the entire DataFrame, and the point is that I would not know how to get it “rolling”.

No, that is how you should define lmBivar(x,y).

Ah thanks! Got it and makes sense!

Thanks again - realized that df should of course be replaced by _ in line 3.

FWIW my function (meanwhile I had added the function name):

regBivar(x,y, fun) = @chain begin
    DataFrame(x=x,y=y;copycols=false)
    fun(@formula(y ~ 1 + x), _)
    coef
    NamedTuple(zip(string(fun) .* ["_inter", "_slope"] .|> Symbol, _ ))
end
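For rolling a function of several columns without ImageFiltering, a plain-Julia sketch might look like the following. The helper `rollmap` and the toy function `spread` are hypothetical names, not part of any package. The sketch assumes trailing windows without padding, and `spread` merely stands in for a model-fitting function such as the one above.

```julia
using Statistics

# Hypothetical helper: apply `f` to matching trailing windows of several
# equal-length vectors. Views are passed to `f`, so no window is copied.
function rollmap(f, win::Integer, cols::AbstractVector...)
    n = length(first(cols))
    all(c -> length(c) == n, cols) || throw(ArgumentError("all inputs must have equal length"))
    1 <= win <= n || throw(ArgumentError("win must be between 1 and the input length"))
    [f(map(c -> view(c, i-win+1:i), cols)...) for i in win:n]
end

# Toy stand-in for a fitting function; it returns several outputs as a NamedTuple.
spread(x, y) = (; xrange = maximum(x) - minimum(x), ymed = median(y))

x = collect(1.0:10.0)
y = 2 .* x .+ 10
res = rollmap(spread, 5, x, y)
```

The result is a vector of NamedTuples, which should be usable as a row table, for example via `DataFrame(res)`, if separate columns are wanted.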