I want to run a rolling function with multiple outputs on multiple columns of a DataFrame. RollingFunctions.jl seems to allow only scalar output, and ImageFiltering.mapwindow only accepts a single array as input.
I came up with the following (for a simple slope and intercept calculation of a bivariate regression):
```julia
using Chain, DataFrames, DataFrameMacros, GLM, ImageFiltering

## Function to calculate intercept and slope of y ~ x
lmBivar(x, y) = @chain lm([ones(length(x)) x], y) begin
    coef
    (; inter = _[1], slope = _[2])
end

## Adapt mapwindow to take two inputs: zip x and y into one array of tuples,
## then split each window back into its x and y parts
mapwindow2(fun, x, y, win) =
    mapwindow(w -> fun(getindex.(w, 1), getindex.(w, 2)), collect(zip(x, y)), win)

## Apply to toy data
x = randn(1000)
y = 2x .+ 10 .+ randn(1000)
df = DataFrame(x = x, y = y)

dfs = @chain df begin
    @transform :lmres = @bycol mapwindow2(lmBivar, :x, :y, 49)
    transform(:lmres => AsTable)   # expand the NamedTuples into :inter and :slope
    select(Not(:lmres))
end
```
It does what I intended, but please educate me on whether this is OK or whether there are better ways (in terms of elegance, generality, performance). Learning…
This is a univariate (simple) regression, so the parameters can be computed separately: the slope is cov(x, y)/var(x), and the intercept is mean(y) - b * mean(x), where b is the slope. Obviously, you would need to use the rolling versions of these functions.
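A minimal sketch of that idea, assuming trailing windows of length `w` and using only the `Statistics` standard library (`rollreg` is a hypothetical helper name, not a package function). Because `cov` and `var` use the same `n - 1` normalization, it cancels in the slope. Unlike `mapwindow`, which centers the window and pads the borders so the output has the same length as the input, this returns `length(x) - w + 1` values.

```julia
using Statistics

# Rolling simple-regression coefficients from the closed-form formulas
#   slope     = cov(x, y) / var(x)
#   intercept = mean(y) - slope * mean(x)
# over trailing windows of length `w`.
# `rollreg` is a hypothetical helper, not part of any package.
function rollreg(x::AbstractVector, y::AbstractVector, w::Integer)
    n = length(x)
    length(y) == n || throw(DimensionMismatch("x and y must have the same length"))
    2 <= w <= n || throw(ArgumentError("window must be between 2 and length(x)"))
    m = n - w + 1
    inter = Vector{Float64}(undef, m)
    slope = Vector{Float64}(undef, m)
    for i in 1:m
        xs = view(x, i:i+w-1)   # views avoid copying each window
        ys = view(y, i:i+w-1)
        b = cov(xs, ys) / var(xs)
        slope[i] = b
        inter[i] = mean(ys) - b * mean(xs)
    end
    return (; inter, slope)
end

x = randn(1000)
y = 2x .+ 10 .+ randn(1000)
res = rollreg(x, y, 49)   # res.inter and res.slope each have 952 entries
```

This still costs O(w) per window; keeping running sums of x, y, x² and xy would make each step O(1), at some cost in numerical robustness.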
Also, this approach is going to allocate a lot of memory. If you really want to use GLM, it is better to do:
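```julia
## Drop-in replacement for lmBivar: wrap the window in a DataFrame without copying
lmBivar(x, y) = @chain DataFrame(x = x, y = y; copycols = false) begin
    lm(@formula(y ~ 1 + x), _)   # `_` is the DataFrame from the previous step
    coef
    (inter = _[1], slope = _[2])  # semicolon not needed here
end
```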
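Another way to cut per-window allocations, independent of GLM, is to skip `collect(zip(x, y))` and `getindex.(…)` and hand the function views of each column. A minimal sketch, assuming trailing windows (`rollmap` is a hypothetical helper, not a library function):

```julia
# Hypothetical helper (not part of any package): apply `f` to trailing
# windows of length `w` taken from several equally long vectors at once.
# `f` receives one view per column, so no per-window copies are made;
# the result is a Vector of whatever `f` returns (e.g. NamedTuples).
function rollmap(f, w::Integer, cols::AbstractVector...)
    n = length(first(cols))
    all(c -> length(c) == n, cols) ||
        throw(DimensionMismatch("all columns must have the same length"))
    1 <= w <= n || throw(ArgumentError("window must be between 1 and the column length"))
    return [f((view(c, i:i+w-1) for c in cols)...) for i in 1:n-w+1]
end
```

With the `lmBivar` above this would be `rollmap(lmBivar, 49, x, y)`, provided `lm` accepts `SubArray` inputs (wrap them in `collect` otherwise). The result has `length(x) - w + 1` entries, so to store it as a DataFrame column you would pad the front, e.g. `vcat(fill(missing, w - 1), res)`.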