RollingFunctions with a variable window width

Applying a rolling function is a special case of joining the dataset to itself. With FlexiJoins.jl:

julia> using StructArrays

# source table
julia> tbl = (i=1:100, a=rand(100), b=repeat([ x for x in 1:10 if isodd(x) ], 20)) |> StructArray
100-element StructArray(::UnitRange{Int64}, ::Vector{Float64}, ::Vector{Int64}) with eltype NamedTuple{(:i, :a, :b), Tuple{Int64, Float64, Int64}}:
 (i = 1, a = 0.09206398653048098, b = 1)
 (i = 2, a = 0.7537127847894282, b = 3)
 (i = 3, a = 0.8240248053711017, b = 5)
 (i = 4, a = 0.11441701169052021, b = 7)
 (i = 5, a = 0.16367689060640156, b = 9)
 (i = 6, a = 0.9349278192831513, b = 1)
 (i = 7, a = 0.9762688379519132, b = 3)
 (i = 8, a = 0.3151435725496834, b = 5)
...

julia> using Statistics, DataPipes, FlexiJoins

julia> @p let
    # join tbl to itself, so that L.i ∈ R.i ± R.b
    innerjoin((L=tbl, R=tbl), by_pred(:i, ∈, x -> x.i ± x.b); groupby=:R)
    # aggregate L.a with std()
    map((;_.R..., a_runstd=std(_.L.a)))
end
100-element StructArray(::Vector{Int64}, ::Vector{Float64}, ::Vector{Int64}, ::Vector{Float64}) with eltype NamedTuple{(:i, :a, :b, :a_runstd), Tuple{Int64, Float64, Int64, Float64}}:
 (i = 1, a = 0.09206398653048098, b = 1, a_runstd = 0.4678563520128315)
 (i = 2, a = 0.7537127847894282, b = 3, a_runstd = 0.3662641250205392)
 (i = 3, a = 0.8240248053711017, b = 5, a_runstd = 0.3861777662491635)
 (i = 4, a = 0.11441701169052021, b = 7, a_runstd = 0.37359812683740595)
 (i = 5, a = 0.16367689060640156, b = 9, a_runstd = 0.35945720455705743)
 (i = 6, a = 0.9349278192831513, b = 1, a_runstd = 0.4576830686002579)
 (i = 7, a = 0.9762688379519132, b = 3, a_runstd = 0.3961279439019275)
 (i = 8, a = 0.3151435725496834, b = 5, a_runstd = 0.357603729530411)
...

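To make the semantics of the join explicit, here is a rough plain-Julia equivalent that uses only the Statistics stdlib (no FlexiJoins, no StructArrays). It is a minimal sketch, not how FlexiJoins evaluates the join. `runstd_naive` is a hypothetical helper: for every "right" row it scans all "left" rows, keeps those with `L.i ∈ R.i ± R.b`, and takes `std` of their `a` values. This costs O(n²) comparisons.

```julia
using Statistics

# Same columns as `tbl` in the post, kept as plain vectors for this sketch.
i = collect(1:100)
a = rand(100)
b = repeat([x for x in 1:10 if isodd(x)], 20)

# Hypothetical helper: naive self-join grouped by the right row.
# For each right row r, collect a[l] for every left row l with
# i[r] - b[r] <= i[l] <= i[r] + b[r], then aggregate with std().
function runstd_naive(i, a, b)
    map(eachindex(i)) do r
        lo, hi = i[r] - b[r], i[r] + b[r]
        std([a[l] for l in eachindex(i) if lo <= i[l] <= hi])
    end
end

a_runstd = runstd_naive(i, a, b)
```

Every window contains at least two rows here (since `b >= 1`), so `std` never sees a single-element sample.
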
It’s somewhat less efficient than highly specialized solutions, but these joins generalize much more easily to other, similar problems.
As you can see, this rolling-function application is built entirely from general-purpose building blocks.
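
For comparison, here is a more specialized sketch. The helper `runstd_sorted` is hypothetical and does not come from any package. It relies on `i` being sorted: each window `i[r] ± b[r]` is then one contiguous range of rows, found with two binary searches (`searchsortedfirst` / `searchsortedlast`) instead of a scan over all rows. The trade-off is that this shortcut is tied to this particular data layout, while the join formulation above does not assume sorted input.

```julia
using Statistics

# Same columns as `tbl` in the post, kept as plain vectors for this sketch.
i = collect(1:100)
a = rand(100)
b = repeat([x for x in 1:10 if isodd(x)], 20)

# Hypothetical specialized helper: assumes `i` is sorted, so the rows with
# i[r] - b[r] <= i[l] <= i[r] + b[r] form a single contiguous index range.
function runstd_sorted(i, a, b)
    @assert issorted(i) "this shortcut requires a sorted index column"
    map(eachindex(i)) do r
        lo = searchsortedfirst(i, i[r] - b[r])  # first row with i[l] >= lower bound
        hi = searchsortedlast(i, i[r] + b[r])   # last row with i[l] <= upper bound
        std(view(a, lo:hi))                     # aggregate the window without copying
    end
end

a_runstd = runstd_sorted(i, a, b)
```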
