Here is one method to try (thanks to the FlexiJoins package):
julia> using DataFrames, StatsBase, StructArrays, IntervalSets, FlexiJoins
julia> df = DataFrame(gvkey = sample(["APPL","MSFT","GOOG","AMZN"],20),
fyearq=sample([2021,2022],20),
fqtr=sample([1,2,3,4],20),
EPS=round.(rand(20); digits=2))
20×4 DataFrame
 Row │ gvkey   fyearq  fqtr   EPS
     │ String  Int64   Int64  Float64
─────┼────────────────────────────────
   1 │ APPL      2022      4     0.41
   2 │ APPL      2021      4     0.72
   3 │ GOOG      2021      1     0.83
   4 │ MSFT      2021      4     0.48
   5 │ GOOG      2022      2     0.66
...
julia> transform!(df, [:fyearq, :fqtr] => ByRow((y,q) -> y+(q-1)*0.25) => :yq)
20×5 DataFrame
 Row │ gvkey   fyearq  fqtr   EPS      yq
     │ String  Int64   Int64  Float64  Float64
─────┼─────────────────────────────────────────
   1 │ APPL      2022      4     0.41  2022.75
   2 │ APPL      2021      4     0.72  2021.75
   3 │ GOOG      2021      1     0.83  2021.0
   4 │ MSFT      2021      4     0.48  2021.75
   5 │ GOOG      2022      2     0.66  2022.25
...
julia> sa = StructArray(NamedTuple.(eachrow(df)))
20-element StructArray(::Vector{String}, ::Vector{Int64}...:
(gvkey = "APPL", fyearq = 2022, fqtr = 4, EPS = 0.41, yq = 2022.75)
(gvkey = "APPL", fyearq = 2021, fqtr = 4, EPS = 0.72, yq = 2021.75)
(gvkey = "GOOG", fyearq = 2021, fqtr = 1, EPS = 0.83, yq = 2021.0)
(gvkey = "MSFT", fyearq = 2021, fqtr = 4, EPS = 0.48, yq = 2021.75)
(gvkey = "GOOG", fyearq = 2022, fqtr = 2, EPS = 0.66, yq = 2022.25)
...
julia> dfout = DataFrame(map(
           x -> (; x.O..., qtr_cnt=length(x.M), EPS_sum=sum(t->t.EPS, x.M), EPS_ssq=sum(t->t.EPS^2, x.M)),
           leftjoin((O=sa, M=sa),
                    by_key(:gvkey) & by_pred(x -> (x.yq - 1.76)..(x.yq + 0.01), ∋, :yq);
                    groupby=:O)))
20×8 DataFrame
 Row │ gvkey   fyearq  fqtr   EPS      yq       qtr_cnt  EPS_sum  EPS_ssq
     │ String  Int64   Int64  Float64  Float64  Int64    Float64  Float64
─────┼────────────────────────────────────────────────────────────────────
   1 │ AMZN      2021      3     0.17  2021.5         3     0.76   0.3134
   2 │ GOOG      2022      2     0.96  2022.25        5     3.4    2.6422
   3 │ MSFT      2021      4     0.41  2021.75        4     1.93   1.0687
   4 │ APPL      2021      1     0.33  2021.0         2     0.4    0.1138
   5 │ APPL      2021      1     0.07  2021.0         2     0.4    0.1138
...
The final DataFrame now holds qtr_cnt, EPS_sum and EPS_ssq (sum of squares), each computed over the two-year window ending at that row's quarter. These three values are enough to calculate the standard deviation.
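As a minimal sketch of that last step, the sample standard deviation can be recovered from a count, a sum and a sum of squares. The helper name std_from_sums and the toy vector below are assumptions made for illustration; they are not part of the original answer or of any package.

```julia
# Sample standard deviation from a count `n`, a sum `s` and a sum of squares `ssq`.
# `std_from_sums` is a hypothetical helper name chosen for this sketch.
function std_from_sums(n::Integer, s::Real, ssq::Real)
    n < 2 && return NaN              # undefined with fewer than two observations
    v = (ssq - s^2 / n) / (n - 1)    # unbiased sample variance
    return sqrt(max(v, 0.0))         # clamp tiny negative values caused by rounding
end

# Toy window of EPS values (illustrative only)
eps_window = [0.41, 0.72, 0.83, 0.48]
n, s, ssq = length(eps_window), sum(eps_window), sum(abs2, eps_window)
sd = std_from_sums(n, s, ssq)
```

Assuming the column names shown above, something like transform!(dfout, [:qtr_cnt, :EPS_sum, :EPS_ssq] => ByRow(std_from_sums) => :EPS_std) should attach the result to dfout, where EPS_std is again just an illustrative column name. Note that this one-pass formula can lose precision when the values are large relative to their spread.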
Note: a key difficulty with questions like this is getting some data to play with. Had the question included a pasteable snippet that generates a bit of data, I'm sure it would have been a huge help for readers.