How to reference new variables in a combine for dataframes?

I posted this on Slack, but I think I’ll move over to using Discourse more (I’ll post a cross link).

When doing split-apply-combine with a DataFrame, is there any way for the apply transformations to reference previously calculated variables?

using DataFrames, Statistics  # Statistics provides mean

df = DataFrame(a=repeat([1,2],inner=[5]), b='A':'J', c=rand(10))
gdf = groupby(df, :a)
combine(gdf, nrow => :n, :c => mean => :m, [:n, :m] => ((n,m)->n-m) => :y) # doesn't work: :n and :m are not visible here

Using Query.jl I can do this, but I think it's quite verbose. Any suggestions?

df |> @groupby({_.a}) |> 
@map({a=key(_)[1], n=length(_), m=mean(_.c)}) |> 
@map({a=_.a, n=_.n, m=_.m, y=_.n-_.m})

Maybe this one generalizes a bit better:

@from i in df begin
    @group i by i.a into g
    @let n=length(g)
    @let m=mean(g.c)
    @select {a=key(g), n=n, m=m, y=n-m}
    @collect DataFrame
end

No, this is not currently possible (I thought it was…). You are going to have to use two transform statements. Note that you can make the second one a transform!, which will reduce the memory footprint a bit.
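A minimal sketch of that two-step approach, reusing the df from the question. The name res is only illustrative, and this is one way to split the work rather than the only one: the per-group summaries are computed first with combine, and the derived column is then added in place with transform!.

```julia
using DataFrames, Statistics

df = DataFrame(a=repeat([1, 2], inner=5), b='A':'J', c=rand(10))
gdf = groupby(df, :a)

# Step 1: per-group summaries; :n and :m become ordinary columns of res
res = combine(gdf, nrow => :n, :c => mean => :m)

# Step 2: :n and :m now exist, so they can be referenced here.
# transform! adds :y to res in place instead of allocating a new data frame.
transform!(res, [:n, :m] => ((n, m) -> n .- m) => :y)
```

Because res has only one row per group, the second step works on a small data frame regardless of how large df is.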


I ended up using the approach taken here: