How to reference new variables within the same combine on a DataFrame?

Posted this on Slack, but I think I’ll move over to using Discourse more (I’ll post a cross-link).

When doing split-apply-combine with a DataFrame, is there any way for the apply transformations to reference previously calculated variables?

```julia
using DataFrames, Statistics
df = DataFrame(a=repeat([1,2],inner=[5]), b='A':'J', c=rand(10))
gdf = groupby(df, :a)
combine(gdf, nrow => :n, :c => mean => :m, [:n, :m] => ((n,m)->n-m) => :y) # doesn't work: :n and :m are not visible here
```
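One workaround that stays inside a single `combine` call, sketched under the assumption that recomputing the summaries is acceptable: each transformation pair only sees the group's source columns, so `:y` can be built from `:c` directly instead of from `:n` and `:m`. This duplicates the work of the first two pairs, and the anonymous function is just an illustration.

```julia
using DataFrames, Statistics

df = DataFrame(a = repeat([1, 2], inner = 5), b = 'A':'J', c = rand(10))
gdf = groupby(df, :a)

# Each pair receives only the group's *source* columns, so :y recomputes
# the group size and mean from :c rather than reading :n and :m.
res = combine(gdf,
              nrow => :n,
              :c => mean => :m,
              :c => (c -> length(c) - mean(c)) => :y)
```

The cost is that `length` and `mean` run twice per group, which only matters for expensive summaries.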

Using Query.jl I can do this, but I think it's quite verbose. Any suggestions?

```julia
using DataFrames, Query, Statistics
df |> @groupby({_.a}) |> 
@map({a=key(_)[1], n=length(_), m=mean(_.c)}) |> 
@map({a=_.a, n=_.n, m=_.m, y=_.n-_.m}) |> DataFrame
```

Maybe this one generalizes a bit better:

```julia
using DataFrames, Query, Statistics

@from i in df begin
    @group i by i.a into g
    @let n=length(g)
    @let m=mean(g.c)
    @select {a=key(g), n=n, m=m, y=n-m}
    @collect DataFrame
end
```
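A rough DataFrames.jl analogue of the `@let` version, as a sketch assuming a DataFrames.jl release where `combine` accepts a function as its first argument (the `do`-block form): the function gets each group as a `SubDataFrame`, and returning a `NamedTuple` of scalars yields one row per group. Because `n` and `m` are plain local variables, `y` can use them directly. This sidesteps the pair syntax rather than making it see new columns.

```julia
using DataFrames, Statistics

df = DataFrame(a = repeat([1, 2], inner = 5), b = 'A':'J', c = rand(10))
gdf = groupby(df, :a)

# The do-block receives one SubDataFrame per group. Returning a NamedTuple
# of scalars produces a single output row per group, with the grouping
# column :a prepended automatically.
res = combine(gdf) do sdf
    n = nrow(sdf)
    m = mean(sdf.c)
    (n = n, m = m, y = n - m)  # y reuses the locals n and m
end
```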

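No, this is not possible currently (I thought it was…). You are going to have to use two separate statements: one that creates `:n` and `:m`, and a second that computes `:y` from them. Note that you can make the second one a `transform!`, which modifies the first result in place and so reduces the memory footprint a bit.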
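A minimal sketch of that two-step approach, reusing the column names from the question: the first `combine` creates `:n` and `:m`, and `transform!` then adds `:y` to that result in place. The `ByRow` wrapper is one choice among several; on an ungrouped DataFrame the function otherwise receives whole columns, so broadcasting with `.-` would also work.

```julia
using DataFrames, Statistics

df = DataFrame(a = repeat([1, 2], inner = 5), b = 'A':'J', c = rand(10))
gdf = groupby(df, :a)

# Step 1: per-group summaries. After this call :n and :m are real columns.
res = combine(gdf, nrow => :n, :c => mean => :m)

# Step 2: :n and :m are now visible. transform! appends :y to `res`
# in place instead of allocating a new DataFrame.
transform!(res, [:n, :m] => ByRow((n, m) -> n - m) => :y)
```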


I ended up using the approach taken here: