I posted this on Slack, but I think I’ll move over to using Discourse more (I’ll post a cross-link).
When doing split-apply-combine with a DataFrame, is there any way for the apply transformations to reference previously calculated variables?
using DataFrames, Statistics
df = DataFrame(a=repeat([1,2],inner=[5]), b='A':'J', c=rand(10))
gdf = groupby(df, :a)
# doesn't work: :n and :m are created in this same call, so they are not visible to the last transformation
combine(gdf, nrow => :n, :c => mean => :m, [:n, :m] => ((n,m)->n-m) => :y)
Using Query.jl I can do this, but I think it’s quite verbose. Any suggestions?
df |> @groupby({_.a}) |>
@map({a=key(_)[1], n=length(_), m=mean(_.c)}) |>
@map({a=_.a, n=_.n, m=_.m, y=_.n-_.m})
Maybe this one generalizes a bit better:
@from i in df begin
@group i by i.a into g
@let n=length(g)
@let m=mean(g.c)
@select {a=key(g), n=n, m=m, y=n-m}
@collect DataFrame
end
No, this is not possible currently (I thought it was…). You are going to have to use two transform statements. Note that you can make the second one a transform!, which will reduce the memory footprint a bit.
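A minimal sketch of that two-step approach, assuming a recent DataFrames.jl (1.x) and reusing the df from the question. The intermediate name res is only illustrative:

```julia
using DataFrames, Statistics

df = DataFrame(a = repeat([1, 2], inner = 5), b = 'A':'J', c = rand(10))
gdf = groupby(df, :a)

# Step 1: compute the per-group aggregates; combine returns a new DataFrame.
res = combine(gdf, nrow => :n, :c => mean => :m)

# Step 2: :n and :m now exist as columns, so they can be referenced.
# transform! adds :y to res in place instead of allocating another DataFrame.
transform!(res, [:n, :m] => ((n, m) -> n .- m) => :y)
```

The second call works on the ungrouped result, so the function receives whole columns, which is why the subtraction is broadcast with `.-`.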
I ended up using the approach taken here:
Whoa, cool. So what does the __ refer to on the @mapmany line? It looks like it’s the original DataFrame somehow, but how does it know that that’s what you’re referring to?