I was struggling to find a clean way to add a named column to an existing DataFrame, computed from some of its columns. While drafting this question I stumbled upon the official docs' description of combine/transform, and I am quite happy with using transform as posted below. Still, I am wondering whether there are other ways to code this operation that are clean and possibly more performant (CPU usage/memory).
Also, does anybody have a good list of nice/terse/clean Julia formulas for creating and transforming DataFrames? What I am looking for is really a bunch of code that I can try out in the Julia REPL to get a better understanding of the syntax.
julia> df = DataFrame(X = [1, 2, 3, 4], Y = [0, 1, 2, 4])
4×2 DataFrame
 Row │ X      Y
     │ Int64  Int64
─────┼──────────────
   1 │     1      0
   2 │     2      1
   3 │     3      2
   4 │     4      4
julia> add = (a, b) -> a + b
#7 (generic function with 1 method)
julia> transform!(df, :, [:X, :Y] => add => :Z)
4×3 DataFrame
 Row │ X      Y      Z
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      0      1
   2 │     2      1      3
   3 │     3      2      5
   4 │     4      4      8
You can do df.Z = df.X + df.Y. For more complicated operations, I like Chain.jl plus either DataFrameMacros.jl or DataFramesMeta.jl (both provide similar tools, but the former operates by row by default, while the latter operates on whole columns by default). For example,
julia> using DataFrameMacros, Chain
julia> @chain df begin
@transform!(:Z = add(:X, :Y))
end
4×3 DataFrame
 Row │ X      Y      Z
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      0      1
   2 │     2      1      3
   3 │     3      2      5
   4 │     4      4      8
If you don't need to chain multiple operations together, you can omit @chain and pass df to the macro directly.
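A minimal sketch of the same transformation without @chain, assuming DataFrameMacros.jl is installed and reusing the df and add definitions from the question (both are rebuilt here so the snippet can be pasted into a fresh REPL):

```julia
using DataFrames, DataFrameMacros

df = DataFrame(X = [1, 2, 3, 4], Y = [0, 1, 2, 4])
add = (a, b) -> a + b

# The data frame is passed as the first argument instead of being piped in by @chain.
# DataFrameMacros works by row by default, so add sees one X and one Y value at a time.
@transform!(df, :Z = add(:X, :Y))
```

With a single step there is nothing for @chain to thread through, so the direct call is arguably the shorter of the two.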
Note that DataFramesMeta.jl now exports the macros @rtransform, @rselect, @rsubset, and @rorderby, so it has feature parity for row-wise operations (with the addition of the letter r).
using DataFramesMeta # also exports Chain.jl
@chain df begin
@rtransform! :Z = :X + :Y
end
Personally, I prefer the non-mutating transform over the mutating transform!, and the base DataFrames library over macros. Both of these choices simplify my code.
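As a rough illustration of that preference, here is a sketch using only DataFrames.jl; the name df2 is just a placeholder for this example:

```julia
using DataFrames

df = DataFrame(X = [1, 2, 3, 4], Y = [0, 1, 2, 4])

# transform returns a new data frame and leaves df as it was.
# (+) is handed the whole :X and :Y columns, so it adds the two vectors.
df2 = transform(df, [:X, :Y] => (+) => :Z)

# df still has only its original columns; df2 carries the extra one.
println(names(df))
println(names(df2))
```

Because df is never modified, re-running the same line in the REPL always starts from the same input, which is one reason the non-mutating form can be easier to reason about.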
It might be nice to mention the “declarative” form more prominently in the docs. As far as I can tell, it's mostly described here as a way to create a DataFrame from scratch. The docstring for transform! is ~1600 words long and applies to four distinct methods, includes many conditional clauses, and has no examples, which may be a bit intimidating for a newcomer. It may be worth including an example in the transform! docs that uses both df.Z = df.X + df.Y and transform!(df, :, [:X, :Y] => + => :Z) to intercept users before they get buried in contemplation of which of the seven allowable forms of args... to apply.
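For anyone who wants something to paste into the REPL, here is a small, non-exhaustive sketch of a few of those forms side by side. It assumes only DataFrames.jl; the column names Z2, P, and Xsq are made up for illustration, and the operator is written as (+) with parentheses, which is the spelling I am sure parses:

```julia
using DataFrames

df = DataFrame(X = [1, 2, 3, 4], Y = [0, 1, 2, 4])

# 1. Plain column assignment with ordinary vector arithmetic.
df.Z = df.X + df.Y

# 2. source => function => target; the function receives whole columns.
transform!(df, [:X, :Y] => (+) => :Z2)

# 3. ByRow wraps a scalar function so it is applied once per row.
transform!(df, [:X, :Y] => ByRow((x, y) -> x * y) => :P)

# 4. A single source column with an anonymous function and an explicit target name.
transform!(df, :X => (x -> x .^ 2) => :Xsq)
```

The first two produce identical columns; which one to use seems mostly a matter of taste until the operation needs grouping or several outputs at once.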