Hello!
What is the right way to transform a data frame in place with a function that returns multiple columns?
I want to write it like
transform!(df, :A=>(x->f(x))=>[:B,:C])
but it doesn't work.
Edit: If the function returns a vector of vectors, it is interpreted as a vector of rows, not columns. One solution is to return a matrix instead. Note the space instead of a comma in x->[x.+1 x.^2]:
julia> df
3×1 DataFrame
 Row │ x
     │ Int64
─────┼───────
   1 │     1
   2 │     2
   3 │     3
julia> transform!(df, :x=>(x->[x.+1 x.^2]) => [:y, :z])
3×3 DataFrame
 Row │ x      y      z
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      2      1
   2 │     2      3      4
   3 │     3      4      9
The one proposed by @skleinbo is probably the most typical way, but there are others (note that hcat returns a new data frame rather than modifying df):
transform!(df, :x=>ByRow(r->(y=r+1, z=r^2))=>AsTable)
hcat(df,DataFrame(y=df.x.+1, z=df.x.^2))
insertcols!(df,2, :y=>df.x.+1, :z=>df.x.^2)
The function can return a "vector of vectors", but they are interpreted as rows:
julia> df = DataFrame(A=1:3)
3×1 DataFrame
 Row │ A
     │ Int64
─────┼───────
   1 │     1
   2 │     2
   3 │     3
julia> transform!(df, :A => (x -> [[v+1, v+2] for v in x]) => [:B, :C])
3×3 DataFrame
 Row │ A      B      C
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      2      3
   2 │     2      3      4
   3 │     3      4      5
The general format of expected output with multiple output columns is:
I take this opportunity to ask for some more details on the choices made regarding the possible outputs of "fun".
# this
transform!(df, :x => (x -> [(v+1, v+2) for v in x]) => [:B, :C])
# is equivalent to this
transform!(df, :x => (x -> [(B=v+1, C=v+2) for v in x]) => AsTable)
My question is why this (an array of named tuples) works
transform!(df, :x=>ByRow(r->(y=r+1, z=r^2))=>AsTable)
and this (a named tuple of arrays) does not:
transform!(df, :x=>r->(y = r.+1, z = r.^2)=>AsTable)
It works; you just forgot the parentheses around the anonymous function:
julia> df = DataFrame(x=1:3)
3×1 DataFrame
 Row │ x
     │ Int64
─────┼───────
   1 │     1
   2 │     2
   3 │     3
julia> transform!(df, :x => (r->(y = r.+1, z = r.^2)) => AsTable)
3×3 DataFrame
 Row │ x      y      z
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      2      1
   2 │     2      3      4
   3 │     3      4      9
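To see why the parentheses matter, here is a minimal sketch of how Julia parses the two spellings. It uses a placeholder symbol `:dest` in place of `AsTable` (an assumption made only so that the snippet runs without DataFrames.jl); the variable names are hypothetical.

```julia
# `->` binds more loosely than `=>`, so without parentheses the body of the
# anonymous function swallows everything to its right, including `=> :dest`.
# `:dest` is a stand-in for `AsTable`, used here only for illustration.
unparenthesized = :x => r -> (y = r .+ 1, z = r .^ 2) => :dest
parenthesized   = :x => (r -> (y = r .+ 1, z = r .^ 2)) => :dest

# Without parentheses: source => function, and the function returns a Pair.
@show unparenthesized.second isa Function
inner = unparenthesized.second([1, 2, 3])
@show inner isa Pair

# With parentheses: source => (function => destination), as intended.
@show parenthesized.second isa Pair
@show parenthesized.second.second
```

In the unparenthesized form the destination ends up inside the function's return value, so `transform!` never sees it as a destination specification.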
I tried to compare the various ways, but they all seem equivalent.
I couldn't measure how insertcols! performs, because @btime fails on the second pass: the columns with the same names already exist.
Was an option for insertcols! to overwrite an existing column ever considered?
If so, why was it discarded?
Use initialization code for @btime.
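As a rough sketch of what that could look like, assuming BenchmarkTools.jl is the package providing `@btime`: the `setup` expression rebuilds a fresh data frame before each sample, and `evals=1` keeps the mutating call from running twice on the same data frame. The variable name `d` is arbitrary.

```julia
using BenchmarkTools, DataFrames

# `setup` is run before every sample and is not included in the timing.
# `evals=1` makes each sample a single call, so :y and :z never exist yet
# when insertcols! runs.
@btime insertcols!(d, 2, :y => d.x .+ 1, :z => d.x .^ 2) setup=(d = DataFrame(x = 1:3)) evals=1

# The same call outside the benchmark, for reference.
df = DataFrame(x = 1:3)
insertcols!(df, 2, :y => df.x .+ 1, :z => df.x .^ 2)
```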
This is the point of insertcols!: it should error in this case. If you want to overwrite an existing column, use setindex! or setproperty! (i.e. just write df.col = vector or df[!, col] = vector).
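Or pass makeunique=true to insertcols!. Note that this does not overwrite the existing column: the new column is added under a deduplicated name.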
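The options above can be compared side by side in a small sketch. The data frame and the `threw` flag are made up for illustration.

```julia
using DataFrames

df = DataFrame(x = 1:3, y = [0, 0, 0])

# By default insertcols! refuses to add a column whose name already exists.
threw = try
    insertcols!(df, :y => df.x .+ 1)
    false
catch
    true
end

# Overwriting an existing column is done by assignment instead.
df.y = df.x .+ 1          # setproperty!
df[!, :y] = df.x .^ 2     # setindex!, replaces the whole column

# makeunique=true keeps :y untouched and stores the new data under a new name.
insertcols!(df, :y => df.x .+ 10, makeunique = true)
names(df)
```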