Just a slight variation of what is already present in this very interesting discussion
function f2(t)
    x = t.a + t.b
    y = t.a - t.b
    (; x, y, t...)
end
select(D, AsTable(:)=>ByRow(f2)=>AsTable)
Use this variant if you need speed:
DataFrame(map(f2, Tables.namedtupleiterator(D)))
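For anyone who wants to try both variants side by side, here is a minimal sketch. The sample data frame D and its values are made up for illustration, and Tables is imported explicitly because DataFrames does not export it (this assumes Tables.jl is available in your environment).

```julia
using DataFrames
import Tables  # assumption: Tables.jl is installed alongside DataFrames.jl

# Illustrative data, not from the thread
D = DataFrame(a = 1:2, b = 3:4)

function f2(t)
    x = t.a + t.b
    y = t.a - t.b
    (; x, y, t...)   # new fields first, then every original field
end

# Variant 1: pass each row as a NamedTuple, expand the result into columns
r1 = select(D, AsTable(:) => ByRow(f2) => AsTable)

# Variant 2: map over an iterator of NamedTuples, then rebuild the data frame
r2 = DataFrame(map(f2, Tables.namedtupleiterator(D)))
```

Both results should hold the columns x, y, a, b in that order, since f2 puts the new fields before the splatted originals.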
This works on master, but we have not made a release yet with that update. I wanted to add one more feature, support for keyword arguments, but that is taking longer than I thought.
This is what I love about Julia. Always a new feature I can’t wait to see
What about an @rtransform equivalent of => AsTable?
So the columns added or altered are given by the items of the named tuple returned by the function.
function myfunc(; a, b, kwargs... )
    x = a+b
    y = a-b
    (;x, y)
end
@chain begin
    DataFrame( a=1:2, b=3:4 )
    @rtransform :AsTable = myfunc(; AsTable(:)... )
end
That exists! Just do
@rtransform $AsTable = myfunc(...)
It’s insufficiently documented, which will change in the future, but there is an example in the docs here.
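For comparison, the same "named tuple in, new columns out" behavior can be written in plain DataFrames.jl. This is only a sketch that deliberately avoids the macro syntax, so it makes no claim about how DataFramesMeta handles $AsTable internally; the data values are illustrative.

```julia
using DataFrames

function myfunc(; a, b, kwargs...)
    x = a + b
    y = a - b
    (; x, y)
end

df = DataFrame(a = 1:2, b = 3:4)

# Each row arrives as a NamedTuple and is splatted into keyword arguments;
# the fields of the returned NamedTuple become new columns.
out = transform(df, AsTable(:) => ByRow(nt -> myfunc(; nt...)) => AsTable)
```

Because myfunc accepts kwargs..., it keeps working when the data frame has columns beyond a and b.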
I notice you can’t use $AsTable = within an @rtransform @astable block.
This would be a nice feature.
Can you clarify what that would look like? Do you mean merging selected names and the programmatically generated names from $AsTable? Fwiw I think this would be a very hard feature to implement, so I’m interested to hear your use case and proposal.
In the example below, column B is added directly, then myfunc with $AsTable adds columns C and D, then column E is added directly.
In my actual code, 5 or 10 columns are added each time. Because $AsTable doesn’t work within an @rtransform @astable block, I have to do it as 3 separate @rtransform blocks.
I’d prefer all the row-level operations on the table to be within a single @rtransform block.
function myfunc(; A, B )
    C = A + B
    D = A - B
    (; C, D)
end
@chain begin
    DataFrame( A=1:2 )
    @rtransform @astable begin
        :B = :A + 1
        $AsTable = myfunc(; AsTable... )
        :E = :C + :D
    end
end
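For reference, the three-step workaround described above looks roughly like this when spelled out in plain DataFrames.jl. It is a sketch of the current situation, not of the proposed feature; the intermediate names step1, step2, and step3 are made up for illustration.

```julia
using DataFrames

myfunc(; A, B) = (; C = A + B, D = A - B)

df = DataFrame(A = 1:2)

# Columns created inside one transform call are not visible to the other
# operations of that same call, so each dependent step is a separate call.
step1 = transform(df, :A => ByRow(a -> a + 1) => :B)
step2 = transform(step1, AsTable([:A, :B]) => ByRow(nt -> myfunc(; nt...)) => AsTable)
step3 = transform(step2, [:C, :D] => ByRow(+) => :E)
```

The dependency chain (B needs A, C and D need B, E needs C and D) is what forces the split into separate steps.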
A more extreme idea (probably too difficult): a block similar to @chain that allows row-level operations (as if you were within @rtransform @astable), except where other macros (e.g. @rsubset, @orderby) are being used.
@ChainWithrTransform DataFrame( A = 1:10 ) begin
    :B = mod(:A,3)
    :E = :B * 2
    @rsubset :B == 1
    :F = :B * 2
    @orderby :A
    $AsTable = myfunc(; AsTable... )
    :G = :D * 2
end
Maybe. Then this new @chain-like macro would have to live in DataFramesMeta.jl, rather than Chain.jl.
Another consideration is what :B = mod(:A, 3) would do. It seems costly to copy a new data frame every time; maybe it could do @rtransform! behind the scenes?
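The copy-versus-mutate distinction can be seen with DataFrames.jl's own transform and transform!. This sketch assumes the macro versions behave analogously and does not show what a future macro would actually do.

```julia
using DataFrames

df = DataFrame(A = 1:10)

# transform returns a new data frame and leaves df untouched
copied = transform(df, :A => ByRow(a -> mod(a, 3)) => :B)
names_after_copy = names(df)     # still only "A"

# transform! adds the column to df itself, with no new data frame
transform!(df, :A => ByRow(a -> mod(a, 3)) => :B)
names_after_mutation = names(df) # now "A" and "B"
```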
Can you please file an issue?
Has this been released now?
Sorry about that! I was trying to get another feature done before a release, but I should just push it out now. Let me add a NEWS.md entry and then I will make a release.
Just a thought.
A function within a @chain block applies to the previous line’s output. For example, the code below gives 15.
@chain 1:5 begin
    sum
end
If the previous line’s result is a DataFrame and the function’s input and output are both named tuples, could the function be automatically applied at row level to add columns to the DataFrame?
With example data:
function my_func(; a, b )
    c = a + b
    d = a - b
    (; c, d )
end
DF = DataFrame( a = 1:3, b=4:6 )
Could this
@chain DF begin
    my_func
end
be equivalent to this
@chain DF begin
    @rtransform $AsTable = my_func(; AsTable... )
end
to produce a DataFrame with columns a, b, c, d?
No, this is impossible. In the expression
@chain DF begin
    my_func
end
@chain, the macro, does not know anything about my_func and thus can’t produce that complicated expression based only on my_func.
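To make the limitation concrete: a macro runs on syntax, before anything is evaluated, so all it receives here is a name. The toy macro below is hypothetical and exists only to show what a macro sees; it is not part of Chain.jl.

```julia
# Hypothetical macro for illustration: it reports the type of the syntax
# it is handed at expansion time.
macro syntax_type(ex)
    return string(typeof(ex))
end

my_func(; a, b) = (; c = a + b, d = a - b)

seen_name = @syntax_type my_func          # "Symbol": just the name
seen_call = @syntax_type my_func(a = 1)   # "Expr": an unevaluated call
```

Nothing about my_func's keyword arguments or return type is available at that point, which is why the rewrite cannot be chosen based on the function alone.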
But I hear your concern that your method needs a lot of typing. I will think about ways to make that easier.
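In the meantime, one way to cut down the typing is an ordinary function that does the wrapping at run time. The helper name addcols is made up for this sketch and is not part of DataFrames.jl, DataFramesMeta.jl, or Chain.jl.

```julia
using DataFrames

# Hypothetical helper: turns a keyword-argument function into a
# data frame -> data frame step that adds the returned fields as columns.
addcols(f) = df -> transform(df, AsTable(:) => ByRow(nt -> f(; nt...)) => AsTable)

function my_func(; a, b)
    c = a + b
    d = a - b
    (; c, d)
end

DF = DataFrame(a = 1:3, b = 4:6)
result = DF |> addcols(my_func)
```

Note that every column is passed as a keyword argument, so a function used this way on a wider table would need a trailing kwargs... to absorb the columns it does not use.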
Thanks Peter, much appreciated.
Unrelated question:
In Pandas you can select rows using a specified index / key column.
Does Julia have an equivalent? I haven’t been able to find it.
It would be very nice within a @transform block to specify a dataframe cell as :RowKey.Column
In the example below, :C3 would be 12, 13, 14, i.e. the values of :C2 plus the :C1 value where RowKey == :A.
df = DataFrame( KeyCol = [:A,:B,:C], C1 = 1:3, C2 = 11:13 )
setKeyColumn!(df, :KeyCol )
@chain df begin
    @transform :C3 = :C2 .+ :A.C1
end
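The setKeyColumn! call and the :A.C1 syntax above are the proposal, not existing API. One way to get the same numbers with what DataFrames.jl already offers is sketched below, assuming the key column holds unique values; the variable names are illustrative.

```julia
using DataFrames

df = DataFrame(KeyCol = [:A, :B, :C], C1 = 1:3, C2 = 11:13)

# Look up the single cell where KeyCol == :A.
# `only` throws unless exactly one row matches.
a_c1 = only(df[df.KeyCol .== :A, :C1])

df.C3 = df.C2 .+ a_c1

# Alternative lookup through a grouped data frame keyed on KeyCol
gdf = groupby(df, :KeyCol)
a_c1_grouped = only(gdf[(KeyCol = :A,)].C1)
```

The grouped form may be the more convenient of the two when many keyed lookups are needed, since the grouping is built once and then indexed by key.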