Apply function By Row without re-stating column names

Just a slight variation on what is already in this very interesting discussion:


function f2(t)
    x = t.a + t.b
    y = t.a - t.b
    (; x, y, t...)
end


select(D, AsTable(:)=>ByRow(f2)=>AsTable)

Or this variant, if you need speed:

DataFrame(map(f2, Tables.namedtupleiterator(D)))
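For anyone who wants to try both variants, here is a self-contained sketch. The post does not show D, so D = DataFrame(a = 1:3, b = 4:6) is an assumption (any table with columns a and b works). Both calls should produce a data frame with columns x, y, a, b.

```julia
using DataFrames, Tables

# f2 receives one row as a NamedTuple and returns a new NamedTuple:
# the computed fields x and y, followed by the row's original fields.
function f2(t)
    x = t.a + t.b
    y = t.a - t.b
    (; x, y, t...)
end

# Hypothetical example data; only columns a and b are required.
D = DataFrame(a = 1:3, b = 4:6)

# Variant 1: AsTable(:) passes every column of the row to f2, and the
# AsTable destination spreads the returned NamedTuple into columns.
r1 = select(D, AsTable(:) => ByRow(f2) => AsTable)

# Variant 2: iterate rows as NamedTuples directly, collect the results
# into a Vector of NamedTuples, and build a DataFrame from that.
r2 = DataFrame(map(f2, Tables.namedtupleiterator(D)))

println(r1)
```

The second variant skips the transformation machinery of select, which is presumably where its speed advantage comes from.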

This works on master, but we have not made a release yet with that update. I wanted to add one more feature, support for keyword arguments, but that is taking longer than I thought.


This is what I love about Julia. Always a new feature I can’t wait to see :slight_smile:


What about an @rtransform equivalent of => AsTable, so that the columns added or altered are given by the fields of the named tuple returned by the function?

function myfunc(; a, b, kwargs...)
    x = a + b
    y = a - b
    (; x, y)
end

@chain begin
    DataFrame(a = 1:2, b = 3:4)

    @rtransform :AsTable = myfunc(; AsTable(:)...)
end
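For comparison, the same effect can be sketched in plain DataFrames.jl (which DataFramesMeta builds on), assuming DataFrames 1.x. The extra column z is made-up data that shows why kwargs... helps: splatting the whole row as keywords passes every column, and kwargs... absorbs the ones myfunc does not use.

```julia
using DataFrames

# kwargs... absorbs any columns myfunc does not need (here, :z).
function myfunc(; a, b, kwargs...)
    x = a + b
    y = a - b
    (; x, y)
end

df = DataFrame(a = 1:2, b = 3:4, z = ["p", "q"])  # :z is hypothetical extra data

# Each row arrives as a NamedTuple; `nt...` after the semicolon splats it
# into keyword arguments. The AsTable destination turns the fields of the
# returned NamedTuple (x, y) into new columns, keeping the existing ones.
out = transform(df, AsTable(:) => ByRow(nt -> myfunc(; nt...)) => AsTable)
```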

That exists! Just do

@rtransform $AsTable = myfunc(...)

It's not well documented yet, which will change in the future, but there is an example in the docs here.

Very cool, thank you :slight_smile:

I notice you can’t use $AsTable = within an @rtransform @astable block.

This would be a nice feature.

Can you clarify what that would look like? Do you mean merging selected names and the programmatically generated names from $AsTable? FWIW, I think this would be a very hard feature to implement, so I'm interested to hear your use case and proposal.

In the example below, column B is added directly, then myfunc with $AsTable adds columns C and D, and then column E is added directly.

In my actual code, 5 or 10 columns are added each time. Because $AsTable doesn't work within an @rtransform @astable block, I have to split this into three separate @rtransform blocks.

I’d prefer all the row level operations on the table to be within a single @rtransform block.

function myfunc(; A, B )
    C = A + B
    D = A - B
    (; C, D)
end

@chain begin
    DataFrame( A=1:2 )

    @rtransform @astable begin
        :B = :A + 1      
        $AsTable = myfunc(; AsTable... )
        :E = :C + :D
    end
end
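As a workaround (not the requested macro feature), the three steps can run in a single pass by writing one row function that builds the complete NamedTuple. This is a sketch in plain DataFrames.jl; the helper name allsteps is hypothetical.

```julia
using DataFrames

function myfunc(; A, B)
    C = A + B
    D = A - B
    (; C, D)
end

# Hypothetical helper doing all row-level steps in order:
# B from A, then C and D via myfunc, then E from C and D.
function allsteps(r)
    B = r.A + 1
    cd = myfunc(; A = r.A, B = B)
    E = cd.C + cd.D
    # merge concatenates NamedTuples, giving the field order B, C, D, E
    merge((B = B,), cd, (E = E,))
end

df = transform(DataFrame(A = 1:2), AsTable(:) => ByRow(allsteps) => AsTable)
```

The cost is that the intermediate columns are named inside the function rather than with the :col = ... syntax.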

A more extreme idea (probably too difficult): a block similar to @chain that allows row-level operations (as if you were inside @rtransform @astable), except where other macros (e.g. @rsubset, @orderby) are used.

@ChainWithrTransform DataFrame(A = 1:10) begin
    :B = mod(:A, 3)
    :E = :B * 2

    @rsubset :B == 1

    :F = :B * 2

    @orderby :A

    $AsTable = myfunc(; AsTable...)

    :G = :D * 2
end

Maybe. Then this new @chain-like macro would have to live in DataFramesMeta.jl rather than Chain.jl.

Another consideration is what :B = mod(:A, 3) would do. It seems costly to copy the data frame at every step; maybe it could use @rtransform! behind the scenes?
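On the copying concern: DataFrames.jl already has in-place operations (transform!, subset!, sort!), so one plausible lowering for such a macro, which does not exist today, would be a sequence of mutating calls. A sketch of the first steps of the proposed block written that way:

```julia
using DataFrames

df = DataFrame(A = 1:10)

# Each call modifies df in place and returns it, instead of returning a
# fresh copy of the whole data frame.
transform!(df, :A => ByRow(a -> mod(a, 3)) => :B)   # :B = mod(:A, 3)
transform!(df, :B => ByRow(b -> b * 2) => :E)       # :E = :B * 2
subset!(df, :B => ByRow(==(1)))                     # @rsubset :B == 1
transform!(df, :B => ByRow(b -> b * 2) => :F)       # :F = :B * 2
sort!(df, :A)                                       # @orderby :A
```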

Can you please file an issue?

https://github.com/JuliaData/DataFramesMeta.jl/issues/325


Has this been released yet?

Sorry about that! I was trying to get another feature done before a release, but I should just push it out now. Let me add a NEWS.md entry and then I will make a release.

Just a thought.

A bare function name within a @chain block is applied to the previous line's output; e.g. the block below gives 15.

@chain 1:5 begin
    sum
end

If the previous line's result is a DataFrame and the function's input and output are both named tuples, could the function be applied automatically at row level to add columns to the DataFrame? Here is an example with data:

function my_func(; a, b)
    c = a + b
    d = a - b
    (; c, d)
end

DF = DataFrame(a = 1:3, b = 4:6)

Could this

@chain DF begin
       my_func
end

be equivalent to this

@chain DF begin
       @rtransform $AsTable = my_func(; AsTable... )
end

to produce a DataFrame with columns a, b, c, d.

No, this is impossible. In the expression

@chain DF begin
     my_func
end

@chain, being a macro, only sees the symbol my_func, not the function it refers to, so it cannot generate that more complicated expression from my_func alone.
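The underlying reason is that macros operate on syntax, before any code runs. A tiny macro (the name @what_macro_sees is made up for this illustration) shows that all it receives for my_func is the symbol itself, whether or not a function by that name even exists, so it has no way to check that my_func takes keyword arguments and returns a named tuple.

```julia
# A macro receives the unevaluated expression it was called with.
# Wrapping it in a QuoteNode hands that expression back unchanged.
macro what_macro_sees(ex)
    return QuoteNode(ex)
end

seen = @what_macro_sees my_func   # my_func does not need to be defined
println(typeof(seen))             # prints Symbol
```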

But I hear your concern that your method needs a lot of typing. I will think about ways to make that easier.

Thanks Peter, much appreciated.

Unrelated question:

In Pandas you can select rows using a specified index/key column. Does Julia have an equivalent? I haven't been able to find it.

It would be very nice to be able to refer to a data frame cell inside a @transform block as :RowKey.Column. In the example below, :C3 would be 12, 13, 14, i.e. the values of :C2 plus the value of :C1 in the row where KeyCol == :A.

df = DataFrame(KeyCol = [:A, :B, :C], C1 = 1:3, C2 = 11:13)

setKeyColumn!(df, :KeyCol)

@chain df begin
    @transform :C3 = :C2 .+ :A.C1
end
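DataFrames.jl does not keep a pandas-style row index, so there is no direct equivalent of :A.C1; setKeyColumn! above is the proposed, not an existing, function. A common workaround is to look the row up explicitly. A sketch (the names iA and keyrow are illustrative):

```julia
using DataFrames

df = DataFrame(KeyCol = [:A, :B, :C], C1 = 1:3, C2 = 11:13)

# One-off lookup: linear scan for the first row whose key is :A.
# findfirst returns `nothing` if the key is absent, so check for that in real code.
iA = findfirst(==(:A), df.KeyCol)
df.C3 = df.C2 .+ df[iA, :C1]

# For repeated lookups, build a key => row-number Dict once.
keyrow = Dict(k => i for (i, k) in enumerate(df.KeyCol))
df.C4 = df.C2 .+ df[keyrow[:A], :C1]
```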