How to use @byrow inside @chain pipeline?

Gregstrq · June 4, 2024, 4:16pm

I am trying to build a data-processing pipeline with the @chain macro. However, I have a problem using the @byrow macro inside this pipeline: I get an error saying that @byrow is deprecated outside of DataFramesMeta macros.

So, what is the idiomatic way around this?

pdeffebach · June 4, 2024, 4:53pm

There are two solutions, depending on what kind of output you want.

Maybe you want @eachrow? @eachrow will return a data frame. It applies an operation to each row of a data frame, potentially altering the contents of the rows. There is also a limited ability to create new columns inside @eachrow.

julia> using DataFrames, DataFramesMeta

julia> df = DataFrame(a = [1, 2, 3, 4], b = ["a", "b", "c", "d"]);

julia> @eachrow df begin 
           :a = 2 * :a
       end
4×2 DataFrame
 Row │ a      b      
     │ Int64  String 
─────┼───────────────
   1 │     2  a
   2 │     4  b
   3 │     6  c
   4 │     8  d
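Since the question is about @chain, here is a minimal sketch of @eachrow used as one step of a pipeline. It assumes that @chain inserts the previous result as the first argument of the macro call, just as it does for @subset; the data and the `.> 1` filter are only for illustration.

```julia
using DataFrames, DataFramesMeta

df = DataFrame(a = [1, 2, 3, 4], b = ["a", "b", "c", "d"])

result = @chain df begin
    @subset :a .> 1      # drop the first row
    @eachrow begin       # @chain supplies the data frame as the first argument
        :a = 2 * :a      # double a in every remaining row
    end
end
```

Because @eachrow returns a new data frame, the pipeline can keep going after it with further @subset, @select, etc. steps.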

Alternatively, you can use @with @byrow ..., which returns a vector. Recall that

@with df :a + :b

is akin to

function fun(a, b)
    a + b
end
fun(df.a, df.b)

Adding @byrow simply broadcasts fun over the columns, so it is called once per row.

@with df @byrow :a + :b

becomes

function fun(a, b)
    a + b
end
fun.(df.a, df.b)
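The same distinction can be seen in plain Julia, without any macros. In this sketch the vectors `a` and `b` are hypothetical stand-ins for `df.a` and `df.b` (both numeric, so that `+` is defined), and `prod_fun` is a made-up function that only works on scalars:

```julia
# Hypothetical stand-ins for the columns df.a and df.b
a = [1, 2, 3, 4]
b = [10, 20, 30, 40]

# Works on whole vectors because `+` is defined for two vectors
fun(x, y) = x + y
whole = fun(a, b)      # one call on the full columns (like @with)
rowwise = fun.(a, b)   # one call per row (like @with @byrow)

# Hypothetical scalar-only function: `*` has no method for two vectors
prod_fun(x, y) = x * y
rowwise_prod = prod_fun.(a, b)   # broadcasting calls it row by row

# Calling it on the whole columns throws a MethodError
failed = try
    prod_fun(a, b)
    false
catch err
    err isa MethodError
end
```

For `+` both forms agree, but for a scalar-only function like `prod_fun` only the broadcast (row-wise) form works, which is exactly when @byrow is needed.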

Here it is in action

julia> using DataFrames, DataFramesMeta, Statistics

julia> df = DataFrame(a = [1, 2, 3, 4], b = ["a", "b", "c", "d"]);

julia> @with df begin
           mean(:a) * 2
       end
5.0

julia> @with df @byrow begin
           :a + 100
       end
4-element Vector{Int64}:
 101
 102
 103
 104
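Because @with takes the data frame as its first argument, it should also slot into a @chain pipeline, which is what the original question asked for. A minimal sketch, assuming the same `df` as above; the @subset step is only there to show a preceding pipeline stage:

```julia
using DataFrames, DataFramesMeta

df = DataFrame(a = [1, 2, 3, 4], b = ["a", "b", "c", "d"])

# @chain passes the result of @subset as the first argument of @with
v = @chain df begin
    @subset :a .> 2          # keeps the rows with a = 3 and a = 4
    @with @byrow :a + 100    # row-wise expression, returns a Vector
end
```

Note that the pipeline ends in a vector here, so @with @byrow is best used as the last step.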

Gregstrq · June 4, 2024, 5:36pm

This is very funny. I tried to make a minimal working example to showcase my problem, but this time I did something slightly differently and it worked. It's hard to tell what exactly I changed. There was actually a mistake in the transformation function I was trying to apply, but the error I was getting was about @byrow. Now that I have fixed it, I no longer get this error.

In case you are interested, the MWE for what I was doing looks like this:

using DataFrames, DataFramesMeta, LinearAlgebra  # LinearAlgebra provides dot

df0 = DataFrame(p1 = collect(1:5), p2 = collect(6:10), p3 = [1,0,1,0,1], j=[randn(100) for i=1:5], w = [randn(100) for i=1:5])

@chain df0 begin
	@subset :p3 .== 1
	@select :p1 :p2 @byrow :R = 1/dot(:j,:w)
end
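For reference, DataFramesMeta also provides row-wise versions of its macros with an `r` prefix (@rsubset, @rtransform, @rselect, ...), which avoid writing @byrow at all. A sketch of the same pipeline in that style, assuming a DataFramesMeta version that ships these macros; the `Random.seed!` call and the cross-check against plain broadcasting are additions for reproducibility, not part of the original MWE:

```julia
using DataFrames, DataFramesMeta, LinearAlgebra, Random

Random.seed!(1)  # only for reproducibility; not in the original MWE
df0 = DataFrame(p1 = collect(1:5), p2 = collect(6:10), p3 = [1, 0, 1, 0, 1],
                j = [randn(100) for i in 1:5], w = [randn(100) for i in 1:5])

res = @chain df0 begin
    @rsubset :p3 == 1                  # row-wise condition, no `.==` needed
    @rtransform :R = 1 / dot(:j, :w)   # :j and :w are single Vector{Float64} cells here
    @select :p1 :p2 :R                 # keep only the columns of interest
end

# Cross-check against plain broadcasting over the filtered rows
kept = df0[df0.p3 .== 1, :]
expected = 1 ./ dot.(kept.j, kept.w)
```

Splitting the row-wise computation (@rtransform) from the column selection (@select) keeps each step to a single mode, which can make errors like the one in the original question easier to trace.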
