I am trying to build a data-processing pipeline with the `@chain` macro. However, I have a problem using the `@byrow` macro inside this pipeline: I get an error saying that `@byrow` is deprecated outside of DataFramesMeta macros.
So, what is the idiomatic way around this?
Two solutions, depending on what kind of output you want.

- Maybe you want `@eachrow`? `@eachrow` returns a data frame. It applies an operation to each row of a data frame, potentially altering the contents of the rows. There is also a limited ability to create new columns inside `@eachrow`.
```julia
julia> using DataFrames, DataFramesMeta

julia> df = DataFrame(a = [1, 2, 3, 4], b = ["a", "b", "c", "d"]);

julia> @eachrow df begin
           :a = 2 * :a
       end
4×2 DataFrame
 Row │ a      b
     │ Int64  String
─────┼───────────────
   1 │     2  a
   2 │     4  b
   3 │     6  c
   4 │     8  d
```
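Since the question is about pipelines, here is a sketch of the same `@eachrow` call used as a step inside `@chain`. It relies on `@chain` inserting the piped value as the first argument of a macro call, so `@eachrow begin ... end` receives the data frame from the previous step; the name `result` is just for illustration.

```julia
using DataFrames, DataFramesMeta

df = DataFrame(a = [1, 2, 3, 4], b = ["a", "b", "c", "d"])

# @chain passes `df` as the first argument of @eachrow,
# which returns a new data frame with column :a doubled.
result = @chain df begin
    @eachrow begin
        :a = 2 * :a
    end
end

println(result.a)  # [2, 4, 6, 8]
```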
- You can use `@with @byrow ...`. This returns a vector. Recall that `@with df :a + :b` is akin to

```julia
function fun(a, b)
    a + b
end
fun(df.a, df.b)
```

Adding `@byrow` just broadcasts `fun` over the columns, so it is called once per row with scalar arguments. That is, `@with df @byrow :a + :b` becomes

```julia
function fun(a, b)
    a + b
end
fun.(df.a, df.b)
```
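The difference between the two calls can be checked in plain Julia, without DataFramesMeta. In this sketch a NamedTuple of vectors stands in for the data frame, and `cols` and `capped` are hypothetical names. A function with scalar-only logic (an `if` on `a > 2`) fails when handed whole columns, but works once broadcast with `.(...)`, which is what `@byrow` does.

```julia
# A NamedTuple of column vectors stands in for a data frame (hypothetical).
cols = (a = [1, 2, 3, 4], c = [10, 20, 30, 40])

# Scalar logic: only meaningful when `a` and `c` are single values.
capped(a, c) = a > 2 ? a + c : a

# Like `@with df ...`: the function receives whole columns.
# `[1, 2, 3, 4] > 2` has no method, so this throws a MethodError.
whole_column_error = try
    capped(cols.a, cols.c)
    nothing
catch err
    err
end

# Like `@with df @byrow ...`: the function is broadcast, one call per row.
rowwise = capped.(cols.a, cols.c)

println(typeof(whole_column_error))  # MethodError
println(rowwise)                     # [1, 2, 33, 44]
```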
Here it is in action:

```julia
julia> using Statistics

julia> df = DataFrame(a = [1, 2, 3, 4], b = ["a", "b", "c", "d"]);

julia> @with df begin
           mean(:a) * 2
       end
5.0

julia> @with df @byrow begin
           :a + 100
       end
4-element Vector{Int64}:
 101
 102
 103
 104
```
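Because `@with` returns a vector rather than a data frame, it fits naturally as the last step of a `@chain`. A sketch, again relying on `@chain` inserting the piped data frame as the first argument of the macro; the name `v` is illustrative:

```julia
using DataFrames, DataFramesMeta

df = DataFrame(a = [1, 2, 3, 4], b = ["a", "b", "c", "d"])

# Keep rows with a > 1, then evaluate a row-wise expression, returning a vector.
v = @chain df begin
    @subset :a .> 1
    @with @byrow :a + 100
end

println(v)  # [102, 103, 104]
```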
This is funny. I tried to make a minimal working example to showcase my problem, but this time I did something slightly differently and it worked. It is hard to tell exactly what I changed. There was actually a mistake in the transformation function I was trying to apply, but the error I was getting was about `@byrow`. Now that I have fixed it, I no longer get this error.
If you are interested, the MWE for what I was doing looks like this:

```julia
using DataFrames, DataFramesMeta
using LinearAlgebra  # provides `dot`

df0 = DataFrame(p1 = collect(1:5), p2 = collect(6:10), p3 = [1, 0, 1, 0, 1],
                j = [randn(100) for i = 1:5], w = [randn(100) for i = 1:5])

@chain df0 begin
    @subset :p3 .== 1
    @select :p1 :p2 @byrow :R = 1 / dot(:j, :w)
end
```
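As a check on the explanation of `@byrow` as broadcasting, the row-wise step in this MWE should give the same column as broadcasting `dot` explicitly with `dot.(...)`. A sketch using the same randomly generated data; `r_byrow` and `r_broadcast` are illustrative names:

```julia
using DataFrames, DataFramesMeta
using LinearAlgebra  # provides `dot`

df0 = DataFrame(p1 = collect(1:5), p2 = collect(6:10), p3 = [1, 0, 1, 0, 1],
                j = [randn(100) for i = 1:5], w = [randn(100) for i = 1:5])

# Row-wise: `dot` receives one pair of vectors per row.
r_byrow = @chain df0 begin
    @subset :p3 .== 1
    @select :p1 :p2 @byrow :R = 1 / dot(:j, :w)
end

# Column-wise: broadcast `dot` and `/` over the vector-of-vectors columns.
r_broadcast = @chain df0 begin
    @subset :p3 .== 1
    @select :p1 :p2 :R = 1 ./ dot.(:j, :w)
end

println(r_byrow.R ≈ r_broadcast.R)  # true
```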