Dataframe transform operation on multiple columns

Hello,

I am trying to use transform on multiple columns of a dataframe. That is, I want to apply an operation that takes two variables as inputs, column1 and column2, and outputs the result.

A simple example would be:

df =  DataFrame(A = 1:4, B = 1:4, C = 5:8);
transform!(df, :, [:B,:C] => (x,y) -> sumcols(x,y))

where sumcols just sums the columns together. It should give me a column that looks like:

[6;8;10;12];

Of course, in this case I can just do .+, but such a syntax would be useful in many scenarios. Any ideas how I can get this?

Thanks!

transform!(df, :, [:B,:C] => ByRow(+) => :newcolname)

Thank you, this works for what I asked originally. However, is there something that works more generally for any function fun(:B,:C) that outputs a vector of the appropriate size?

For example, say

function f(x, y)
    N = size(x, 1)
    z = zeros(N)
    for i = 1:N
        z[i] = maximum(x[i:end]) + minimum(y[i:end])
    end
    return z
end

with the output:
[9;10;11;12];

Thanks!

Interesting, I did not know about ByRow. I was using a slightly more generic function that I also need in other contexts.

broadwrap(f) = function (args...) broadcast(f, args...) end
transform!(df, [:B,:C] => broadwrap(+) => :D)
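
For anyone comparing the two approaches, here is a minimal sketch that runs broadwrap and ByRow side by side on the data frame from the question. The target column names :D and :E are only illustrative.

```julia
using DataFrames

df = DataFrame(A = 1:4, B = 1:4, C = 5:8)

# Wrapper from the post above: broadcasts f elementwise over the column vectors.
broadwrap(f) = function (args...) broadcast(f, args...) end

# Broadcast + over the two whole columns and store the result in :D.
transform!(df, [:B, :C] => broadwrap(+) => :D)

# Apply + to each row's pair of values and store the result in :E.
transform!(df, [:B, :C] => ByRow(+) => :E)

# Both columns contain the elementwise sums of B and C.
println(df.D == df.E)
```

For a simple elementwise function like + the two give the same column, so the choice is mostly a matter of taste.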

Yes, I get a similar limitation with your method to the one I mentioned in my previous response.

Thank you though!

In that example you could do

transform!(df, :, [:B,:C] => (x,y) -> f(x,y))

and it will work. If the function works on vectors, you use this, if it is something you want to broadcast then you use ByRow.


In fact, just for completeness, I’ll note that you can even be lazy in your original example and just do

transform!(df, :, [:B,:C] => +)

because + also works for vectors.


I have to admit that I do not understand what limitation you are referring to.

This is perfect! Thanks!

Yes, sorry, I was not clear. Yours, at least as I applied it, was for row-by-row operations. What @tbeason has here:

transform!(df, :, [:B,:C] => (x,y) -> f(x,y))

works for what I intended.

Thanks!

  1. Yes. My broadwrap function was for row-by-row operations, because transform! already works out of the box for functions that operate directly on whole column vectors (rather than row by row).
  2. You do not need to write transform!(df, :, [:B,:C] => (x,y) -> f(x,y)); you can just write transform!(df, [:B,:C] => f), since f is already a function. I am also not sure why the colon would be needed.
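
As a quick check of point 2, the sketch below compares the long and short forms using the non-mutating transform. Here f is a compact comprehension-based rewrite of the function from earlier in the thread (so it returns integers rather than floats), and :D is an illustrative target name.

```julia
using DataFrames

df = DataFrame(A = 1:4, B = 1:4, C = 5:8)

# Compact variant of f from the thread (an illustrative rewrite, not the original).
f(x, y) = [maximum(x[i:end]) + minimum(y[i:end]) for i in eachindex(x)]

# Long form: explicit colon plus an anonymous function wrapping f.
long = transform(df, :, [:B, :C] => ((x, y) -> f(x, y)) => :D)

# Short form: pass f directly; transform keeps the existing columns anyway.
short = transform(df, [:B, :C] => f => :D)

println(long == short)
println(names(short))
```

Both calls keep A, B and C and append the new column, so the colon and the anonymous wrapper add nothing here.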