Dataframe transform operation on multiple columns

Hello,

I am trying to use transform on multiple columns of a dataframe. That is, I want to apply an operation that takes two columns as inputs, column1 and column2, and outputs the result as a new column.

A simple example would be:

using DataFrames
df = DataFrame(A = 1:4, B = 1:4, C = 5:8);
transform!(df, :, [:B,:C] => (x,y) -> sumcols(x,y))

where sumcols just sums the columns together. This should give me a column that looks like:

[6;8;10;12];

Of course, in this case I can just use .+, but a syntax like this would be useful in many scenarios. Any ideas on how I can get this?

Thanks!

transform!(df, :, [:B,:C] => ByRow(+) => :newcolname)
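
For reference, here is a minimal self-contained sketch of the ByRow approach, assuming DataFrames.jl is installed. The column names :D and :E and the scalar helper scaled_sum are made up for illustration and are not from the original question.

```julia
using DataFrames

df = DataFrame(A = 1:4, B = 1:4, C = 5:8)

# ByRow(+) calls + once per row, with that row's B and C values as scalars.
transform!(df, [:B, :C] => ByRow(+) => :D)

# Any scalar function of two arguments can be wrapped the same way
# (scaled_sum is a hypothetical helper).
scaled_sum(b, c) = 2b + c
transform!(df, [:B, :C] => ByRow(scaled_sum) => :E)

df.D  # 6, 8, 10, 12
df.E  # 7, 10, 13, 16
```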

Thank you, this works for what I asked originally. However, is there something that works more generally, for any function fun(:B,:C) that takes the whole columns and outputs a vector of the appropriate size?

For example, say

function f(x, y)
    N = size(x, 1)
    z = zeros(N)
    for i = 1:N
        z[i] = maximum(x[i:end]) + minimum(y[i:end])
    end
    return z
end

with the output:
[9;10;11;12];

Thanks!

Interesting, I did not know about ByRow. I was using a slightly more generic function that I also need in other contexts:

broadwrap(f) = function (args...) broadcast(f, args...) end
transform!(df, [:B,:C] => broadwrap(+) => :D)
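
As a quick check, the sketch below compares broadwrap with ByRow on the example dataframe. It assumes DataFrames.jl is installed; the column name :E is only illustrative, and broadwrap is written as an anonymous function, which should behave the same as the definition above.

```julia
using DataFrames

# Wrap f so that it is broadcast over whatever column vectors it receives.
broadwrap(f) = (args...) -> broadcast(f, args...)

df = DataFrame(A = 1:4, B = 1:4, C = 5:8)

# broadwrap(+) receives the full B and C vectors and broadcasts + over them.
transform!(df, [:B, :C] => broadwrap(+) => :D)

# ByRow(+) applies + row by row; for this example the result is the same.
transform!(df, [:B, :C] => ByRow(+) => :E)

df.D == df.E  # true
```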

Yes, I get a similar limitation with your method to the one I mentioned in my previous response.

Thank you though!

In that example you could do

transform!(df, :, [:B,:C] => (x,y) -> f(x,y))

and it will work. If the function operates on whole vectors, use this form. If it is something you want to apply row by row (broadcast), use ByRow.
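
To make the distinction concrete, here is a small sketch, assuming DataFrames.jl is installed. It reuses f from the question; the target column names :whole and :rowwise are invented for illustration.

```julia
using DataFrames

# Whole-column function from the question: each entry depends on the
# remaining tail of both columns, so it cannot be computed row by row.
function f(x, y)
    N = length(x)
    z = zeros(N)
    for i in 1:N
        z[i] = maximum(x[i:end]) + minimum(y[i:end])
    end
    return z
end

df = DataFrame(A = 1:4, B = 1:4, C = 5:8)

# Without ByRow, f is called once with the full B and C vectors.
transform!(df, [:B, :C] => f => :whole)

# With ByRow, max is called once per row with the scalar values of that row.
transform!(df, [:B, :C] => ByRow(max) => :rowwise)

df.whole    # 9.0, 10.0, 11.0, 12.0
df.rowwise  # 5, 6, 7, 8
```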

In fact, just for completeness, I’ll note that you can even be lazy in your original example and just do

transform!(df, :, [:B,:C] => +)

because + also works for vectors.

I have to admit that I do not understand which limitation you are referring to.

This is perfect! Thanks!

Yes, sorry, I was not clear. Yours, at least as I applied it, was for row-by-row operations. What @tbeason has here:

transform!(df, :, [:B,:C] => (x,y) -> f(x,y))

works for what I intended.

Thanks!

  1. Yes. My broadwrap function was for row-by-row operations. That is because transform! already works out of the box for operations that act directly on the whole vectors (instead of row by row).
  2. You do not need to do transform!(df, :, [:B,:C] => (x,y) -> f(x,y)). You can just do transform!(df, [:B,:C] => f), since f is already a function, and I am not sure why the colon would be needed.
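
A short sketch of point 2, assuming DataFrames.jl is installed. Here colsum is a hypothetical stand-in for the sumcols function from the question, and the target names :D and :E are illustrative. One thing to watch for when naming the output of an anonymous function: the function needs its own parentheses, because otherwise the trailing => :E is parsed as part of the function body.

```julia
using DataFrames

df = DataFrame(A = 1:4, B = 1:4, C = 5:8)

# Hypothetical stand-in for sumcols: takes two whole columns, returns a vector.
colsum(x, y) = x .+ y

# A named function can be passed directly, with no wrapping lambda and no `:`.
transform!(df, [:B, :C] => colsum => :D)

# An anonymous function followed by a target name must be parenthesized.
transform!(df, [:B, :C] => ((x, y) -> x .+ y) => :E)

df.D  # 6, 8, 10, 12
df.E  # 6, 8, 10, 12
```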