Dataframe transform operation on multiple columns

danicaratelli · August 7, 2020, 11:14pm

Hello,

I am trying to use transform on multiple columns of a DataFrame; that is, I want to apply an operation that takes two columns, column1 and column2, as inputs and outputs the result.

A simple example would be:

using DataFrames
df = DataFrame(A = 1:4, B = 1:4, C = 5:8);
transform!(df, :, [:B,:C] => (x,y) -> sumcols(x,y))

where sumcols just sums the columns together. It should give me a column that looks like:

[6;8;10;12];

Of course, in this case I can just do .+, but such a syntax would be useful in many scenarios. Any ideas how I can get this?

Thanks!

tbeason · August 7, 2020, 11:30pm

transform!(df, :, [:B,:C] => ByRow(+) => :newcolname)

danicaratelli · August 7, 2020, 11:38pm

Thank you, this works for what I asked originally. However, is there something that works more generally for any function fun(:B,:C) that outputs a vector of the appropriate size?

For example, say

function f(x,y)
    N = size(x,1)
    z = zeros(N)
    for i in 1:N
        z[i] = maximum(x[i:end]) + minimum(y[i:end])
    end
    return z
end

with the output:
[9;10;11;12];

Thanks!

Henrique_Becker · August 7, 2020, 11:39pm

Interesting, I did not know about ByRow. I was using a slightly more generic function that I also need in other contexts.

broadwrap(f) = function (args...) broadcast(f, args...) end
transform!(df, [:B,:C] => broadwrap(+) => :D)
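
For the row-by-row case, the two suggestions so far should give the same result. A minimal sketch, assuming a recent DataFrames.jl; the target column names :D and :E are only illustrative:

```julia
using DataFrames

df = DataFrame(A = 1:4, B = 1:4, C = 5:8)

# Wrapper from the post above: turns f into a function that broadcasts f over its arguments.
broadwrap(f) = function (args...) broadcast(f, args...) end

# Both calls apply + element by element to columns B and C.
transform!(df, [:B, :C] => broadwrap(+) => :D)
transform!(df, [:B, :C] => ByRow(+) => :E)

df.D == df.E  # true: both columns hold the row-wise sums
```

ByRow ships with DataFrames.jl, so it is probably the more recognizable choice inside transform!; a hand-written wrapper like broadwrap is still usable outside of DataFrames.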

danicaratelli · August 7, 2020, 11:54pm

Yes, I get a similar limitation with your method as the one I mentioned in my previous response.

Thank you though!

tbeason · August 8, 2020, 12:00am

In that example you could do

transform!(df, :, [:B,:C] => (x,y) -> f(x,y))

and it will work. If the function works on whole vectors, use this form; if it is something you want to broadcast over rows, use ByRow.
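
The distinction can be sketched side by side. This assumes a recent DataFrames.jl and reuses f from earlier in the thread; the column names :by_column and :by_row are made up for illustration:

```julia
using DataFrames

df = DataFrame(A = 1:4, B = 1:4, C = 5:8)

# Whole-column function from the thread: it needs the full vectors,
# because each output element depends on the tail of both inputs.
function f(x, y)
    N = size(x, 1)
    z = zeros(N)
    for i in 1:N
        z[i] = maximum(x[i:end]) + minimum(y[i:end])
    end
    return z
end

# Vector form: f is called once and receives the columns B and C as vectors.
transform!(df, [:B, :C] => f => :by_column)

# Row form: + is called once per row and receives two scalars.
transform!(df, [:B, :C] => ByRow(+) => :by_row)

df.by_column  # [9.0, 10.0, 11.0, 12.0]
df.by_row     # [6, 8, 10, 12]
```

Wrapping f in ByRow would not be appropriate here, since f relies on seeing more than one row at a time.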

tbeason · August 8, 2020, 12:02am

In fact, just for completeness, I’ll note that you can even be lazy in your original example and just do

transform!(df, :, [:B,:C] => +)

because + also works for vectors.
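
One thing to watch with this short form is that no target name is given, so DataFrames.jl generates one automatically. A small sketch, assuming a recent DataFrames.jl; :sum_BC is an illustrative name, and the generated name is inspected rather than assumed:

```julia
using DataFrames

df = DataFrame(A = 1:4, B = 1:4, C = 5:8)

# No target name: the new column is appended with an automatically generated name.
transform!(df, [:B, :C] => +)
auto_name = last(names(df))  # look up the generated name instead of guessing it

# Explicit target name; the parentheses make sure + is passed as a function value.
transform!(df, [:B, :C] => (+) => :sum_BC)

df.sum_BC == df[!, auto_name]  # true: same values, different column names
```

Giving an explicit name is usually easier to work with afterwards than relying on the generated one.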

Henrique_Becker · August 8, 2020, 12:06am

I have to admit that I am not understanding what limitation you are referring to.

danicaratelli · August 8, 2020, 12:06am

This is perfect! Thanks!

danicaratelli · August 8, 2020, 12:07am

Yes, sorry, I was not clear. Yours, at least as I applied it, was for row-by-row operations. What @tbeason has here:

transform!(df, :, [:B,:C] => (x,y) -> f(x,y))

works for what I intended.

Thanks!

Henrique_Becker · August 8, 2020, 12:18am

…

Yes. My broadwrap function was for row-by-row operations, because transform! already works out of the box for functions that operate directly on whole column vectors (instead of row by row).
You do not need to write transform!(df, :, [:B,:C] => (x,y) -> f(x,y)); you can just write transform!(df, [:B,:C] => f). f is already a function, and I am not sure why the colon would be needed.
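
To check that the shorter call is equivalent, the two forms can be compared on copies of the same data. This is a sketch assuming a recent DataFrames.jl; f here is a compact stand-in for the function defined earlier in the thread (it returns integers rather than floats), and :z is an illustrative column name:

```julia
using DataFrames

df = DataFrame(A = 1:4, B = 1:4, C = 5:8)

# Compact stand-in for the thread's f: same values, written as a comprehension.
f(x, y) = [maximum(x[i:end]) + minimum(y[i:end]) for i in eachindex(x)]

# Longer form: explicit colon plus an anonymous function that only forwards to f.
# The lambda needs parentheses so that `=> :z` is not parsed as part of its body.
with_lambda = transform(df, :, [:B, :C] => ((x, y) -> f(x, y)) => :z)

# Shorter form: transform already keeps all existing columns, and f is passed directly.
direct = transform(df, [:B, :C] => f => :z)

with_lambda == direct  # true
```

The non-mutating transform is used here only so that both results can be compared; transform! accepts the same arguments.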
