Hello,
I am trying to use transform on multiple columns of a DataFrame; that is, I want to apply an operation that takes two columns, column1 and column2, as inputs and outputs the result.
A simple example would be:
using DataFrames
df = DataFrame(A = 1:4, B = 1:4, C = 5:8);
transform!(df, :, [:B,:C] => (x,y) -> sumcols(x,y))
where sumcols just sums the columns together, and should give me a column that looks like:
[6;8;10;12];
Of course, in this case I can just use .+, but such a syntax would be useful in many scenarios. Any ideas on how I can do this?
Thanks!
transform!(df, :, [:B,:C] => ByRow(+) => :newcolname)
Thank you, this works for what I asked originally. However, is there something that works more generally for any function fun(:B,:C) that outputs a vector of the appropriate size?
For example, say
function f(x, y)
    N = size(x, 1)
    z = zeros(N)
    for i = 1:N
        z[i] = maximum(x[i:end]) + minimum(y[i:end])
    end
    return z
end
with the output:
[9;10;11;12];
Thanks!
Interesting, I did not know about ByRow. I was using a slightly more generic function that I also need in other contexts:
broadwrap(f) = function (args...) broadcast(f, args...) end
transform!(df, [:B,:C] => broadwrap(+) => :D)
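For this particular case, the wrapper and ByRow should give the same column, since both apply + element by element. A minimal sketch to check that, assuming the same example df as in the original question (the column names :D and :E are just illustrative):

```julia
using DataFrames

# Wrap f so that it is broadcast over whatever columns it receives
broadwrap(f) = (args...) -> broadcast(f, args...)

df = DataFrame(A = 1:4, B = 1:4, C = 5:8)

transform!(df, [:B, :C] => broadwrap(+) => :D)  # broadcast + over the two column vectors
transform!(df, [:B, :C] => ByRow(+) => :E)      # apply + to each row's (B, C) pair

df.D == df.E  # both hold the element-wise sums of B and C
```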
Yes, I run into the same limitation with your method as the one I mentioned in my previous response.
Thank you though!
In that example you could do
transform!(df, :, [:B,:C] => (x,y) -> f(x,y))
and it will work. If the function works on whole vectors, you pass it directly like this; if it is something you want to apply element by element, you wrap it in ByRow.
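To make the distinction concrete, here is a small sketch using the df and f from earlier in the thread (f is rewritten as a comprehension, and the output column names :whole and :rowwise are illustrative choices, not anything required by DataFrames):

```julia
using DataFrames

df = DataFrame(A = 1:4, B = 1:4, C = 5:8)

# Whole-column function: x and y are the full column vectors
f(x, y) = [maximum(x[i:end]) + minimum(y[i:end]) for i in eachindex(x)]

# f is called once with the columns B and C and must return a vector of matching length
transform!(df, [:B, :C] => f => :whole)

# + is called once per row with that row's B and C values
transform!(df, [:B, :C] => ByRow(+) => :rowwise)

df.whole    # [9, 10, 11, 12]
df.rowwise  # [6, 8, 10, 12]
```

Since transform! keeps the existing columns anyway, the leading `:` argument can be left out, and `[:B, :C] => f` is equivalent to writing the anonymous function `(x, y) -> f(x, y)`.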
In fact, just for completeness, I’ll note that you can even be lazy in your original example and just do
transform!(df, :, [:B,:C] => +)
because + also works on vectors.
I have to admit that I am not understanding what limitation you are referring to.
Yes, sorry, I was not clear. Yours, at least as I applied it, was for row-by-row operations; what @tbeason has here:
transform!(df, :, [:B,:C] => (x,y) -> f(x,y))
works for what I intended.
Thanks!