Hello,
I am trying to use transform on multiple columns of a dataframe. That is, I want to apply an operation that takes two columns as inputs, column1 and column2, and outputs the result.
A simple example would be:
df = DataFrame(A = 1:4, B = 1:4, C = 5:8);
transform!(df, :, [:B,:C] => (x,y) -> sumcols(x,y))
where sumcols just sums the columns together. This should give me a column that looks like:
[6;8;10;12];
Of course, in this case I can just do .+, but such a syntax would be useful in many scenarios. Any ideas how I can get this?
Thanks!
transform!(df, :, [:B,:C] => ByRow(+) => :newcolname)
Thank you, this works for what I asked originally. However, is there something that works more generally for any function fun(:B,:C) that outputs a vector of the appropriate size?
For example, say
function f(x,y)
    N = size(x,1);
    z = zeros(N);
    for i=1:N
        z[i] = maximum(x[i:end]) + minimum(y[i:end]);
    end
    return z
end
with the output:
[9;10;11;12];
Thanks!
Interesting, I did not know about ByRow. I was using a slightly more generic function that I need in other contexts too.
broadwrap(f) = function (args...) broadcast(f, args...) end
transform!(df, [:B,:C] => broadwrap(+) => :D)
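For the row-wise case, the broadwrap helper and ByRow should produce the same column. A minimal sketch to check this, assuming DataFrames.jl is installed; the variable names a and b are only for illustration, and the non-mutating transform is used so that df stays unchanged:

```julia
using DataFrames

df = DataFrame(A = 1:4, B = 1:4, C = 5:8)

# Helper from the post above: wraps f so that it is broadcast over its arguments.
broadwrap(f) = function (args...) broadcast(f, args...) end

# Both calls apply + element by element to columns B and C.
a = transform(df, [:B, :C] => broadwrap(+) => :D)
b = transform(df, [:B, :C] => ByRow(+) => :D)

println(a.D == b.D)
```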
Yes, I get a similar limitation with your method as the one I mentioned in the previous response.
Thank you though!
In that example you could do
transform!(df, :, [:B,:C] => (x,y) -> f(x,y))
and it will work. If the function works on whole vectors, you pass it directly like this; if it is something you want to broadcast over the rows, you wrap it in ByRow.
In fact, just for completeness, I’ll note that you can even be lazy in your original example and just do
transform!(df, :, [:B,:C] => +)
because + also works for vectors.
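To make the difference concrete, here is a small sketch that runs both styles on the example data from this thread. It assumes DataFrames.jl is installed; the output column names :D and :E and the variables whole and rowwise are chosen only for illustration, and the non-mutating transform is used so the original df is left alone.

```julia
using DataFrames

df = DataFrame(A = 1:4, B = 1:4, C = 5:8)

# f from earlier in the thread: takes whole columns, returns a vector of the same length.
function f(x, y)
    N = size(x, 1)
    z = zeros(N)
    for i in 1:N
        z[i] = maximum(x[i:end]) + minimum(y[i:end])
    end
    return z
end

# Whole-column call: f receives df.B and df.C as vectors.
whole = transform(df, [:B, :C] => f => :D)

# Row-wise call: + receives one element of B and one element of C at a time.
rowwise = transform(df, [:B, :C] => ByRow(+) => :E)

println(whole.D)
println(rowwise.E)
```

Wrapping f in ByRow would not work here, because f would then receive single numbers rather than the vectors it slices with x[i:end].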
I have to admit that I am not understanding what limitation you are referring to.
Yes, sorry, I was not clear. Yours, at least as I applied it, was for row-by-row operations. What @tbeason has here:
transform!(df, :, [:B,:C] => (x,y) -> f(x,y))
works for what I intended.
Thanks!