How to specify a transformation with multiple arguments via minilanguage

Suppose I have a DataFrame df and a GroupedDataFrame dfg:

using DataFrames
df = DataFrame(a=[1,1,2,2], b=[3,3,4,4])
dfg = groupby(df, :a)

Further suppose I have a function fun1 that takes a vector x and two additional arguments y and z.

function fun1(x,y,z)
    for i = eachindex(x)
        x[i] = (x[i]+y)^z
    end
end

Is there a way to pass the fun1 transformation via the
source_column => transformation => target_column_name format? I tried wrapping an anonymous function in parentheses, but it yielded

transform!(dfg, :a => (x -> fun1(x,1,2)) => :c)
4×3 DataFrame
 Row │ a      b      c
     │ Int64  Int64  Nothing
─────┼───────────────────────
   1 │     4      3
   2 │     4      3
   3 │     9      4
   4 │     9      4

as opposed to the desired result of

 Row │ a      b      c
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      3      4
   2 │     1      3      4
   3 │     2      4      9
   4 │     2      4      9

The problem, I think, is in the Julia code of fun1, not in the mini-language.
In the fun1 function there are two things to change:

  1. “export” (return) the result of the calculation;
  2. avoid modifying the input vector as well.

PS
On the second point, others can explain better than I can what happens and how to handle it.


function fun1(x,y,z)
    out=similar(x)
    for i = eachindex(x)
        out[i] = (x[i]+y)^z
    end
    out
end
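
To make the two points concrete, here is a minimal sketch in plain Julia (no DataFrames needed). The names fun1_mutating and fun1_copy are hypothetical labels for the original and the corrected version. As far as I understand, transform! on a GroupedDataFrame hands the function a view into the original column, so writing to x[i] overwrites column a, which is consistent with the output shown in the question.

```julia
# Original version: overwrites x in place. The for loop is the last
# expression in the body, and a for loop evaluates to nothing,
# so the function returns nothing.
function fun1_mutating(x, y, z)
    for i in eachindex(x)
        x[i] = (x[i] + y)^z
    end
end

# Corrected version: writes into a fresh array and returns it.
function fun1_copy(x, y, z)
    out = similar(x)
    for i in eachindex(x)
        out[i] = (x[i] + y)^z
    end
    return out
end

a = [1, 1, 2, 2]
r1 = fun1_mutating(a, 1, 2)   # r1 is nothing; a has been overwritten

b = [1, 1, 2, 2]
r2 = fun1_copy(b, 1, 2)       # r2 holds the result; b is unchanged
```

That is why column c came out with element type Nothing and column a ended up holding the values meant for c.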

Yes, @rocco_sprmnt21 is correct on both points. It would be simplest to just write:

fun1(x,y,z) = (x .+ y) .^ z

and all would be good.
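
As a sketch of how the one-liner slots into the source_column => transformation => target_column_name form (assuming DataFrames.jl is installed, and reusing the data from the question):

```julia
using DataFrames

# Broadcasting builds a new vector and leaves x untouched.
fun1(x, y, z) = (x .+ y) .^ z

df = DataFrame(a = [1, 1, 2, 2], b = [3, 3, 4, 4])
dfg = groupby(df, :a)

# The anonymous function fixes y = 1 and z = 2;
# x is the :a column of each group.
transform!(dfg, :a => (x -> fun1(x, 1, 2)) => :c)
```

After this call, df.c should hold the desired values 4, 4, 9, 9, and column a stays as it was.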


Thanks so much, this clears up a lot! I think that with the simple fun1, transform! can easily be modified to accommodate a y that comes from a different column but the same row of dfg, with something like

transform!(dfg, [:a, :b] => ((x,y) -> fun1(x,y,2)) => :newcol)

This puts fun1(dfg[k].a[j], dfg[k].b[j], 2) into newcol for row j of group k.

But what if y or z is drawn from a different row of the DataFrame, e.g. if the desired output for newcol is fun1(dfg[k].a[j], dfg[k].b[j+1], 2)? Can transform with the mini-language be adapted to operate across different rows, or is it better in this case to just modify fun1 or shift column b?
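
One possible way to handle the shifted-row case, offered only as a sketch: shift y inside the anonymous function, so that neither fun1 nor the stored column b has to change. The helper shift_up is hypothetical (not part of DataFrames.jl), the values in b are changed from the question so the shift is visible, and padding the last row of each group with missing is an assumption about what should happen when there is no row j+1.

```julia
using DataFrames

fun1(x, y, z) = (x .+ y) .^ z

# Hypothetical helper: move every element up one position and
# pad the end with missing, so position j holds the old v[j+1].
shift_up(v) = [v[2:end]; missing]

df = DataFrame(a = [1, 1, 2, 2], b = [3, 5, 4, 6])
dfg = groupby(df, :a)

# Within each group, row j of :a is paired with row j+1 of :b.
transform!(dfg, [:a, :b] => ((x, y) -> fun1(x, shift_up(y), 2)) => :newcol)
```

Because the function is applied per group, the shift never reaches across group boundaries; the last row of each group gets missing in newcol.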

Thanks! Just curious, could you explain the distinction between “exporting” the calculation with just

    end
    out
end

versus

    end 
    return out
end 

Is the return not necessary here?

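Yes, in this case the two forms are equivalent: a Julia function returns the value of its last expression, so the explicit return is optional here.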
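
A small sketch of the equivalence, using hypothetical function names f_implicit and f_explicit:

```julia
# Implicit return: the value of the last expression (out) is returned.
function f_implicit(x)
    out = 2 .* x
    out
end

# Explicit return: same behavior, just stated outright.
function f_explicit(x)
    out = 2 .* x
    return out
end

same = f_implicit([1, 2]) == f_explicit([1, 2])   # true
```

The explicit return is mostly a matter of style, although it does make it harder to accidentally end a function on a loop and return nothing.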
