How to specify a transformation with multiple arguments via minilanguage

phantom · October 29, 2022, 4:45am

Suppose I have a DataFrame df and a GroupedDataFrame dfg:

using DataFrames

df = DataFrame(a = [1, 1, 2, 2], b = [3, 3, 4, 4])
dfg = groupby(df, :a)

Further, suppose I have a function fun1 that takes a vector x and two additional arguments, y and z:

function fun1(x, y, z)
    for i = eachindex(x)
        x[i] = (x[i] + y)^z
    end
end

Is there a way to pass the fun1 transformation using the source_column => transformation => target_column_name format? I tried wrapping it in an anonymous function in parentheses, but it yielded

 transform!(dfg, :a => (x -> fun1(x,1,2)) => :c)
4×3 DataFrame
 Row │ a      b      c       
     │ Int64  Int64  Nothing 
─────┼───────────────────────
   1 │     4      3          
   2 │     4      3          
   3 │     9      4          
   4 │     9      4

as opposed to the desired result of

4×3 DataFrame
 Row │ a      b      c
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      3      4
   2 │     1      3      4
   3 │     2      4      9
   4 │     2      4      9
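Both surprises in the first output (column a overwritten with 4, 4, 9, 9, and column c of type Nothing) can be reproduced without DataFrames at all. A minimal sketch with a plain vector, using the question's fun1 unchanged: the loop writes into its argument, and because a for loop evaluates to nothing, the function returns nothing. The output above suggests that in the grouped case each group's slice of column a is passed to the function as a view into df.a, which would explain why df itself changed. The name v is only for illustration.

```julia
# fun1 exactly as in the question: it writes into its argument
function fun1(x, y, z)
    for i = eachindex(x)
        x[i] = (x[i] + y)^z
    end
end

v = [1, 1, 2, 2]
result = fun1(v, 1, 2)

println(v)       # the input vector has been overwritten in place
println(result)  # the for loop is the last expression, so fun1 returns nothing
```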

rocco_sprmnt21 · October 29, 2022, 6:31am

The problem, I think, lies in the Julia language itself, not in the mini-language.
In the fun1 function there are two things to change:

“export” (that is, return) the result of the calculation;
avoid modifying the input vector.

PS
On the second point, others could explain better than I can what happens and how to handle this case.


function fun1(x,y,z)
    out=similar(x)
    for i = eachindex(x)
        out[i] = (x[i]+y)^z
    end
    out
end
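A quick check of the corrected version on a plain vector (the names v and w are just for illustration) confirms both fixes: the input is left untouched and a new vector is returned.

```julia
# Corrected fun1: write into a freshly allocated vector instead of x
function fun1(x, y, z)
    out = similar(x)           # same size and element type as x
    for i = eachindex(x)
        out[i] = (x[i] + y)^z
    end
    out                        # last expression, so this is the return value
end

v = [1, 1, 2, 2]
w = fun1(v, 1, 2)

println(v)  # unchanged input
println(w)  # computed result
```

One caveat of this design: similar(x) keeps the element type of x, so with an Int column and a non-integer y (say 0.5) the assignment out[i] = ... would throw an InexactError. The broadcast one-liner avoids this because it lets Julia pick the result type.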

bkamins · October 29, 2022, 6:36am

Yes, @rocco_sprmnt21 is correct on both points. It would be simplest to just write:

fun1(x,y,z) = (x .+ y) .^ z

and all would be good.
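Putting the broadcast fun1 back into the original call gives the desired table. A minimal sketch, assuming the df and dfg from the question; transform! on a GroupedDataFrame adds the new column to the parent df, so df is printed at the end.

```julia
using DataFrames

# Broadcast version: returns a new vector and never mutates x
fun1(x, y, z) = (x .+ y) .^ z

df = DataFrame(a = [1, 1, 2, 2], b = [3, 3, 4, 4])
dfg = groupby(df, :a)

# The anonymous function fixes y = 1 and z = 2; the result goes to :c
transform!(dfg, :a => (x -> fun1(x, 1, 2)) => :c)

println(df)  # a is still 1, 1, 2, 2 and c is 4, 4, 9, 9
```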

phantom · October 31, 2022, 8:43am

Thanks so much, this clears up a lot! I think that with the simple fun1, transform! can easily be modified to accommodate a y that is in a different column but the same row of dfg, with something like

transform!(dfg, [:a, :b] => ((x,y) -> fun1(x,y,2)) => :newcol)

This stores fun1(dfg[k].a[j], dfg[k].b[j], 2) in newcol for each group k and row j.
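A runnable version of that call, assuming the same df as in the question: with two source columns the function receives two vectors (the a and b values of the current group), so y varies row by row while z stays fixed at 2. Since this particular operation is purely row-wise, the grouping is not actually needed; the second call is an alternative using DataFrames' ByRow wrapper on the ungrouped df, and the column name newcol_byrow is just for illustration.

```julia
using DataFrames

fun1(x, y, z) = (x .+ y) .^ z

df = DataFrame(a = [1, 1, 2, 2], b = [3, 3, 4, 4])
dfg = groupby(df, :a)

# Whole-column form: x and y are the per-group vectors of :a and :b
transform!(dfg, [:a, :b] => ((x, y) -> fun1(x, y, 2)) => :newcol)

# Row-wise form: ByRow calls the function once per row with scalar values
transform!(df, [:a, :b] => ByRow((x, y) -> (x + y)^2) => :newcol_byrow)

println(df)
```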

But what if y or z is drawn from a different row of the DataFrame, e.g. if the desired output for newcol is fun1(dfg[k].a[j], dfg[k].b[j+1], 2)? Can transform with the mini-language be adapted to operations across different rows, or is it better in this case to modify fun1 or to shift column b?
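Since the mini-language hands the function whole per-group column vectors, one option is to do the shift inside the anonymous function, leaving both fun1 and df unchanged. A minimal sketch, assuming the last row of each group has no "next" value and should get missing; shift_up is a hypothetical helper written here for illustration (the ShiftedArrays.jl package provides lead/lag functions for the same purpose, but they are not used below).

```julia
using DataFrames

fun1(x, y, z) = (x .+ y) .^ z

# Hypothetical helper: the value from the next row of the same vector,
# padded with `missing` in the last position
shift_up(v) = [v[2:end]; missing]

df = DataFrame(a = [1, 1, 2, 2], b = [3, 3, 4, 4])
dfg = groupby(df, :a)

# Row j of each group is combined with b[j+1] of the same group;
# arithmetic with `missing` propagates, so the last row of each group is missing
transform!(dfg, [:a, :b] => ((x, y) -> fun1(x, shift_up(y), 2)) => :newcol)

println(df)
```

Because the shift happens per group, values never leak across group boundaries; shifting column b of the ungrouped df beforehand would need extra care to get the same behavior.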

phantom · October 31, 2022, 9:37am

Thanks! Just curious: could you explain the distinction between “exporting” the calculation with just

    end
    out
end

versus

    end 
    return out
end

Is the return not necessary here?

rocco_sprmnt21 · October 31, 2022, 2:59pm

Yes, in this case the two forms are equivalent.
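The reason is that a Julia function returns the value of its last evaluated expression, so a trailing out and a trailing return out behave the same. An explicit return makes a difference when leaving the function early. A small sketch; all function names here are hypothetical.

```julia
# Implicit return: the last expression, out, is the value returned
function implicit_last(x)
    out = 2 .* x
    out
end

# Explicit return: same result
function explicit_return(x)
    out = 2 .* x
    return out
end

# Here `return` is needed to exit the loop (and the function) early
function first_negative(x)
    for v in x
        v < 0 && return v
    end
    nothing  # reached only if no element is negative
end

println(implicit_last([1, 2]) == explicit_return([1, 2]))
println(first_negative([3, -1, -5]))
```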
