Run multiple instances of transform on specific column combinations of a GroupedDataFrame in DataFrames mini language

rocco_sprmnt21 · December 23, 2022, 10:57am

Here is my log, starting from a clean session.
See if you can spot any difference.

julia> using DataFrames

julia> df = DataFrame(id=rand(1:3,10), A=rand(11:13,10),
       B=rand(101:103,10), C = rand(25:27,10), D = rand(32:34,10))
10×5 DataFrame
 Row │ id     A      B      C      D     
     │ Int64  Int64  Int64  Int64  Int64 
─────┼───────────────────────────────────
   1 │     1     13    103     26     32
   2 │     2     13    101     26     33
   3 │     3     13    103     25     32
   4 │     2     11    101     26     32
   5 │     2     12    102     25     34
   6 │     1     11    102     25     34
   7 │     2     11    103     26     32
   8 │     3     13    102     25     32
   9 │     1     12    103     25     33
  10 │     3     11    101     26     32

julia> gdf = groupby(df, :id)
GroupedDataFrame with 3 groups based on key: id
First Group (3 rows): id = 1
 Row │ id     A      B      C      D     
     │ Int64  Int64  Int64  Int64  Int64 
─────┼───────────────────────────────────
   1 │     1     13    103     26     32
   2 │     1     11    102     25     34
   3 │     1     12    103     25     33
⋮
Last Group (3 rows): id = 3
 Row │ id     A      B      C      D     
     │ Int64  Int64  Int64  Int64  Int64 
─────┼───────────────────────────────────
   1 │     3     13    103     25     32
   2 │     3     13    102     25     32
   3 │     3     11    101     26     32

julia> function funnynt(id,Cols...; p1=1,p2=2,p3=3)
           m= Matrix{Float64}(undef, length(id),length(Cols))   
           for i in eachindex(Cols)
               if p3>0
                   m[:,i]= (id .+p1).*(Cols[i].+p2*p3)
               else
                   m[:,i]= (id .+p1).*(Cols[i].+p2)
               end
           end
           #ncols=Symbol.("id_c",1:length(Cols))
           ncols=Symbol.("id_",names(df,Not(:id)))

           return (;zip(ncols,eachcol(m))...)
       end
funnynt (generic function with 1 method)

julia> transform(gdf,Cols(:) => funnynt=>AsTable)
10×9 DataFrame
 Row │ id     A      B      C      D      id_A     id_B     id_C     id_D ⋯
     │ Int64  Int64  Int64  Int64  Int64  Float64  Float64  Float64  Floa ⋯
─────┼─────────────────────────────────────────────────────────────────────
   1 │     1     13    103     26     32     38.0    218.0     64.0     7 ⋯
   2 │     2     13    101     26     33     57.0    321.0     96.0    11  
   3 │     3     13    103     25     32     76.0    436.0    124.0    15  
   4 │     2     11    101     26     32     51.0    321.0     96.0    11  
   5 │     2     12    102     25     34     54.0    324.0     93.0    12 ⋯
   6 │     1     11    102     25     34     34.0    216.0     62.0     8  
   7 │     2     11    103     26     32     51.0    327.0     96.0    11  
   8 │     3     13    102     25     32     76.0    432.0    124.0    15  
   9 │     1     12    103     25     33     36.0    218.0     62.0     7 ⋯
  10 │     3     11    101     26     32     68.0    428.0    128.0    15  
                                                           1 column omitted

julia> transform(gdf,Cols(:) =>((x...)->funnynt(x...,p1=0;p3=0))=>AsTable) 
10×9 DataFrame
 Row │ id     A      B      C      D      id_A     id_B     id_C     id_D ⋯
     │ Int64  Int64  Int64  Int64  Int64  Float64  Float64  Float64  Floa ⋯
─────┼─────────────────────────────────────────────────────────────────────
   1 │     1     13    103     26     32     15.0    105.0     28.0     3 ⋯
   2 │     2     13    101     26     33     30.0    206.0     56.0     7  
   3 │     3     13    103     25     32     45.0    315.0     81.0    10  
   4 │     2     11    101     26     32     26.0    206.0     56.0     6  
   5 │     2     12    102     25     34     28.0    208.0     54.0     7 ⋯
   6 │     1     11    102     25     34     13.0    104.0     27.0     3  
   7 │     2     11    103     26     32     26.0    210.0     56.0     6  
   8 │     3     13    102     25     32     45.0    312.0     81.0    10  
   9 │     1     12    103     25     33     14.0    105.0     27.0     3 ⋯
  10 │     3     11    101     26     32     39.0    309.0     84.0    10  
                                                           1 column omitted
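Note that `funnynt` above reads its output names from the global `df`, so it only labels columns correctly when every non-`id` column is passed. To run it on specific column combinations, one option is to pass the names in explicitly. The following is only a sketch under that assumption: `funnynt_named`, its `colnames` keyword, and the small example table are hypothetical and not part of the log above.

```julia
using DataFrames

# Hypothetical variant of funnynt: the output names come from the
# `colnames` keyword instead of being read from a global data frame.
function funnynt_named(id, cols...; colnames, p1=1, p2=2, p3=3)
    shift = p3 > 0 ? p2 * p3 : p2          # same branching as funnynt
    newnames = Symbol.("id_", colnames)    # e.g. :id_A, :id_B
    newcols = [Float64.((id .+ p1) .* (c .+ shift)) for c in cols]
    return (; zip(newnames, newcols)...)
end

# Small made-up table, only for illustration
df = DataFrame(id=[1, 2, 1, 2], A=[11, 12, 13, 11],
               B=[101, 102, 103, 101], C=[25, 26, 25, 26])
gdf = groupby(df, :id)

# One transformation per column combination, all passed to a single transform call
combos = [[:A, :B], [:C]]
ops = [[:id; s] => ((x...) -> funnynt_named(x...; colnames=s)) => AsTable
       for s in combos]
res = transform(gdf, ops...)
```

Each combination must produce distinct output names, otherwise `transform` will complain about duplicate columns.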

As for what actually happens behind the scenes, I don't know if I can clarify it.
I'm going mostly by analogy, based on my experience with Julia (I must confess I don't read much documentation).


nt = (f1=1, f2=2, f3=3)          # basic definition of a named tuple

nt = (; (f1=1, f2=2, f3=3)...)   # the splatting operator "opens" the collection

# To define a named tuple dynamically, you need two things:

fields = Symbol.('f', 1:3)       # a sequence of symbols

# and

vals = [1, 2, 3]                 # a sequence of values (named `vals` to avoid shadowing Base.values)

# Below are various ways to feed them to the (; kwargs...) syntax, which,
# as far as I can tell, is handled the same way as the keyword arguments of a function call.
# Someone who knows these internals can probably be more specific.
(; zip(fields, vals)...)
(; Pair.(fields, vals)...)
(; tuple.(fields, vals)...)
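To check the analogy, here is a small sketch using only Base. It compares the three splatting forms with the explicit `NamedTuple{names}(values)` constructor, and then splats the same iterator into the keyword arguments of a function. The helper `show_kwargs` is a hypothetical name used only for illustration.

```julia
fields = Symbol.('f', 1:3)   # [:f1, :f2, :f3]
vals = [1, 2, 3]

# The three splatting forms from above
nt_zip   = (; zip(fields, vals)...)
nt_pair  = (; Pair.(fields, vals)...)
nt_tuple = (; tuple.(fields, vals)...)

# Explicit constructor: the names are a type parameter, the values a tuple
nt_ctor = NamedTuple{Tuple(fields)}(Tuple(vals))

# Hypothetical helper: collects whatever keyword arguments it receives
show_kwargs(; kwargs...) = (; kwargs...)

# The same iterator can be splatted into a call's keyword arguments
nt_call = show_kwargs(; zip(fields, vals)...)

println(nt_zip == nt_pair == nt_tuple == nt_ctor == nt_call)
```

In both positions the splatted collection only needs to iterate as (symbol, value) pairs, which is why `zip`, `Pair.` and `tuple.` are interchangeable here.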
