Run multiple instances of transform on specific column combinations of a GroupedDataFrame in DataFrames mini language

Thank you so much! Sorry, I would have liked to mark multiple solutions but wasn’t sure how to do it. If you have a moment, could you clarify how the zip and ... commands are working here?

When I add the splat ... operator after the Cols argument in the function declaration, does that turn that argument into an array inside the function? So when I pass Cols... into funnynt, would it be in the form [col1, col2, col3]?

Also, I am not sure what I am getting wrong here, but when I try

df = DataFrame(id=rand(1:3,10), A=rand(11:13,10),
               B=rand(101:103,10), C = rand(25:27,10), D = rand(32:34,10))
10×5 DataFrame
 Row │ id     A      B      C      D     
     │ Int64  Int64  Int64  Int64  Int64 
─────┼───────────────────────────────────
   1 │     1     12    102     26     32
   2 │     1     13    101     25     34
   3 │     2     13    102     26     33
   4 │     3     11    102     27     33
   5 │     1     13    102     26     33
   6 │     3     11    101     25     32
   7 │     1     11    103     26     34
   8 │     2     12    101     27     34
   9 │     2     12    102     25     32
  10 │     2     12    101     25     32
gdf = groupby(df,:id)
GroupedDataFrame with 3 groups based on key: id
First Group (4 rows): id = 1
 Row │ id     A      B      C      D     
     │ Int64  Int64  Int64  Int64  Int64 
─────┼───────────────────────────────────
   1 │     1     12    102     26     32
   2 │     1     13    101     25     34
   3 │     1     13    102     26     33
   4 │     1     11    103     26     34
⋮
Last Group (2 rows): id = 3
 Row │ id     A      B      C      D     
     │ Int64  Int64  Int64  Int64  Int64 
─────┼───────────────────────────────────
   1 │     3     11    102     27     33
   2 │     3     11    101     25     32
function funnynt(id,Cols...; p1=1,p2=2,p3=3)
    m= Matrix{Float64}(undef, length(id),length(Cols))
    for i in eachindex(Cols)
        if p3>0
            m[:,i]= (id .+p1).*(Cols[i].+p2*p3)
        else
            m[:,i]= (id .+p1).*(Cols[i].+p2)
        end
    end
    ncols=Symbol.("id_",names(df,Not(:id)))
    return (;zip(ncols,eachcol(m))...)
end
funnynt (generic function with 1 method)

transform(gdf,Cols(:) => funnynt => AsTable)

I get

ERROR: NamedTuple names and field types must have matching lengths

Again I apologize if I am overlooking something elementary, but thanks so much for your time and help.

Here is my log, starting from a clean session.
See if you can spot any difference.

julia> using DataFrames

julia> df = DataFrame(id=rand(1:3,10), A=rand(11:13,10),
       B=rand(101:103,10), C = rand(25:27,10), D = rand(32:34,10))
10×5 DataFrame
 Row │ id     A      B      C      D     
     │ Int64  Int64  Int64  Int64  Int64 
─────┼───────────────────────────────────
   1 │     1     13    103     26     32
   2 │     2     13    101     26     33
   3 │     3     13    103     25     32
   4 │     2     11    101     26     32
   5 │     2     12    102     25     34
   6 │     1     11    102     25     34
   7 │     2     11    103     26     32
   8 │     3     13    102     25     32
   9 │     1     12    103     25     33
  10 │     3     11    101     26     32

julia> gdf = groupby(df, :id)
GroupedDataFrame with 3 groups based on key: id
First Group (3 rows): id = 1
 Row │ id     A      B      C      D     
     │ Int64  Int64  Int64  Int64  Int64 
─────┼───────────────────────────────────
   1 │     1     13    103     26     32
   2 │     1     11    102     25     34
   3 │     1     12    103     25     33
⋮
Last Group (3 rows): id = 3
 Row │ id     A      B      C      D     
     │ Int64  Int64  Int64  Int64  Int64 
─────┼───────────────────────────────────
   1 │     3     13    103     25     32
   2 │     3     13    102     25     32
   3 │     3     11    101     26     32

julia> function funnynt(id,Cols...; p1=1,p2=2,p3=3)
           m= Matrix{Float64}(undef, length(id),length(Cols))   
           for i in eachindex(Cols)
               if p3>0
                   m[:,i]= (id .+p1).*(Cols[i].+p2*p3)
               else
                   m[:,i]= (id .+p1).*(Cols[i].+p2)
               end
           end
           #ncols=Symbol.("id_c",1:length(Cols))
           ncols=Symbol.("id_",names(df,Not(:id)))

           return (;zip(ncols,eachcol(m))...)
       end
funnynt (generic function with 1 method)

julia> transform(gdf,Cols(:) => funnynt=>AsTable)
10×9 DataFrame
 Row │ id     A      B      C      D      id_A     id_B     id_C     id_D ⋯
     │ Int64  Int64  Int64  Int64  Int64  Float64  Float64  Float64  Floa ⋯
─────┼─────────────────────────────────────────────────────────────────────
   1 │     1     13    103     26     32     38.0    218.0     64.0     7 ⋯
   2 │     2     13    101     26     33     57.0    321.0     96.0    11  
   3 │     3     13    103     25     32     76.0    436.0    124.0    15  
   4 │     2     11    101     26     32     51.0    321.0     96.0    11  
   5 │     2     12    102     25     34     54.0    324.0     93.0    12 ⋯
   6 │     1     11    102     25     34     34.0    216.0     62.0     8  
   7 │     2     11    103     26     32     51.0    327.0     96.0    11  
   8 │     3     13    102     25     32     76.0    432.0    124.0    15  
   9 │     1     12    103     25     33     36.0    218.0     62.0     7 ⋯
  10 │     3     11    101     26     32     68.0    428.0    128.0    15  
                                                           1 column omitted

julia> transform(gdf,Cols(:) =>((x...)->funnynt(x...,p1=0;p3=0))=>AsTable)
10×9 DataFrame
 Row │ id     A      B      C      D      id_A     id_B     id_C     id_D ⋯
     │ Int64  Int64  Int64  Int64  Int64  Float64  Float64  Float64  Floa ⋯
─────┼─────────────────────────────────────────────────────────────────────
   1 │     1     13    103     26     32     15.0    105.0     28.0     3 ⋯
   2 │     2     13    101     26     33     30.0    206.0     56.0     7  
   3 │     3     13    103     25     32     45.0    315.0     81.0    10  
   4 │     2     11    101     26     32     26.0    206.0     56.0     6  
   5 │     2     12    102     25     34     28.0    208.0     54.0     7 ⋯
   6 │     1     11    102     25     34     13.0    104.0     27.0     3  
   7 │     2     11    103     26     32     26.0    210.0     56.0     6  
   8 │     3     13    102     25     32     45.0    312.0     81.0    10  
   9 │     1     12    103     25     33     14.0    105.0     27.0     3 ⋯
  10 │     3     11    101     26     32     39.0    309.0     84.0    10  
                                                           1 column omitted
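
One thing worth noting about the function in this log: it builds the output names from the global df, so the names only line up with the columns when the columns passed in are exactly names(df, Not(:id)). Below is a minimal sketch of a variant that derives the names from the number of columns it receives instead, as in the commented-out line of the log. The name funnynt_local is hypothetical, and the sketch calls it on plain vectors (the id = 1 group of the log, columns A and B only) rather than through transform.

```julia
# Hypothetical variant of funnynt: the output names depend only on the
# arguments, not on a global `df`.
function funnynt_local(id, cols...; p1=1, p2=2, p3=3)
    # `cols` is a tuple holding one vector per column passed after `id`
    m = Matrix{Float64}(undef, length(id), length(cols))
    shift = p3 > 0 ? p2 * p3 : p2
    for i in eachindex(cols)
        m[:, i] = (id .+ p1) .* (cols[i] .+ shift)
    end
    ncols = Symbol.("id_c", 1:length(cols))   # :id_c1, :id_c2, ...
    return (; zip(ncols, eachcol(m))...)
end

# First group (id = 1) of the log, columns A and B only
id = [1, 1, 1]
A = [13, 11, 12]
B = [103, 102, 103]

nt = funnynt_local(id, A, B)

# Keyword arguments can be fixed with an anonymous wrapper
wrapped = (x...) -> funnynt_local(x...; p1=0, p3=0)
nt0 = wrapped(id, A, B)
```

With DataFrames this should plug in the same way as funnynt, e.g. transform(gdf, Cols(:) => funnynt_local => AsTable), with the new columns then named id_c1, id_c2 and so on rather than id_A, id_B.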

As for your question about how zip and ... work here:

I don’t know if I’m able to clarify what actually happens behind the scenes.
I tend to reason by analogy in my experience with Julia (I don’t read much documentation, I must confess).


nt=(f1=1,f2=2,f3=3)   # basic definition of a named tuple

nt=(;(f1=1,f2=2,f3=3)...) # the splatting operator "opens" the collection

# If you need to define a named tuple dynamically, you need two things:

fields=Symbol.('f',1:3)   # a sequence of symbols

# and

values=[1,2,3] # a sequence of values

# Below are various ways to feed them to the (;kwargs...) syntax,
# which the parser (?) should treat in the same way as keyword arguments of functions!?
# But here it is perhaps better for someone who knows about these things to be more specific.
(;zip(fields,values)...)
(;Pair.(fields,values)...)
(;tuple.(fields,values)...)
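
To connect this with funnynt, here is a minimal sketch using only Base. It checks that the three forms above build the same named tuple, and that zipping names with eachcol of a matrix gives one field per column. The names vals, m and nt_cols are only illustrative.

```julia
fields = Symbol.('f', 1:3)   # [:f1, :f2, :f3]
vals = [1, 2, 3]             # `vals` rather than `values`, to avoid shadowing Base.values

nt_zip = (; zip(fields, vals)...)
nt_pair = (; Pair.(fields, vals)...)
nt_tuple = (; tuple.(fields, vals)...)

# Same idea as the return line of funnynt: one field per matrix column
m = [1.0 2.0; 3.0 4.0]
nt_cols = (; zip([:a, :b], eachcol(m))...)
```

In each case the thing being splatted yields (name, value) items, and every item becomes one field of the result.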

To expand a bit on @rocco_sprmnt21’s answer:

The (; ...) form is a syntax for declaring named tuples. It’s not very well documented, so I’ve submitted a PR.

Basically, you can write (; :a => 1, :b => 2) to create the named tuple as if you had written (a=1, b=2). The first form is more flexible: you can generate the name programmatically (as done above with Symbol("E_$suffix")), and if you want a tuple field to have the same name and same value as a variable, you don’t have to repeat yourself: (; a=a) can be shortened to (; a). I used this above in (; E, F, G).
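
A quick check of these forms, as a sketch. The variable names suffix and a are placeholders, and the (; a) shorthand assumes Julia 1.5 or later.

```julia
# Pairs after the semicolon behave like name = value
nt1 = (; :a => 1, :b => 2)

# The field name can be computed at run time
suffix = "x"
nt2 = (; Symbol("E_$suffix") => 10)

# Shorthand: the field takes both the name and the value of the variable
a = 1
nt3 = (; a)
```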

As for ... in a function declaration, it’s easy to try it yourself and see what Julia does, for example:

julia> f(x...) = x;

julia> f(1,2,"text")
(1, 2, "text")

So if a function is declared with x..., then in the body the value of x is a tuple (not an array) holding all the positional arguments passed to the function. Note that for keyword arguments it’s different: you get a special object that also holds the argument names.
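
For instance, here is a small sketch (the function names show_args and show_kwargs are made up for illustration) showing that slurped positional arguments arrive as a Tuple rather than an Array, and that slurped keyword arguments keep their names:

```julia
show_args(x...) = x
show_kwargs(; kw...) = kw

# Three vectors standing in for three columns
t = show_args([1, 2], [3, 4], [5, 6])

# Keyword arguments are collected together with their names
kw = show_kwargs(p1=0, p3=0)
```

Here t[2] is the second vector, which is why Cols[i] works inside funnynt even though Cols is not an array.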

The description above is for the case where ... is used in the function declaration, as in your funnynt(id, Cols...; p1=1, p2=2, p3=3). In the case of (; zip(ncols,eachcol(m))...) it’s different: here the splatting operator is not in a declaration, it’s more like a call. See the FAQ.
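
At a call site the same ... goes the other way: it spreads a collection out into separate arguments. A minimal sketch, with a hypothetical function add3:

```julia
add3(a, b, c; scale=1) = scale * (a + b + c)

xs = [1, 2, 3]
opts = (scale=10,)

r1 = add3(xs...)            # same as add3(1, 2, 3)
r2 = add3(xs...; opts...)   # same as add3(1, 2, 3; scale=10)

# In (; ...) there is no function being called, so the spread-out
# (name, value) items become the fields of a named tuple
nt = (; zip((:a, :b), (1, 2))...)
```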
