Run multiple instances of transform on specific column combinations of a GroupedDataFrame in DataFrames mini language

The error I get is ArgumentError: duplicate output column name: :E, and that is indeed the problem. With AsTable we tell transform to take the output column names from the named tuple returned by fun, but fun returns the same names (E, F and G) three times, once for each of :A, :B and :C.

Summary: If you want to choose the column names in fun, then you must make sure that you produce unique column names.
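
To see the clash in isolation, here is a minimal sketch. The helper same_name is made up for this illustration (it is not the fun from the question) and always returns a named tuple with the single field E, so both transformations ask for the same output column:

```julia
using DataFrames

df = DataFrame(id=1:3, A=11:13, B=101:103)
gdf = groupby(df, :id)

# Hypothetical helper: always returns a named tuple with the single field :E.
same_name(x, y) = (; E = x .+ y)

# Both transformations try to create a column called :E, so transform throws.
# We capture the exception instead of letting it abort the script.
err = try
    transform(gdf, [[:id, col] => same_name => AsTable for col in [:A, :B]])
    nothing
catch e
    e
end

println(typeof(err))
```

Judging by the error message quoted above, the captured exception should be the same ArgumentError about a duplicate output column name.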

One solution is to rename the fields of the named tuple that fun returns, before it reaches transform:

using DataFrames

df = DataFrame(id=1:3, A=11:13, B=101:103, C=25:27);
gdf = groupby(df, :id)

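function fun(Col1, Col2, p1, p2, p3)
    # Preallocate three output vectors, one element per row of the group
    (E, F, G) = (Vector{Float64}(undef, size(Col1, 1)) for _ in 1:3)
    for i in eachindex(Col1)
        # Note: if p3 <= 0 nothing is written and the outputs stay uninitialized
        if p3 > 0
            E[i] = p1*Col1[i] + p2*Col2[i]
            F[i] = (p1 + Col1[i]) * (p2 + Col2[i])
            G[i] = p2*Col1[i] + p1*Col2[i]
        end
    end
    # Field names E, F, G become the output column names under AsTable
    return (; E, F, G)
end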

suffix_fields(nt, suffix) = NamedTuple(Symbol(k, "_", suffix) => v for (k,v) in pairs(nt))

transform(gdf, [[:id, col] => ((x,y) -> suffix_fields(fun(x,y,1,2,3), col)) => AsTable for col in [:A, :B, :C]])

# Output:
3×13 DataFrame
 Row │ id     A      B      C      E_A      F_A      G_A      E_B      F_B      G_B      E_C      F_C      G_C     
     │ Int64  Int64  Int64  Int64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64 
─────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │     1     11    101     25     23.0     26.0     13.0    203.0    206.0    103.0     51.0     54.0     27.0
   2 │     2     12    102     26     26.0     42.0     16.0    206.0    312.0    106.0     54.0     84.0     30.0
   3 │     3     13    103     27     29.0     60.0     19.0    209.0    420.0    109.0     57.0    116.0     33.0
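
If it helps, suffix_fields can be tried on its own, without any data frame. The sketch below reuses the definition from above; the input values are made up for illustration:

```julia
# Same definition as in the answer above.
suffix_fields(nt, suffix) = NamedTuple(Symbol(k, "_", suffix) => v for (k, v) in pairs(nt))

nt = (; E = [1.0, 2.0], F = [3.0, 4.0])   # made-up values
renamed = suffix_fields(nt, :A)

# The values are untouched; only the field names get the "_A" suffix.
println(keys(renamed))
```

Because the suffix is the column symbol itself, each of :A, :B and :C yields a distinct set of names, which is what transform needs.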

Another way is to give fun itself another parameter it can use to make unique names:

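function fun(Col1, Col2, p1, p2, p3, suffix)
    # Preallocate three output vectors, one element per row of the group
    (E, F, G) = (Vector{Float64}(undef, size(Col1, 1)) for _ in 1:3)
    for i in eachindex(Col1)
        # Note: if p3 <= 0 nothing is written and the outputs stay uninitialized
        if p3 > 0
            E[i] = p1*Col1[i] + p2*Col2[i]
            F[i] = (p1 + Col1[i]) * (p2 + Col2[i])
            G[i] = p2*Col1[i] + p1*Col2[i]
        end
    end
    # Build the field names at run time so that they are unique per suffix
    return (;
        Symbol("E_$suffix") => E,
        Symbol("F_$suffix") => F,
        Symbol("G_$suffix") => G,
    )
end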

transform(gdf, [[:id, col] => ((x,y) -> fun(x,y,1,2,3,col)) => AsTable for col in [:A, :B, :C]])
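
The part of this version that may look unfamiliar is the named tuple whose field names are computed at run time. A minimal sketch of just that syntax, with made-up values:

```julia
suffix = :A
E = [1.0, 2.0]   # made-up values

# After the semicolon, a `name::Symbol => value` pair defines a field,
# so the name can be computed instead of written literally.
nt = (; Symbol("E_$suffix") => E)

println(keys(nt))
```

The interpolation "E_$suffix" turns the symbol :A into the string "E_A", and Symbol converts it back into a field name.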

By the way, it’s easier to help you if you always show self-contained examples like the first one here, so that people can try your code with a simple copy-and-paste (rather than having to collect pieces from several messages in the thread).
