Here is my log, starting from a clean session.
See if you can spot any difference.
julia> using DataFrames
julia> df = DataFrame(id=rand(1:3,10), A=rand(11:13,10),
B=rand(101:103,10), C = rand(25:27,10), D = rand(32:34,10))
10×5 DataFrame
Row │ id A B C D
│ Int64 Int64 Int64 Int64 Int64
─────┼───────────────────────────────────
1 │ 1 13 103 26 32
2 │ 2 13 101 26 33
3 │ 3 13 103 25 32
4 │ 2 11 101 26 32
5 │ 2 12 102 25 34
6 │ 1 11 102 25 34
7 │ 2 11 103 26 32
8 │ 3 13 102 25 32
9 │ 1 12 103 25 33
10 │ 3 11 101 26 32
julia> gdf = groupby(df, :id)
GroupedDataFrame with 3 groups based on key: id
First Group (3 rows): id = 1
Row │ id A B C D
│ Int64 Int64 Int64 Int64 Int64
─────┼───────────────────────────────────
1 │ 1 13 103 26 32
2 │ 1 11 102 25 34
3 │ 1 12 103 25 33
⋮
Last Group (3 rows): id = 3
Row │ id A B C D
│ Int64 Int64 Int64 Int64 Int64
─────┼───────────────────────────────────
1 │ 3 13 103 25 32
2 │ 3 13 102 25 32
3 │ 3 11 101 26 32
julia> function funnynt(id,Cols...; p1=1,p2=2,p3=3)
           m = Matrix{Float64}(undef, length(id), length(Cols))
           for i in eachindex(Cols)
               if p3>0
                   m[:,i] = (id .+p1).*(Cols[i].+p2*p3)
               else
                   m[:,i] = (id .+p1).*(Cols[i].+p2)
               end
           end
           #ncols=Symbol.("id_c",1:length(Cols))
           # note: the next line reads the column names from the global df
           ncols = Symbol.("id_",names(df,Not(:id)))
           return (;zip(ncols,eachcol(m))...)
       end
funnynt (generic function with 1 method)
julia> transform(gdf,Cols(:) => funnynt=>AsTable)
10×9 DataFrame
Row │ id A B C D id_A id_B id_C id_D ⋯
│ Int64 Int64 Int64 Int64 Int64 Float64 Float64 Float64 Floa ⋯
─────┼─────────────────────────────────────────────────────────────────────
1 │ 1 13 103 26 32 38.0 218.0 64.0 7 ⋯
2 │ 2 13 101 26 33 57.0 321.0 96.0 11
3 │ 3 13 103 25 32 76.0 436.0 124.0 15
4 │ 2 11 101 26 32 51.0 321.0 96.0 11
5 │ 2 12 102 25 34 54.0 324.0 93.0 12 ⋯
6 │ 1 11 102 25 34 34.0 216.0 62.0 8
7 │ 2 11 103 26 32 51.0 327.0 96.0 11
8 │ 3 13 102 25 32 76.0 432.0 124.0 15
9 │ 1 12 103 25 33 36.0 218.0 62.0 7 ⋯
10 │ 3 11 101 26 32 68.0 428.0 128.0 15
1 column omitted
julia> transform(gdf,Cols(:) =>((x...)->funnynt(x...,p1=0;p3=0))=>AsTable)
10×9 DataFrame
Row │ id A B C D id_A id_B id_C id_D ⋯
│ Int64 Int64 Int64 Int64 Int64 Float64 Float64 Float64 Floa ⋯
─────┼─────────────────────────────────────────────────────────────────────
1 │ 1 13 103 26 32 15.0 105.0 28.0 3 ⋯
2 │ 2 13 101 26 33 30.0 206.0 56.0 7
3 │ 3 13 103 25 32 45.0 315.0 81.0 10
4 │ 2 11 101 26 32 26.0 206.0 56.0 6
5 │ 2 12 102 25 34 28.0 208.0 54.0 7 ⋯
6 │ 1 11 102 25 34 13.0 104.0 27.0 3
7 │ 2 11 103 26 32 26.0 210.0 56.0 6
8 │ 3 13 102 25 32 45.0 312.0 81.0 10
9 │ 1 12 103 25 33 14.0 105.0 27.0 3 ⋯
10 │ 3 11 101 26 32 39.0 309.0 84.0 10
1 column omitted
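A side note on the log above: funnynt takes its output names from the global df. Below is a minimal sketch of a variant that reads the names from its input instead, assuming the columns arrive as a NamedTuple of vectors (which is what DataFrames passes when the source is wrapped in AsTable). The name funnynt_nt is hypothetical, and the demo uses a hand-built NamedTuple rather than a data frame.

```julia
# Hypothetical variant of funnynt: takes a NamedTuple of column vectors,
# so the output names come from the input rather than from a global df.
function funnynt_nt(cols::NamedTuple; p1=1, p2=2, p3=3)
    id = cols.id
    others = filter(!=(:id), keys(cols))   # every column name except :id
    offset = p3 > 0 ? p2 * p3 : p2         # same branch as in funnynt
    newnames = Symbol.("id_", others)      # (:id_A, :id_B, ...)
    newcols = [Float64.((id .+ p1) .* (cols[n] .+ offset)) for n in others]
    return (; zip(newnames, newcols)...)
end

# Stand-in for the columns of a data frame (first three rows of the log).
cols = (id=[1, 2, 3], A=[13, 13, 13], B=[103, 101, 103])
res = funnynt_nt(cols)
res0 = funnynt_nt(cols; p1=0, p3=0)
```

With the grouped data frame from the log, this should be callable as transform(gdf, AsTable(:) => funnynt_nt => AsTable), and with changed keywords through an anonymous function, as in the second transform call above.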
As for
I don’t know if I can clarify what actually happens behind the scenes.
I mostly reason by analogy from my experience with Julia (I must confess I don’t read much documentation).
nt=(f1=1,f2=2,f3=3) # basic definition of a named tuple
nt=(;(f1=1,f2=2,f3=3)...) # the splatting operator "opens" the collection
# To define a named tuple dynamically, you need
fields=Symbol.('f',1:3) # a sequence of symbols
# and
values=[1,2,3] # a sequence of values
# Below are various ways to feed them to the (;kwargs...) syntax,
# which the parser (?) presumably treats the same way as the keyword arguments of a function.
# Perhaps someone who knows these internals can be more specific.
(;zip(fields,values)...)
(;Pair.(fields,values)...)
(;tuple.(fields,values)...)
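As a quick check of the three forms above, here is a small sketch showing that they all build the same named tuple as the literal definition, and that the same pairs can be splatted as keyword arguments of a function. The helper show_kwargs is hypothetical, and the values are stored in vals here only to avoid shadowing Base.values.

```julia
fields = Symbol.('f', 1:3)   # [:f1, :f2, :f3]
vals = [1, 2, 3]             # named vals to avoid shadowing Base.values

nt_literal = (f1=1, f2=2, f3=3)
nt_zip     = (; zip(fields, vals)...)     # iterator of (name, value) tuples
nt_pair    = (; Pair.(fields, vals)...)   # vector of name => value pairs
nt_tuple   = (; tuple.(fields, vals)...)  # vector of (name, value) tuples

# Hypothetical helper: collects its keyword arguments into a named tuple.
show_kwargs(; kwargs...) = (; kwargs...)
nt_kwargs = show_kwargs(; Pair.(fields, vals)...)
```

All five results compare equal with ==, which fits the analogy between (;kwargs...) and the keyword arguments of a function call.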