Suppose I want to use some specific columns of a dataframe to do some operations and output a different number of columns.
When dynamically defining the names of these new columns, I want to use the names of the source columns.
In the following case I can do it, because after `fun =>` the input names are available.
Inside `funcols` I have the column data but not the names. I wish the names were available there too, so I could use them to build the new names.
I know you can achieve the same result in other ways, but I wanted to know whether this is possible as well.
I was aware of this possibility. What I'm missing is an understanding of how (and why) it is possible, within the same transform, to refer to the names of the input columns after the second `=>` but not after the first `=>`.
I'm asking just to better understand some of the internal mechanisms of DataFrames, not because I have a particular need for this "feature".
The reason is that with `cols => f => g`, DataFrames calls `f` with the column values and `g` with the column names (as a vector of strings). It's just how the API works. If you want to receive the names in `f`, you can write `AsTable(cols) => f`. In this case DataFrames passes the columns to `f` as a named tuple (the keys are the column names, the values are the column values). Note that in this case all the columns are passed as a single argument:
```julia
using DataFrames

df = DataFrame(a=1:3, b=4:6)

function f(table)
    # Add the (first) two columns of table
    result = table[1] + table[2]
    # Make the name "a+b" from column names "a" and "b"
    colnames = keys(table)
    result_name = Symbol(colnames[1], "+", colnames[2])
    return (; result_name => result)
end

transform(df, AsTable([:a, :b]) => f => AsTable)
```

Output:

```
3×3 DataFrame
 Row │ a      b      a+b
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      4      5
   2 │     2      5      7
   3 │     3      6      9
```
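The same `AsTable` approach is not limited to two columns. Here is a minimal sketch, assuming you want to sum any number of columns and name the result by joining all source names with `+`; the function name `sum_with_name` and the extra column `c` are illustrative, not from the thread:

```julia
using DataFrames

df = DataFrame(a = 1:3, b = 4:6, c = 7:9)

# Hypothetical generalization of `f`: sums all columns passed via AsTable
# and names the result by joining all source column names with "+".
function sum_with_name(table)
    colnames = keys(table)                     # tuple of Symbols, e.g. (:a, :b, :c)
    result = reduce(+, values(table))          # elementwise sum of the column vectors
    result_name = Symbol(join(colnames, "+"))  # e.g. Symbol("a+b+c")
    return (; result_name => result)
end

out = transform(df, AsTable([:a, :b, :c]) => sum_with_name => AsTable)
```

Because `keys(table)` returns the names as a tuple of `Symbol`s, `join` can build the new name for any number of input columns.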
As mentioned in previous posts, I know of other ways to get the result, including passing named tuples via AsTable.
In this case I would do it like this:
```julia
transform(df, [:a, :b] => (+) => x -> join(x, "+"))
```
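To make the division of labour in that one-liner explicit, here is the same transformation written with named functions; `add_cols` and `make_name` are hypothetical names chosen for illustration. The transformation function receives only the column values as positional arguments, while the destination function receives only the source column names as a vector of strings:

```julia
using DataFrames

df = DataFrame(a = 1:3, b = 4:6)

# Called by DataFrames with the column *values* as positional arguments;
# it has no access to the column names.
add_cols(x, y) = x .+ y

# Called by DataFrames with the source column *names* as a vector of
# strings (here ["a", "b"]); it never sees the data.
make_name(colnames) = join(colnames, "+")

out = transform(df, [:a, :b] => add_cols => make_name)
```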
What I was trying to find out is whether it is somehow possible, even when `cols` is an `Array{Symbol}`, to use the column names inside the function and not just when naming the output.
Since after the second `=>` I can define a function `x -> secondfun(colsnames)` that uses the column names, I was trying to imagine IF and HOW I could use the same information (the names) in the context of the first function.
I know the API currently works the way it does, but I just wanted to poke around behind the scenes without going through the code, which is very large and complex.
It seems like we are going in circles. The answer is simply no: the API is not constructed that way. `AsTable` only knows about the named tuple passed to it; it doesn't know about `src` in any way in the `src => fun => dest` expression. There is a way to do what you want, and we have described it above.
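Outside the `transform` API itself, the names can also be made visible to the transformation function by capturing them in a closure when the `src => fun => dest` pair is built. This is a sketch of a different technique (a plain Julia closure), not a DataFrames feature: DataFrames still passes only the column values to the function, and the names come from the captured `cols` variable. The helper `pair_with_names` is hypothetical:

```julia
using DataFrames

# Hypothetical helper (not part of DataFrames): builds a
# `cols => fun => AsTable` pair whose `fun` closes over `cols`.
# DataFrames still passes only the column values to `fun`;
# the names come from the captured `cols` variable.
function pair_with_names(cols::Vector{Symbol})
    fun = function (args...)
        # `cols` is available here via the closure, so the output name
        # can be built inside the function, as in the AsTable example.
        result_name = Symbol(join(cols, "+"))
        return (; result_name => reduce(+, args))
    end
    return cols => fun => AsTable
end

df = DataFrame(a = 1:3, b = 4:6)
out = transform(df, pair_with_names([:a, :b]))
```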