Iβm stumped this morning on what I thought would be a fairly trivial task: I am trying to write a function that creates a new column in a DataFrame, the values of which are simply the concatenation of some other columns in the DataFrame. If I know the columns upfront, it is indeed a very trivial task. However, I want this function to take a DataFrame as one argument and then the column numbers to concatenate as anotherβ¦something like this:
function add_column(df::DataFrame, colname::Symbol, cols_to_concat::Vector{Integer})
df[!,colname] = df[!, 1] .* df[!, 2] .* df[!, 3]
end
# OR
function add_column(df::DataFrame, colname::Symbol, cols_to_concat::Vector{Integer})
df[!,colname] = ["$(row[cols_to_concat[1]])$(row[cols_to_concat[2]])" for row in eachrow(df)]
end
Iβm thinking there may be a metaprogramming solution to this? It seems to me that the issue is figuring out how to dynamically construct this piece df[!, 1] .* df[!, 2] .* df[!, 3] or the alternative version in the second function. I think I can dynamically construct the df[!, n] pieces fairly easily, like this:
dfcols = [Symbol("df[!, $n]") for n in cols_to_concat]
but Iβm not sure then how to piece them together with the .* in betweenβ¦
Iβm probably misunderstanding the question, but does this help:
using DataFrames
df = DataFrame(a = ["aa" for _ in 1:10], b = ["bb" for _ in 1:10],
c = ["cc" for _ in 1:10], d = ["dd" for _ in 1:10])
function add_column(df::DataFrame, colname::Symbol, cols_to_concat::Vector)
df[!,colname] = (df[!, names(df)[cols_to_concat[1]]]
.* df[!, names(df)[cols_to_concat[2]]]
.* df[!, names(df)[cols_to_concat[3]]])
end
add_column(df, :add, [1, 4, 4])
Output:
10Γ5 DataFrame
β Row β a β b β c β d β add β
β β String β String β String β String β String β
βββββββΌβββββββββΌβββββββββΌβββββββββΌβββββββββΌβββββββββ€
β 1 β aa β bb β cc β dd β aadddd β
β 2 β aa β bb β cc β dd β aadddd β
β 3 β aa β bb β cc β dd β aadddd β
β 4 β aa β bb β cc β dd β aadddd β
β 5 β aa β bb β cc β dd β aadddd β
β 6 β aa β bb β cc β dd β aadddd β
β 7 β aa β bb β cc β dd β aadddd β
β 8 β aa β bb β cc β dd β aadddd β
β 9 β aa β bb β cc β dd β aadddd β
β 10 β aa β bb β cc β dd β aadddd β
Hi! I need this to work though when there are more or less than 3 columns. Your function will concatenate the first three columns that are given in the cols_to_concat vector, but I need to write the function so that the cols_to_concat vector could contain 1, 3, 12, etc., columns that need to be concatenatedβ¦
I tried this, but it doesnβt work:
function add_column(df::DataFrame, colname::Symbol, cols_to_concat::Vector{Int64})
dfcols = [i == length(cols_to_concat) ? Symbol("df[!, $n]") : Symbol("df[!, $n] .* ") for (i, n) in enumerate(cols_to_concat)]
left_side = Symbol("$(df).$(colname)")
right_side = Symbol(string(dfcols...))
@eval $left_side = $right_side
return df
end
julia> add_column(df1, :new, [1,3,5,6])
ERROR: UndefVarError: df[!, 1] .* df[!, 3] .* df[!, 5] .* df[!, 6] not defined
Stacktrace:
[1] top-level scope at REPL[20]:1
[2] eval at .\boot.jl:330 [inlined]
[3] add_column(::DataFrame, ::Symbol, ::Array{Int64,1}) at .\REPL[18]:5
[4] top-level scope at REPL[20]:1
function add_column!(df::DataFrame, colname::Symbol, cols_to_concat::Vector)
t = df[!, cols_to_concat[1]]
for col in cols_to_concat[2:end]
t = t .* df[!, col]
end
df[!, colname] = t
return df
end
You could also use reduce for this.
function add_column2!(df::DataFrame, colname::Symbol, cols_to_concat::Vector)
function concat(x, y)
x .* y
end
df[!, colname] =reduce(concat, eachcol(df[!, cols_to_concat]))
return df
end
Seriously though, thanks so much for this brilliantly simple solution. Let me see if I can walk through how it actually works:
function add_column!(df::DataFrame, colname::Symbol, cols_to_concat::Vector)
t = df[!, cols_to_concat[1]] #1
for col in cols_to_concat[2:end] #2
t = t .* df[!, col] #3
end #4
df[!, colname] = t #5
return df #6
end
On line 1, we are grabbing column number cols_to_concat[1] (whatever it may be) and assigning it to a variable t. Then, on lines 2 - 4, we are looping through the rest of the columns in cols_to_concat and updating the value of t on each iteration of the loop by concatenating an additional column until we arrive at the end. Finally, on lines 5 and 6 we create the new column in df, assign it the final value of t and return dfβ¦why didnβt I think of that?!
Basically, I was trying to figure out how to do it all at once when there was no need - your way simply concatenates the columns in an iterative fashion
Just a general note here because I think this is a good example for a broader point. If you strip away everything about the text being stored in a DataFrame, dynamic column selection, etc⦠and just imagine that each element was stored in the same vector of strings (a much simpler problem), you would likely arrive immediately at the solution. This is what is nice about Julia, no need to get fancy, even when the problem looks fancy.
text = ["aa"; "bb"; "cc"]
function replconcat(str)
final = str[1]
for i=2:length(str)
final = final .* str[i]
end
final
end
replconcat(text)
So to apply it to your problem just map appropriately, which is to think about doing this for each row of a DataFrame, which is made easier by the fact that the operation does not necessarily have to be done with a loop as @pdeffebach showed.