Hi, I’m a little confused on what to do in this coding situation and would appreciate some help!
I’m trying to create new columns in a dataframe (D) as a linear combination of the existing columns in that dataframe, given a vector of coefficients (v).
Basically: If I have a vector of Float64 numbers, v_i, and a 20 column data frame, D, of float64 values, how can I concatenate columns to the data frame D, where I am doing an element wise multiplication:
D[:New] = v_1D_1 + v_2D_2 + v_3D_3 + … v_20D_20 ; where D_i = ith column of D
So far I have started to write the function:
for i in 1:length(v)
D[:New] += v[i]*(D[:column_names_of_D[i]])
end
But I am unsure how to call the D[:column_names_of_D[i]]) part…
I would collect them as a matrix and then use matrix * vector
multiplication, eg
using DataFrames
"""
Return columns that start with `prefix` as a matrix.
"""
function cols2matrix(df::DataFrame, prefix)
matching_names = sort(filter(name -> startswith(String(name), String(prefix)),
names(df)))
hcat(getindex.(df, matching_names)...)
end
# make a dataframe
df = DataFrame(v_1 = randn(10), v_2 = randn(10), v_3 = randn(10))
M = cols2matrix(df, "v_")
M * [1, 2, 3]
But I would suggest that storing this kind of data in a matrix may be best in the first place.
Just a note about the hcat
command. Matrix()
works just find on DataFrames and is probably more idiomatic.
I also am gonna work on a package that overloads some useful string commands, or writes wrappers for them, for common string operations that would be used on the names of dataframes columns.
1 Like
Excellent idea, as it allows more modular code:
using DataFrames
colnames_with_prefix(df::DataFrame, prefix) =
sort(filter(name -> startswith(String(name), String(prefix)), names(df)))
df = DataFrame(v_1 = randn(10), v_2 = randn(10), v_3 = randn(10),
a = rand(1:20, 10))
Matrix(df[colnames_with_prefix(df, :v_)]) * [1, 2, 3]