question about creating new columns in data frame from existing columns,

question

#1

Hi, I’m a little confused on what to do in this coding situation and would appreciate some help!

I’m trying to create new columns in a dataframe (D) as a linear combination of the existing columns in that dataframe, given a vector of coefficients (v).

Basically: If I have a vector of Float64 numbers, v_i, and a 20 column data frame, D, of float64 values, how can I concatenate columns to the data frame D, where I am doing an element wise multiplication:
D[:New] = v_1D_1 + v_2D_2 + v_3D_3 + … v_20D_20 ; where D_i = ith column of D

So far I have started to write the function:

for i in 1:length(v)
D[:New] += v[i]*(D[:column_names_of_D[i]])
end

But I am unsure how to call the D[:column_names_of_D[i]]) part…


#2

I would collect them as a matrix and then use matrix * vector multiplication, eg

using DataFrames

"""
Return columns that start with `prefix` as a matrix.
"""
function cols2matrix(df::DataFrame, prefix)
    matching_names = sort(filter(name -> startswith(String(name), String(prefix)),
                                 names(df)))
    hcat(getindex.(df, matching_names)...)
end

# make a dataframe
df = DataFrame(v_1 = randn(10), v_2 = randn(10), v_3 = randn(10))

M = cols2matrix(df, "v_")

M * [1, 2, 3]

But I would suggest that storing this kind of data in a matrix may be best in the first place.


#3

Just a note about the hcat command. Matrix() works just find on DataFrames and is probably more idiomatic.

I also am gonna work on a package that overloads some useful string commands, or writes wrappers for them, for common string operations that would be used on the names of dataframes columns.


#4

Excellent idea, as it allows more modular code:

using DataFrames

colnames_with_prefix(df::DataFrame, prefix) =
    sort(filter(name -> startswith(String(name), String(prefix)), names(df)))

df = DataFrame(v_1 = randn(10), v_2 = randn(10), v_3 = randn(10),
               a = rand(1:20, 10))

Matrix(df[colnames_with_prefix(df, :v_)]) * [1, 2, 3]

#5

Thanks Tamas! :slight_smile: