Hi, I’m a little confused on what to do in this coding situation and would appreciate some help!

I’m trying to create new columns in a dataframe (D) as a linear combination of the existing columns in that dataframe, given a vector of coefficients (v).

Basically: if I have a vector of Float64 numbers, v_i, and a 20-column data frame, D, of Float64 values, how can I add a new column to D in which each existing column is multiplied by its coefficient and the results are summed:

D[:New] = v_1*D_1 + v_2*D_2 + v_3*D_3 + … v_20*D_20 ; where D_i = ith column of D

So far I have started to write the function:

for i in 1:length(v)

D[:New] += v[i]*(D[:column_names_of_D[i]])

end

But I am unsure how to write the D[:column_names_of_D[i]] part…

I would collect them as a matrix and then use `matrix * vector` multiplication, e.g.

```
using DataFrames

"""
Return columns whose names start with `prefix` as a matrix.
"""
function cols2matrix(df::DataFrame, prefix)
    matching_names = sort(filter(name -> startswith(String(name), String(prefix)),
                                 names(df)))
    # index each column by name, then concatenate the vectors horizontally
    hcat((df[!, name] for name in matching_names)...)
end

# make a dataframe
df = DataFrame(v_1 = randn(10), v_2 = randn(10), v_3 = randn(10))
M = cols2matrix(df, "v_")
M * [1, 2, 3]
```

But I would suggest that storing this kind of data in a matrix may be best in the first place.

Just a note about the `hcat` command: `Matrix()` works just fine on DataFrames and is probably more idiomatic.

I'm also going to work on a package that overloads, or writes wrappers for, some common string operations that would be useful on DataFrame column names.


Excellent idea, as it allows more modular code:

```
using DataFrames

colnames_with_prefix(df::DataFrame, prefix) =
    sort(filter(name -> startswith(String(name), String(prefix)), names(df)))

df = DataFrame(v_1 = randn(10), v_2 = randn(10), v_3 = randn(10),
               a = rand(1:20, 10))
# select the matching columns, convert to a matrix, then multiply
Matrix(df[:, colnames_with_prefix(df, :v_)]) * [1, 2, 3]
```
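For the case in the original question, where every column of D takes part and the result should be stored as a new column, something like the following might work. It is only a sketch, assuming a recent DataFrames.jl (1.x) where `D[!, name]` and `D.New = ...` are available; the small `D` and `v` here are made-up stand-ins for the 20-column frame and its coefficient vector, and `New_loop` is a hypothetical column name used only to compare the two forms.

```julia
using DataFrames

# Hypothetical stand-ins for the question's D (3 columns instead of 20) and v.
D = DataFrame(a = [1.0, 2.0], b = [3.0, 4.0], c = [5.0, 6.0])
v = [0.5, 1.0, 2.0]

# Capture the source column names first, so the new column never enters the sum.
cols = names(D)

# Matrix form: each row of the result is the linear combination of that row's values.
D.New = Matrix(D[:, cols]) * v

# Loop form, closest to the original attempt: look each column up by its name.
acc = zeros(nrow(D))
for (coef, name) in zip(v, cols)
    acc .+= coef .* D[!, name]
end
D.New_loop = acc
```

Both forms should give the same column; the matrix version is shorter, while the loop shows how to index a column by a name held in a variable, which was the part the question was stuck on.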