Question about creating new columns in a data frame from existing columns

sarah-ji · June 20, 2018, 5:56am

Hi, I’m a little confused on what to do in this coding situation and would appreciate some help!

I’m trying to create new columns in a dataframe (D) as a linear combination of the existing columns in that dataframe, given a vector of coefficients (v).

Basically: if I have a vector of Float64 numbers, v_i, and a 20-column data frame, D, of Float64 values, how can I append a new column to D that is the sum of its columns, each multiplied by the matching coefficient:
D[:New] = v_1*D_1 + v_2*D_2 + v_3*D_3 + … + v_20*D_20, where D_i is the ith column of D

So far I have started to write the function:

for i in 1:length(v)
    D[:New] += v[i] * (D[:column_names_of_D[i]])
end

But I am unsure how to write the D[:column_names_of_D[i]] part…

Tamas_Papp · June 20, 2018, 6:51am

I would collect them as a matrix and then use matrix * vector multiplication, e.g.:

using DataFrames

"""
Return columns that start with `prefix` as a matrix.
"""
function cols2matrix(df::DataFrame, prefix)
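    # names(df) may hold Symbols or Strings depending on the DataFrames version
    matching_names = sort(filter(name -> startswith(String(name), String(prefix)),
                                 names(df)))
    # one column vector per matching name, concatenated horizontally
    hcat((df[:, name] for name in matching_names)...)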
end

# make a dataframe
df = DataFrame(v_1 = randn(10), v_2 = randn(10), v_3 = randn(10))

M = cols2matrix(df, "v_")

M * [1, 2, 3]

But I would suggest that storing this kind of data in a matrix may be best in the first place.
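
To illustrate the matrix suggestion, here is a minimal sketch in plain Julia with no packages. The matrix `M` and the coefficients `v` are made-up illustrative values, not data from this thread. It shows that the matrix-vector product gives the same result as accumulating the weighted columns one at a time, as in the loop from the question.

```julia
# Illustrative data (assumed, not from the thread):
# 2 observations stored as rows, 3 variables stored as columns.
M = [1.0 2.0 3.0;
     4.0 5.0 6.0]
v = [1.0, 2.0, 3.0]          # one coefficient per column

# Row i of the result is v[1]*M[i,1] + v[2]*M[i,2] + v[3]*M[i,3].
combined = M * v

# The same linear combination, accumulated column by column.
looped = zeros(size(M, 1))
for i in 1:length(v)
    looped .+= v[i] .* M[:, i]   # in-place update, no rebinding of `looped`
end
```

With the data already in a matrix, no column-name lookup is needed at all.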

pdeffebach · June 20, 2018, 11:43am

Just a note about the hcat command: Matrix() works just fine on DataFrames and is probably more idiomatic.

I'm also going to work on a package that overloads, or writes wrappers for, some useful string functions for the common operations people do on data frame column names.

Tamas_Papp · June 20, 2018, 11:49am

Excellent idea, as it allows more modular code:

using DataFrames

colnames_with_prefix(df::DataFrame, prefix) =
    sort(filter(name -> startswith(String(name), String(prefix)), names(df)))

df = DataFrame(v_1 = randn(10), v_2 = randn(10), v_3 = randn(10),
               a = rand(1:20, 10))

Matrix(df[:, colnames_with_prefix(df, :v_)]) * [1, 2, 3]
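
For readers who want the result stored back in the data frame, as the original question asked, the following is a rough sketch. It assumes a recent DataFrames.jl release (1.x), where `df[!, :New] = ...` adds a column and string column names are accepted. The column names `D_1`, `D_2`, `D_10`, the data, and the coefficients are hypothetical. It also sorts by numeric suffix, because a plain `sort` on names is lexicographic and would place `D_10` before `D_2`, which could misalign the coefficients when there are ten or more columns.

```julia
using DataFrames

# Hypothetical data: three "D_" columns plus an unrelated column `a`.
df = DataFrame(D_1 = [1.0, 2.0], D_2 = [3.0, 4.0], D_10 = [5.0, 6.0], a = [7, 8])
v = [1.0, 10.0, 100.0]       # coefficients for D_1, D_2, D_10, in that order

# string.(...) copes with names(df) returning either Strings or Symbols.
cols = filter(n -> startswith(n, "D_"), string.(names(df)))

# Order by the integer after the underscore rather than alphabetically.
sort!(cols, by = n -> parse(Int, last(split(n, '_'))))

# Matrix approach: one product, stored as a new column.
df[!, :New] = Matrix(df[:, cols]) * v

# Loop approach from the question: look each column up by its name.
looped = zeros(nrow(df))
for (coef, name) in zip(v, cols)
    looped .+= coef .* df[!, name]
end
```

Both approaches give the same values; the loop avoids building an intermediate matrix, while the matrix version is shorter.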

sarah-ji · June 26, 2018, 4:16am

Thanks Tamas!
