Multiplying columns in DataFrames

How can I multiply values in one column by other columns containing numerical data? Suppose a dataframe df_1 is given by

df_1 = DataFrame(NAME = ["Aa","Bb","AB","CD","EF","GH"], SCORE=[0,1,10,20,30,45], VALUE =[400,200,400,600,10,20],
INCOME = [123,56,90,1000,50,60], GAIN =[55,65,111,65,34,100])

I would like to produce equivalent to df_2 that multiplies values in the column SCORE with the remaining columns

df_2 =DataFrame(NAME = ["Aa","Bb","AB","CD","EF","GH"], SCORE=[0,1,10,20,30,45],VALUE=df_1.SCORE.*df_1.VALUE,
INCOME = df_1.SCORE.*df_1.INCOME, GAIN =df_1.SCORE.*df_1.GAIN)

I have tried using this, but it doesn’t work.

df_2 = names(df_1,Not(Between(:NAME,:SCORE))  .=>*:SCORE)

Your last attempt is quite confusing. Did you mean to have a transform call in there? Ideally it would be obvious why that didn’t work, but if you explain your logic for how you arrived at that solution we could better point you in the right direction.

That said, this one is very tricky! Here is a solution

julia> nms = names(df_1, Between(:VALUE, :GAIN))
3-element Array{String,1}:
 "VALUE"
 "INCOME"
 "GAIN"

julia> transform(df_1, vcat.("SCORE", nms) .=> ByRow(*) .=> nms)
6Γ—5 DataFrame
 Row β”‚ NAME    SCORE  VALUE  INCOME  GAIN  
     β”‚ String  Int64  Int64  Int64   Int64 
─────┼─────────────────────────────────────
   1 β”‚ Aa          0      0       0      0
   2 β”‚ Bb          1    200      56     65
   3 β”‚ AB         10   4000     900   1110
   4 β”‚ CD         20  12000   20000   1300
   5 β”‚ EF         30    300    1500   1020
   6 β”‚ GH         45    900    2700   4500
1 Like
df_2 =DataFrame(NAME = ["Aa","Bb","AB","CD","EF","GH"], SCORE=[0,1,10,20,30,45]) 

[df_2[!,c] = df_1.SCORE .* df_1[! ,c] for c in names(df_1)[3:end] ]
df_2
1 Like

I wanted to produce exactly what you have done in your reply. My knowledge of dataframes is not good, so my attempt at it may not make any sense.

I converted the dataframe to a matrix for this operation, but then I was not sure how to change column names.

temp=Matrix{Int64}(select(df_1, Not(Between(:NAME,:SCORE))))
temp2=temp.*df_1.SCORE
df_t= DataFrame(temp2)

Thanks

you can use rename!(df, [:A, :B]) to change names.

Thank you

cols=eachcol(df_1)
map(x-> x.*=cols.SCORE, cols[3:5])
df_1=DataFrame(cols)

or

transform(df_1, nms.=> (x-> x.*=cols.SCORE).=>nms)

where nms is as defined by @pdeffebach

I continue to use this problem as a testing ground for the various forms expected by the transform function syntax.
I’m trying to understand what is passed in the various possible ways to select the columns on which to apply the transformation and consequently how it can then be β€œworked”.

These are some of the tests made (after several failed attempts), on which maybe someone could add some explicit comments on why things go this way.

Why in the following case doesn’t the last AsTable produce new columns with the names x1, x2, …?

transform(df_1,AsTable(Between(:VALUE, :GAIN))=> (x-> DataFrame(x).*df_1.SCORE)=>AsTable)
6Γ—5 DataFrame
 Row β”‚ NAME    SCORE  VALUE  INCOME  GAIN  
     β”‚ String  Int64  Int64  Int64   Int64 
─────┼─────────────────────────────────────
   1 β”‚ Aa          0      0       0      0
   2 β”‚ Bb          1    200      56     65
   3 β”‚ AB         10   4000     900   1110
   4 β”‚ CD         20  12000   20000   1300
   5 β”‚ EF         30    300    1500   1020
   6 β”‚ GH         45    900    2700   4500

Instead in this case to get the desired names of the output columns you have to use a list of names,

transform(df_1,AsTable(Between(:VALUE, :GAIN))=> (x-> hcat(values(x)...).*df_1.SCORE)=>nms)
6Γ—5 DataFrame
 Row β”‚ NAME    SCORE  VALUE  INCOME  GAIN  
     β”‚ String  Int64  Int64  Int64   Int64 
─────┼─────────────────────────────────────
   1 β”‚ Aa          0      0       0      0
   2 β”‚ Bb          1    200      56     65
   3 β”‚ AB         10   4000     900   1110
   4 β”‚ CD         20  12000   20000   1300
   5 β”‚ EF         30    300    1500   1020
   6 β”‚ GH         45    900    2700   4500

Otherwise …

 transform(df_1,AsTable(Between(:VALUE, :GAIN))=> (x-> hcat(values(x)...).*df_1.SCORE)=>AsTable)
6Γ—8 DataFrame
 Row β”‚ NAME    SCORE  VALUE  INCOME  GAIN   x1     x2     x3    
     β”‚ String  Int64  Int64  Int64   Int64  Int64  Int64  Int64 
─────┼──────────────────────────────────────────────────────────
   1 β”‚ Aa          0    400     123     55      0      0      0
   2 β”‚ Bb          1    200      56     65    200     56     65
   3 β”‚ AB         10    400      90    111   4000    900   1110
   4 β”‚ CD         20    600    1000     65  12000  20000   1300
   5 β”‚ EF         30     10      50     34    300   1500   1020
   6 β”‚ GH         45     20      60    100    900   2700   4500

I finally learned that using AsTable as a selector wrapper a namedtuple is passed.
In the first case I tried the DataFrame function on the namedtuple to be able to do. * Df_1.SCORE and I saw that it works.

In the second case, I had to use β€œβ€¦β€ and the hcat function. I had tried to do Matrix (values (x)) but it didn’t work. Is there a more direct way to have a table / matrix on which to apply brodacast .* Df_1.SCORE?
I know this is not the best way to solve the problem, but I just want to understand the various possibilities.

Why in the following case doesn’t the last AsTable produce new columns with the names x1, x2, …?

Because a DataFrame has names associated with it, so the new columns take on those names.

I finally learned that using AsTable as a selector wrapper a namedtuple is passed.

This shouldn’t be a surprise, it’s in the documentation!

help?> AsTable
search: AsTable

  AsTable(cols)

  A type used for selection operations to signal that the columns selected by
  the wrapped selector should be passed as a NamedTuple to the function.

transform(df_1,AsTable(Between(:VALUE, :GAIN))=> (x-> hcat(values(x)…).*df_1.SCORE)=>nms)

The function here returns a matrix, which has no names associated with it, so you get x1, x2, ...

You can use Tables.matrix((a = [1, 2, 3], b = [4, 5, 6]))

1 Like

Yes of course. And in fact I had already read it.
But seeing it in action perhaps produces a deeper understanding :grinning:

transform(df_1,AsTable(Between(:VALUE, :GAIN))=> (x-> x*df_1.SCORE)=>AsTable)
ERROR: MethodError: no method matching *(::NamedTuple{(:VALUE, :INCOME, :GAIN),Tuple{Array{Int64,1},Array{Int64,1},Array{Int64,1}}}, ::Array{Int64,1})

Received! Thanks