How do I create a new dataframe column by dividing individual elements in two dataframe columns?

a.frist · February 22, 2022, 8:08pm

I have the following dataframe:

[ Info: Displaying top ten rows of the pij_df dataframe...
┌ Info: 10×5 DataFrame
│  Row │ MONTH  TOPIC_I    TOPIC_J   JOINT_PROB_SUM  DOC_COUNT_SUM
│      │ Int64  String15   String15  Float64         Int64
│ ─────┼───────────────────────────────────────────────────────────
│    1 │    12  TOPIC_153  TOPIC_87     0.0380672              979
│    2 │    12  TOPIC_81   TOPIC_87     0.0182519              979
│    3 │    12  TOPIC_249  TOPIC_87     0.0161693              979
│    4 │    12  TOPIC_124  TOPIC_87     0.00660719             979
│    5 │    12  TOPIC_140  TOPIC_87     0.000891694            979
│    6 │    12  TOPIC_101  TOPIC_87     0.00134154             979
│    7 │    12  TOPIC_89   TOPIC_87     0.0784224              979
│    8 │    12  TOPIC_233  TOPIC_87     0.0195678              979
│    9 │    12  TOPIC_144  TOPIC_87     0.0150135              979
└   10 │    12  TOPIC_201  TOPIC_87     0.00740799             979
[ Info: The dimensions for the pij_df: (16812500, 5)

I am trying to generate a new column “PROB_I_J” from the columns “JOINT_PROB_SUM” and “DOC_COUNT_SUM”. Each row in “PROB_I_J” should be the value for “JOINT_PROB_SUM” divided by the “DOC_COUNT_SUM”.

I am using the following DataFramesMeta macro:

@transform!(pij_df, :PROB_I_J = :JOINT_PROB_SUM / :DOC_COUNT_SUM)

I am receiving an OutOfMemoryError() when I run the transform macro. I know the dataframe is quite large, but I suspect there is something wrong with how I am using the macro. Is the code above doing what I want it to do? I’ve generally used DataFrames and DataFramesMeta to manipulate data, but I’m wondering if this is less efficient…

bkamins · February 22, 2022, 8:26pm

Do

@rtransform!(pij_df, :PROB_I_J = :JOINT_PROB_SUM / :DOC_COUNT_SUM)

or

@transform!(pij_df, :PROB_I_J = :JOINT_PROB_SUM ./ :DOC_COUNT_SUM)
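
To see that the two forms give the same result, here is a minimal sketch on a small invented table (the name `df`, its values, and the two output column names are assumptions for illustration, not the real `pij_df`):

```julia
using DataFrames, DataFramesMeta

# Toy stand-in for pij_df; values chosen so the quotients are exact
df = DataFrame(JOINT_PROB_SUM = [1.0, 2.0, 3.0], DOC_COUNT_SUM = [4, 8, 2])

# Row-wise macro: for each row the expression sees one scalar per column
@rtransform!(df, :PROB_ROWWISE = :JOINT_PROB_SUM / :DOC_COUNT_SUM)

# Column-wise macro: the expression sees whole vectors, so broadcast with ./
@transform!(df, :PROB_BROADCAST = :JOINT_PROB_SUM ./ :DOC_COUNT_SUM)
```

Both new columns should hold the element-wise quotients, one value per row.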

a.frist · February 22, 2022, 8:48pm

This works! @bkamins, can you elaborate on what my original code was doing? I’ve reviewed the docs a few times and it isn’t clear to me what happens in the original transform. Is it performing the calculation by pairing the first element in column 1 with every single element in column 2?

nilshg · February 23, 2022, 5:52am

Without any macros:

pij_df.PROB_I_J = pij_df.JOINT_PROB_SUM ./ pij_df.DOC_COUNT_SUM
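
A minimal sketch of the macro-free route on an invented toy table (the name `df`, its values, and the `PROB_I_J_BYROW` column are assumptions for illustration). It also shows the plain DataFrames.jl `transform!` with `ByRow`, which should be another way to express the same row-wise division:

```julia
using DataFrames

# Toy stand-in for pij_df; values chosen so the quotients are exact
df = DataFrame(JOINT_PROB_SUM = [1.0, 2.0, 3.0], DOC_COUNT_SUM = [4, 8, 2])

# Broadcasted division assigned directly to a new column
df.PROB_I_J = df.JOINT_PROB_SUM ./ df.DOC_COUNT_SUM

# Equivalent form: ByRow applies / to each pair of row values
transform!(df, [:JOINT_PROB_SUM, :DOC_COUNT_SUM] => ByRow(/) => :PROB_I_J_BYROW)
```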

bkamins · February 23, 2022, 7:44am

Your original code is creating a matrix, e.g.:

julia> [1, 2, 3] / [4, 5, 6]
3×3 Matrix{Float64}:
 0.0519481  0.0649351  0.0779221
 0.103896   0.12987    0.155844
 0.155844   0.194805   0.233766

which cannot be stored in a column of a data frame.
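
This is also the likely source of the OutOfMemoryError. A rough sketch of the arithmetic, assuming the result would be materialized as a dense Float64 matrix (the variable names are illustrative; only the row count comes from the question):

```julia
x = [1.0, 2.0, 3.0]
y = [4.0, 5.0, 6.0]

dims_div = size(x / y)         # right division of two vectors: n×n matrix
dims_broadcast = size(x ./ y)  # broadcasting: one quotient per element

# Back-of-the-envelope size of a dense n×n Float64 matrix for pij_df
n = 16_812_500                  # row count reported in the question
bytes = n^2 * sizeof(Float64)   # 8 bytes per entry
pib = bytes / 2^50              # same figure expressed in pebibytes
```

With n at the size of `pij_df`, `bytes` comes out at about 2.26e15, i.e. roughly 2 PiB, far beyond the memory of any ordinary machine, whereas the broadcasted version only needs one Float64 per row.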
