Efficient division for two probabilities

X_Chen · March 24, 2023, 9:34pm

Hi everyone,

I have a very large dataframe (1 million observations), and I need to compute log(p1/(1-p2)) for two probabilities, p1 and p2.
And my code is:
log.(p1)/ (-p2.+1)
I think my code is so inefficient that it will run out of memory. So I am wondering if there is any way to get rid of the dot operator?
Thanks a lot!

nilshg · March 24, 2023, 9:50pm

1 million observations shouldn’t be large for a reasonably new laptop. It seems that you are missing a dot and your parens are off; shouldn’t it be log.(p1 ./ (1 .- p2))?
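To make the corrected expression concrete, here is a minimal sketch. The small vectors are made-up stand-ins for the two columns; with a DataFrame you would presumably pass something like df.p1 and df.p2, where those column names are an assumption. The dots are not what costs memory: chained dotted calls fuse into a single loop, so only the result vector is allocated.

```julia
# Made-up sample data standing in for the two DataFrame columns.
p1 = [0.2, 0.5, 0.9]
p2 = [0.1, 0.5, 0.3]

# Fully dotted form: the chained dots fuse into one loop over the data,
# so the only allocation is the result vector.
r1 = log.(p1 ./ (1 .- p2))

# Same computation; the @. macro adds a dot to every call and operator.
r2 = @. log(p1 / (1 - p2))

# Writing into a preallocated vector avoids allocating a new result.
out = similar(p1)
@. out = log(p1 / (1 - p2))
```

All three forms give the same values; the last one is only worth it if you repeat the calculation many times and want to reuse the output buffer.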

blackeneth · March 24, 2023, 10:27pm

If your p1 can be zero, or p2 can be 1, and you don’t want +Inf and -Inf in the results, one trick is to add 0.000001 to the top and bottom of the ratio. Then the max and min values will be about +13.8 and -13.8.
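A rough sketch of that trick, assuming the offset is added to both p1 and 1 - p2. The function name safe_log_ratio and the keyword offset are made up for illustration.

```julia
# Hypothetical helper: add a small offset to numerator and denominator so that
# p1 == 0 or p2 == 1 no longer produces -Inf or +Inf.
function safe_log_ratio(p1, p2; offset = 1e-6)
    log.((p1 .+ offset) ./ ((1 .- p2) .+ offset))
end

# Edge cases: smallest possible ratio, largest possible ratio, and a middle value.
p1 = [0.0, 1.0, 0.5]
p2 = [0.0, 1.0, 0.5]

r = safe_log_ratio(p1, p2)

# The extremes are close to log(1e-6) and log(1e6), i.e. roughly -13.8 and +13.8.
lo, hi = extrema(r)
```

Without the offset, the first two entries would be -Inf and +Inf. Keep in mind that the offset slightly biases every value, so pick it small relative to the probabilities you care about.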

One million rows will be no problem for this calculation. Start to worry somewhere between 100 and 300 million rows.
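For a sense of scale, here is a back-of-the-envelope estimate. It assumes the columns are stored as Float64 (8 bytes per value) and that memory is the limiting factor at those row counts; both are assumptions, not something measured here.

```julia
# Rough memory estimate for one Float64 column (8 bytes per value).
column_mb(n) = n * sizeof(Float64) / 1e6

small = column_mb(1_000_000)      # one million rows, in MB
large = column_mb(300_000_000)    # 300 million rows, in MB

# p1, p2 and the result are three columns of this size in memory at once.
total_small = 3 * small
total_large = 3 * large
```

That works out to about 8 MB per column at one million rows and about 2.4 GB per column at 300 million rows.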

X_Chen · March 24, 2023, 11:04pm

Thank you so much! That solves the problem.
