I have a very large data frame (1 million observations), and for two probabilities (p1 and p2) I need to calculate log(p1/(1-p2)).
And my code is:
I think my code is so inefficient that it will run out of memory. Is there any way to get rid of the dot operator?
Thanks a lot!
If p1 can be zero or p2 can be 1, and you don’t want +Inf and -Inf in the results, one trick is to add 0.000001 to the top and bottom of the ratio. Then the max and min values will be about +13.8 and -13.8.
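To check those bounds, here is a minimal sketch in plain Python. The question's own language isn't shown, so this is only an illustration; the function name `smoothed_log_ratio` and the `eps` parameter are made up for the example.

```python
import math

def smoothed_log_ratio(p1, p2, eps=1e-6):
    """Return log((p1 + eps) / (1 - p2 + eps)).

    eps is the small constant added to the top and bottom of the ratio
    so the result stays finite when p1 == 0 or p2 == 1.
    """
    return math.log((p1 + eps) / (1.0 - p2 + eps))

# Largest value: p1 = 1 and p2 = 1. Smallest value: p1 = 0 and p2 = 0.
upper = smoothed_log_ratio(1.0, 1.0)
lower = smoothed_log_ratio(0.0, 0.0)
print(round(upper, 1), round(lower, 1))  # prints: 13.8 -13.8
```

The bounds are roughly log(1/eps) and log(eps), so a different eps moves them accordingly.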
One million rows will be no problem for this calculation. Start to worry somewhere between 100 and 300 million rows.
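For a rough sense of scale, the sketch below estimates the memory needed for the columns alone. It assumes p1, p2 and the result are each stored as 8-byte double-precision values, and it ignores temporary copies and any other columns, so real usage will be higher. The helper name `column_megabytes` is hypothetical.

```python
BYTES_PER_FLOAT64 = 8  # assumed storage: one double-precision value per cell

def column_megabytes(rows, columns=3):
    """Approximate size in MB of `columns` float64 columns with `rows` values each."""
    return rows * columns * BYTES_PER_FLOAT64 / 1e6

small = column_megabytes(1_000_000)    # p1, p2 and the result at 1 million rows
large = column_megabytes(300_000_000)  # the same three columns at 300 million rows
print(small, large)  # prints: 24.0 7200.0
```

Under those assumptions, 1 million rows is about 24 MB, while 300 million rows is about 7.2 GB before any intermediate results.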