I have a very large dataframe (1 million observations), and I need to calculate log(p1/(1-p2)) from two probabilities (p1 and p2).
And my code is:
log.(p1)/ (-p2.+1)
I think my code is so inefficient that it will run out of memory. So I am wondering if there is any way to get rid of the dot operator?
Thanks a lot!
1 million observations shouldn’t be large for a reasonably new laptop. It seems that you are missing a dot and your parens are off. Shouldn’t it be log.(p1 ./ (1 .- p2))?
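A minimal sketch of the corrected expression, in case it helps. The vectors p1 and p2 here are made-up stand-ins for your two columns. The dots are not what costs memory: dotted calls in a single expression are fused into one loop, so only the result vector gets allocated.

```julia
# Made-up data standing in for the two probability columns.
p1 = [0.2, 0.5, 0.9]
p2 = [0.1, 0.5, 0.3]

# All dotted operations fuse into one loop; only `r` is allocated.
r = log.(p1 ./ (1 .- p2))

# `@.` adds a dot to every call and operator, so this is the same computation.
r_macro = @. log(p1 / (1 - p2))

# Writing into an existing vector avoids allocating a new result at all.
out = similar(p1)
@. out = log(p1 / (1 - p2))
```

If you really dislike typing the dots, the `@.` form is probably the closest thing to getting rid of them.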
If your p1 can be zero, or p2 can be 1, and you don’t want -Inf and +Inf in the results, one trick is to add 0.000001 to the top and bottom of the ratio. Then the min and max values will be about -13.8 and +13.8.
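A rough sketch of that trick, using made-up values that include the problem cases. The name `small` is only for illustration.

```julia
# Small constant added to the numerator and the denominator.
small = 1e-6

# Made-up values: p1 = 0 and p2 = 1 would give -Inf and +Inf without the constant.
p1 = [0.0, 1.0, 0.5]
p2 = [0.0, 1.0, 0.5]

# Fused broadcast; every result is finite.
r = @. log((p1 + small) / (1 - p2 + small))

# The extremes are bounded by roughly log(1e6), which is about 13.8.
lo, hi = extrema(r)
```

Keep in mind this shifts every value slightly, not only the extreme ones, so whether it is acceptable depends on what you do with the result.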
One million rows will be no problem for this calculation. Begin to worry somewhere between 100 and 300 million rows.
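A back-of-the-envelope check, assuming the columns are stored as Float64 (8 bytes per value). The helper `column_mb` is made up for this estimate.

```julia
# Assumption: each column is a Vector{Float64}, i.e. 8 bytes per value.
bytes_per_value = sizeof(Float64)

# Approximate size of one column in megabytes (hypothetical helper).
column_mb(nrows) = nrows * bytes_per_value / 1e6

small_case = column_mb(1_000_000)    # one column at 1 million rows
large_case = column_mb(300_000_000)  # one column at 300 million rows
```

That works out to 8 MB per column at 1 million rows and 2400 MB per column at 300 million rows, and you need p1, p2 and the result in memory at the same time.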