Efficient generation of random numbers from 2 columns in a dataframe

Hi,

If I have one column with the mean and another with the standard deviation, how can I create a new column using this function:

rng(m,s) = rand(Normal(m, s), 1)

The fastest option is preferred …

Thanks!

Using DataFramesMeta, you can do:

julia> using DataFramesMeta, Distributions

julia> df = DataFrame(m = [1, 2, 2, 4], s = [.5, .6, .2, .1])
4×2 DataFrame
 Row │ m      s       
     │ Int64  Float64 
─────┼────────────────
   1 │     1      0.5
   2 │     2      0.6
   3 │     2      0.2
   4 │     4      0.1

julia> @rtransform df :r = rand(Normal(:m, :s))
4×3 DataFrame
 Row │ m      s        r       
     │ Int64  Float64  Float64 
─────┼─────────────────────────
   1 │     1      0.5  1.45605
   2 │     2      0.6  1.8666
   3 │     2      0.2  2.0909
   4 │     4      0.1  4.09518

Without macros,

julia> df = DataFrame(m = [1, 2, 2, 4], s = [.5, .6, .2, .1])
4×2 DataFrame
 Row │ m      s       
     │ Int64  Float64 
─────┼────────────────
   1 │     1      0.5
   2 │     2      0.6
   3 │     2      0.2
   4 │     4      0.1

julia> transform(df, [:m,:s] => ByRow((m,s)->rand(Normal(m,s))) => :r)
4×3 DataFrame
 Row │ m      s        r        
     │ Int64  Float64  Float64  
─────┼──────────────────────────
   1 │     1      0.5  0.982336
   2 │     2      0.6  2.38128
   3 │     2      0.2  2.29904
   4 │     4      0.1  3.989

Or like this:

df.r = rng.(df.m, df.s)

or if you haven't already defined rng, you can do

df.r = rand.(Normal.(df.m, df.s))
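
Since the question asks for the fastest option, one more variant that may be worth timing: a Normal(m, s) draw is the same as m + s * z with z drawn from a standard normal, so the whole column can be built from a single randn call without constructing a distribution per row. The sketch below uses plain vectors in place of the dataframe columns, and the helper name normal_draws is made up for illustration, not something from the thread or from any package. It returns one scalar per row, unlike rand(Normal(m, s), 1), which returns a one-element vector.

```julia
using Random

# Hypothetical helper (name invented for this sketch): one Normal(m[i], s[i])
# draw per element, using the identity  m + s * z  with  z ~ Normal(0, 1).
function normal_draws(gen::AbstractRNG, m::AbstractVector{<:Real}, s::AbstractVector{<:Real})
    length(m) == length(s) || throw(DimensionMismatch("m and s must have the same length"))
    return m .+ s .* randn(gen, length(m))
end

gen = MersenneTwister(42)      # seeded only so the sketch is reproducible
m = [1, 2, 2, 4]               # same values as the df.m column above
s = [0.5, 0.6, 0.2, 0.1]       # same values as the df.s column above
r = normal_draws(gen, m, s)    # Vector{Float64}, one scalar per row
```

With a dataframe this would be df.r = normal_draws(Random.default_rng(), df.m, df.s). Whether it actually beats the broadcasted rand.(Normal.(df.m, df.s)) is not measured here; timing both with BenchmarkTools on your own data is the way to find out.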

Thanks all!