This may not be what you had in mind, but on my machine, using all physical cores (start Julia with the command line option -t <number of physical cores>), the code below computes roughly 10 million OLS regressions with on average about 5000 observations each in about 50 seconds. There are undoubtedly further and more elegant optimizations to be had.
The main things I changed are:
- making p a constant
- not generating many dataframes
- simplifying the computation of the OLS estimator
- not reallocating data many times
- some simple parallelization
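On the preallocation point, here is a minimal sketch of the idea (buf and fill_once! are made-up names for illustration, not part of the code below): refilling an existing buffer allocates nothing after the first compiling call, while allocating a fresh vector does so on every call.

```julia
buf = zeros(1000)

function fill_once!(v)
    for i in eachindex(v)
        v[i] = randn()   # overwrite in place, no new vector
    end
    return v
end

# Measure inside small functions, as the @allocated docstring recommends.
bytes_reuse(v) = @allocated fill_once!(v)
bytes_fresh()  = @allocated randn(1000)

fill_once!(buf); bytes_reuse(buf); bytes_fresh()   # warm-up so compilation is not measured
println(bytes_reuse(buf))   # 0: the buffer is reused
println(bytes_fresh())      # thousands of bytes, on every single call
```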
(As an aside, for plotting, you really don’t need to compute the numbers for every sample size)
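For example (just a sketch; the 40-point grid is an arbitrary choice), a logarithmically spaced set of sample sizes gives a perfectly smooth convergence plot at a fraction of the cost:

```julia
# ~40 log-spaced sample sizes between 300 and 10000 instead of all 9701
ns = unique(round.(Int, exp.(range(log(300), log(10000), length = 40))))
# then run simulations(1000, n, 0, 1) only for n in ns and plot against ns
```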
Hth
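One caveat on the parallel part: if you forget the -t flag, Julia starts with a single thread and the threaded loop silently runs serially. Worth a quick check before timing:

```julia
# Prints the number of threads Julia was started with (`julia -t N`);
# if this is 1, Threads.@threads will not speed anything up.
println(Threads.nthreads())
```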
using Distributions
using Statistics # for cov and var

const p = 1/365 # this will produce lots of Infs or NaNs: samples where x is all zeros have var(x) == 0
function gen_data!(x, y, α, β)
    for i ∈ eachindex(x)
        x[i] = rand(Binomial(1, p))
        y[i] = α + β * x[i] + randn()
    end
end
ols(y, x) = cov(x, y) / var(x) # slope of a univariate regression with intercept
function simulation(x, y, α, β)
    gen_data!(x, y, α, β)
    return ols(y, x)
end
function simulations(m, n, α, β)
    mn = vr = 0.0
    x = zeros(Int, n)
    y = zeros(n)
    for i in 1:m
        β̂ = simulation(x, y, α, β) # β̂ of simulation i, accumulated as running sums instead of stored in a vector
        mn += β̂
        vr += β̂ * β̂
    end
    mn /= m
    vr = vr / m - mn^2 # E[β̂^2] - (E[β̂])^2
    mn, vr
end
function convergence(a, b)
    expected_value = zeros(b - a + 1)
    expected_var = zeros(b - a + 1)
    Threads.@threads :dynamic for n in a:b
        expected_value[n-a+1], expected_var[n-a+1] = simulations(1000, n, 0, 1)
    end
    [expected_value, expected_var]
end
@time result = convergence(300, 10000)
# purists would run this twice to keep compilation overhead out of the comparison, but that is second order here
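In case the one-liner for the estimator looks suspicious: when the model contains an intercept, the OLS slope is Σ(xᵢ - x̄)(yᵢ - ȳ) / Σ(xᵢ - x̄)², and the (n-1) factors in the sample covariance and variance cancel, so cov(x, y)/var(x) is exact. A quick standalone check (with a continuous x, chosen here only to sidestep the var(x) == 0 corner case flagged above; the coefficients 0.5 and 2 are arbitrary):

```julia
using Statistics

n = 10_000
x = randn(n)
y = 0.5 .+ 2.0 .* x .+ randn(n)    # arbitrary α = 0.5, β = 2

β̂_cov = cov(x, y) / var(x)         # the shortcut used above
β̂_ls  = ([ones(n) x] \ y)[2]       # slope from the full least-squares solve
println(β̂_cov ≈ β̂_ls)              # true
```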