Hello,

I am new to Julia, so I am posting this here rather than in a category that might be more relevant.

I am trying to run a standard OLS simulation. That is, I generate a data frame, run an OLS regression, repeat this for N simulations, store the estimates, and see how they evolve as the sample size grows. However, my code is extremely slow. **About 100 minutes!** Can someone please help me figure out what is causing the performance issues? Alternatively, what would be the “Julia way” of running this simulation?

P.S.: In `convergence(a,b)`, a small (a,b), like 1:10, is relatively fast. For my desired (a,b) = (1, 10^5), it takes almost **1h30** to generate the output vector!
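To put the two cases in perspective, here is a rough count of the work involved: every sample size `n` in `a:b` triggers 1000 simulations, and each simulation builds a data frame of `n` rows and fits one regression. The helper names `n_fits` and `n_rows` are made up for this count only and are not part of my code.

```julia
# Simulations per sample size, as in the call simulations(1000, n, 0, 1).
const nsim = 1000

# One regression per simulation per sample size in a:b.
n_fits(a, b) = nsim * length(a:b)

# Total number of simulated observations over all sample sizes in a:b.
n_rows(a, b) = nsim * sum(a:b)

n_fits(1, 10)      # 10_000 regressions
n_fits(1, 10^5)    # 100_000_000 regressions
n_rows(1, 10^5)    # 5_000_050_000_000 simulated rows
```

So the number of regressions grows linearly with the width of the range, while the number of simulated rows grows roughly quadratically with `b`.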

```
using Distributions, Plots, StatsPlots, Random, GLM, DataFrames
p=1/365
default(fmt=:png)
# _A: GENERATING THE DATA
# Takes 3 inputs: the sample size n and the parameters α and β of y = α + βx + ϵ.
# Returns a DataFrame with the random variables of interest.
function gen_data(n, α, β)
    x = rand(Binomial(1, p), n)   # x_i ~ Bernoulli(p) with p = 1/365
    ϵ = randn(n)                  # ϵ_i ~ N(0, 1)
    y = α .+ β*x + ϵ
    DataFrame(
        x=x,
        ϵ=ϵ,
        y=y
    )
end
# OLS
function ols(data) # runs an OLS regression of y on x and returns β̂
    model = lm(@formula(y ~ x), data) # fitted model
    coef(model)[2] # slope estimate β̂ (the first coefficient is the intercept)
end
# Repeating x times
function simulations(x, n, α, β)
    β̂_collect = zeros(x) # vector of length x = number of simulations
    for i in 1:x
        data = gen_data(n, α, β)
        β̂_collect[i] = ols(data) # position i holds the β̂ of simulation i
    end
    β̂_collect
end
# Asymptotic properties of E(β̂) and Var(β̂)
function convergence(a, b)
    expected_value = Float64[]
    expected_var = Float64[]
    for n in a:b
        β̂s = simulations(1000, n, 0, 1) # 1000 estimates at sample size n
        push!(expected_value, mean(β̂s))
        push!(expected_var, var(β̂s))
    end
    [expected_value, expected_var]
end
result=convergence(300, 10^4)
plot(result)
```
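For comparison, here is a minimal sketch of the same experiment that skips `DataFrame` and `lm` and computes the slope in closed form, β̂ = cov(x, y) / var(x), which is what OLS with an intercept and a single regressor reduces to. The names `slope`, `simulate_slopes`, and `convergence_fast` are made up for illustration. `p` is passed as an argument instead of being read from a global. Draws where `x` has no variation (quite possible at p = 1/365) are recorded as `NaN` and skipped, which is an assumption of this sketch and not something the code above does.

```julia
using Random, Statistics

# Closed-form OLS slope for y ~ 1 + x; NaN when x has no variation.
function slope(x, y)
    vx = var(x)
    return vx == 0 ? NaN : cov(x, y) / vx
end

# Run `nsim` simulations at sample size `n` and return the estimated slopes.
function simulate_slopes(rng, nsim, n, α, β, p)
    β̂ = Vector{Float64}(undef, nsim)
    x = Vector{Float64}(undef, n)   # buffers reused across simulations
    y = Vector{Float64}(undef, n)
    for i in 1:nsim
        for j in 1:n
            x[j] = rand(rng) < p ? 1.0 : 0.0     # x_j ~ Bernoulli(p)
            y[j] = α + β * x[j] + randn(rng)     # ϵ_j ~ N(0, 1)
        end
        β̂[i] = slope(x, y)
    end
    return β̂
end

# Mean and variance of β̂ over the usable draws, for each n in `ns`.
function convergence_fast(ns; nsim = 1000, α = 0.0, β = 1.0, p = 1 / 365,
                          rng = Random.default_rng())
    means = Float64[]
    vars = Float64[]
    for n in ns
        b = filter(!isnan, simulate_slopes(rng, nsim, n, α, β, p))
        push!(means, isempty(b) ? NaN : mean(b))
        push!(vars, length(b) > 1 ? var(b) : NaN)
    end
    return means, vars
end

# Small demo on a coarse grid of sample sizes.
means, vars = convergence_fast(300:100:1000; nsim = 200)
```

Passing a range with a step, such as `300:100:1000`, evaluates only a grid of sample sizes instead of every integer between `a` and `b`. Whether a coarser grid is acceptable depends on how smooth the plot needs to be.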