Hello,
I am new to Julia, so I am posting this here rather than in a category that might be more relevant.
I am trying to run a standard OLS simulation: I generate a data frame, run an OLS regression, repeat this for a fixed number of simulations, store the estimates, and see how they evolve as the sample size grows. However, my code is extremely slow: about 100 minutes! Can someone please help me figure out what is causing the performance issues? Alternatively, what would be the “Julia way” of running this simulation?
P.S.: In convergence(a,b), a small range (a,b), like 1:10, is relatively fast. For my desired (a,b) = (1, 10^5), it takes almost 1h30 to generate the output vector!
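To put that run in numbers, here is a rough count of the work involved, assuming 1000 simulations per sample size as in the code below. The names a, b, nsim, n_regressions and n_draws are only illustrative.

```julia
# Rough work estimate for convergence(1, 10^5) with 1000 simulations per n.
a, b, nsim = 1, 10^5, 1000

# One regression per simulation, for every sample size n in a:b.
n_regressions = nsim * (b - a + 1)   # 100_000_000

# Each regression at sample size n draws n observations of x and ϵ.
n_draws = nsim * sum(a:b)            # 5_000_050_000_000

println("regressions: ", n_regressions)
println("observations drawn: ", n_draws)
```

So the desired range means 10^8 separate regressions, each building its own data frame.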
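# Gadfly is not loaded here: it exports its own `plot`, which clashes with the one from Plots
using Distributions, Plots, StatsPlots, Random, GLM, DataFrames
const p = 1/365 # declared const so functions that read this global stay type-stable
default(fmt=:png)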
# _A: GENERATING THE DATA
function gen_data(n, α, β) # Takes 3 inputs: n sample size, α and β parameters of y = α + βx
    # Returns a data frame of the random variables of interest
    x = rand(Binomial(1, p), n) # x_i ~ Bernoulli(p) with p = 1/365
    ϵ = randn(n)                # Simulate ε_i ~ N(0,1)
    y = α .+ β .* x .+ ϵ
    DataFrame(
        x = x,
        ϵ = ϵ,
        y = y
    )
end
# OLS
function ols(data) # Runs an OLS regression, returns β̂
    model = lm(@formula(y ~ x), data) # Fitted model (intercept and slope)
    coef(model)[2]                    # Returning β̂, the coefficient on x
end
# Repeating x times
function simulations(x, n, α, β)
    β̂_collect = zeros(x) # Vector of dimension x = number of simulations
    for i in 1:x
        data = gen_data(n, α, β)
        β̂_collect[i] = ols(data) # Position i holds the β̂ of simulation i
    end
    β̂_collect
end
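# Asymptotic properties of E(β̂) and Var(β̂)
function convergence(a, b)
    expected_value = Float64[] # Typed vectors instead of [] (which is Vector{Any})
    expected_var = Float64[]
    for n in a:b
        estimates = simulations(1000, n, 0, 1) # Renamed from exp, which shadows Base.exp
        push!(expected_value, mean(estimates))
        push!(expected_var, var(estimates))
    end
    [expected_value, expected_var]
end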
result=convergence(300, 10^4)
plot(result)
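
For comparison, here is a minimal sketch of the same simulation step that skips the DataFrame and lm entirely, assuming only the slope estimate is needed. It relies on the closed-form simple-regression slope cov(x, y) / var(x) and reuses two buffers across simulations. The names slope and simulate_slopes are hypothetical, and returning NaN when x has no variation is an assumption of this sketch, not something the code above does.

```julia
using Random, Statistics

# Closed-form OLS slope of y on x (with intercept): cov(x, y) / var(x).
# Assumption of this sketch: return NaN when x has no variation,
# which happens often when p is small and n is modest.
function slope(x, y)
    vx = var(x)
    return vx == 0 ? NaN : cov(x, y) / vx
end

# Run nsim simulations of size n and return the vector of slope estimates.
# x and y are allocated once and overwritten in every simulation.
function simulate_slopes(rng, nsim, n, α, β, p)
    slopes = Vector{Float64}(undef, nsim)
    x = Vector{Float64}(undef, n)
    y = Vector{Float64}(undef, n)
    for i in 1:nsim
        for j in 1:n
            x[j] = rand(rng) < p ? 1.0 : 0.0   # x_j ~ Bernoulli(p)
            y[j] = α + β * x[j] + randn(rng)   # y_j = α + β x_j + ε_j
        end
        slopes[i] = slope(x, y)
    end
    return slopes
end

# Example call: 1000 simulations at sample size 5000.
rng = MersenneTwister(1234)
estimates = simulate_slopes(rng, 1000, 5_000, 0.0, 1.0, 1 / 365)
valid = filter(!isnan, estimates)   # drop samples where x was constant
println("usable estimates: ", length(valid), " of ", length(estimates))
```

Passing p and the random number generator as arguments keeps the function free of global variables, which is one of the usual suggestions in the Julia performance tips.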