# Simulation of regression coefficients

I’m still very much a novice in Julia, but I’m already using it productively for many tasks. In my research area, R and Stata are the main alternatives, and Julia has real advantages over both in speed and syntax. (Great job, everyone!)

I wrote a short blog post as a response to this interesting post on coding up efficient simulations in R, in order to show how nicely Julia handles this task. Maybe some of you are interested in this, and I’d be interested in learning whether there are any obvious improvements to my code.


This is the fastest I can come up with on the CPU (before I go to sleep). You might be able to squeeze more out of it on a GPU.

On my computer, my version is 4x faster than the fastest version in your blog post so far.

The idea is to use multi-threading, and to avoid allocating memory in `gen_data` by preallocating the matrices and just reusing them.

For multi-threading, you are better off creating one copy of the data per thread, so the threads can work concurrently and don’t have to share data (which would lead to false sharing).

You can also save time by skipping `DataFrames` and creating the matrix directly.

Also, if you use a GPU you might want to turn off multi-threading.

```julia
using Distributions, LinearAlgebra, Statistics
using Base.Threads

const std_normal = Normal(0, 1)

# `fastfit` is not defined in this thread; assume a fast least-squares
# helper along these lines (solves the normal equations via Cholesky):
fastfit(X, y) = cholesky!(Symmetric(X' * X)) \ (X' * y)

function gen_data2!(X, y, x1_dmean, x2_dmean)
    # total time periods in the panel = 500
    tt = 500

    # x1 and x2 covariates
    x1_A = 1 .+ rand(std_normal, tt)
    x1_B = 1/4 .+ rand(std_normal, tt)
    x2_A = 1 .+ x1_A .+ rand(std_normal, tt)
    x2_B = 1 .+ x1_B .+ rand(std_normal, tt)

    # outcomes (notice different slope coefs for x2_A and x2_B)
    y_A = x1_A .+ 1 .* x2_A .+ rand(std_normal, tt)
    y_B = x1_B .+ 2 .* x2_B .+ rand(std_normal, tt)
    x1 = vcat(x1_A, x1_B)
    x2 = vcat(x2_A, x2_B)

    X[:, 2] .= vcat(fill(0, length(x1_A)), fill(1, length(x1_B)))
    X[:, 3] .= x1
    X[:, 4] .= x2
    X[:, 5] .= x1 .* x2

    y .= vcat(y_A, y_B)

    # group-wise demeaned covariates
    x1_dmean .= vcat(x1_A .- mean(x1_A), x1_B .- mean(x1_B))
    x2_dmean .= vcat(x2_A .- mean(x2_A), x2_B .- mean(x2_B))
end

function coefs_lm_formula4(X, y, x1_dmean, x2_dmean)
    mod_level = fastfit(X, y)
    # replace the interaction column with the demeaned interaction
    @inbounds X[:, 5] .= x1_dmean .* x2_dmean
    mod_dmean = fastfit(X, y)
    (mod_level[end], mod_dmean[end])
end

function run_simulations5(nsim)
    sims = zeros(nsim, 2)

    # one scratch copy of each array per thread, so the threads never
    # have to share (and possibly falsely share) data
    X = [zeros(Float64, 1000, 5) for _ in 1:nthreads()]
    y = [zeros(Float64, 1000) for _ in 1:nthreads()]
    x1_dmean = [zeros(Float64, 1000) for _ in 1:nthreads()]
    x2_dmean = [zeros(Float64, 1000) for _ in 1:nthreads()]

    # set constant (intercept) column
    for i in 1:nthreads()
        X[i][:, 1] .= 1
    end

    @threads for i in 1:nsim
        id = threadid()
        gen_data2!(X[id], y[id], x1_dmean[id], x2_dmean[id])
        sims[i, 1], sims[i, 2] = coefs_lm_formula4(X[id], y[id], x1_dmean[id], x2_dmean[id])
    end
    sims
end
```

Thanks, very interesting!
I hadn’t thought about preallocating the matrices and vectors this way, but that seems like low-hanging fruit. The main performance benefit of your version comes from the multithreading, though. That’s also the part where readability takes the biggest hit, but of course, if time is of the essence, it’s the obvious way to go. A version with preallocation but without multithreading runs in about 1.4 s on my machine, a small improvement over the last version on the blog.
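For comparison, the preallocation pattern on its own (single thread) can be sketched on a simplified toy regression rather than the panel setup from the post — the function name `run_sims_prealloc`, the true slope of 2, and fitting with `X \ y` are all my own choices here, not from the thread:

```julia
using Random, Statistics

# Sketch of preallocate-and-reuse on a toy regression (single thread):
# the matrix and vector are allocated once and overwritten every pass.
function run_sims_prealloc(nsim; n = 1_000)
    sims = zeros(nsim)
    X = ones(n, 2)                   # column 1 stays the intercept
    y = zeros(n)
    for s in 1:nsim
        x = view(X, :, 2)
        randn!(x)                    # overwrite the covariate in place
        randn!(y)                    # start y from the noise term
        y .+= 1.0 .+ 2.0 .* x        # true intercept 1, true slope 2
        sims[s] = (X \ y)[2]         # OLS slope estimate
    end
    sims
end
```

Averaged over many simulations, the recovered slope should sit near the true value of 2.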

I wonder if someone has made a macro for that.

Also, the other issue is allocation: the code still seems to allocate a lot. I’d look at these lines and see whether allocations can be reduced by consolidating some of them.
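For instance (a hypothetical `gen_data3!`, not from the thread): every `vcat` in `gen_data2!` allocates a fresh vector, so drawing the noise directly into views of the preallocated arrays removes most of those allocations. The sketch below reproduces the same data-generating process under that constraint:

```julia
using Random, Statistics

# Hypothetical reduced-allocation generator: rather than building
# x1_A, x1_B, ... separately and vcat-ing them (one allocation each),
# draw the noise straight into views of the preallocated arrays.
function gen_data3!(X, y, x1_dmean, x2_dmean; tt = 500)
    x1 = view(X, :, 3)
    x2 = view(X, :, 4)
    randn!(x1); randn!(x2); randn!(y)       # start from the noise terms
    @views begin
        x1[1:tt]       .+= 1                # group A mean shift
        x1[tt+1:2tt]   .+= 1/4              # group B mean shift
        x2             .+= 1 .+ x1          # x2 = 1 + x1 + noise
        X[1:tt, 2]     .= 0                 # group dummy
        X[tt+1:2tt, 2] .= 1
        X[:, 5]        .= x1 .* x2          # interaction column
        y[1:tt]        .+= x1[1:tt]     .+ 1 .* x2[1:tt]       # slope 1
        y[tt+1:2tt]    .+= x1[tt+1:2tt] .+ 2 .* x2[tt+1:2tt]   # slope 2
        x1_dmean[1:tt]     .= x1[1:tt]     .- mean(x1[1:tt])
        x1_dmean[tt+1:2tt] .= x1[tt+1:2tt] .- mean(x1[tt+1:2tt])
        x2_dmean[1:tt]     .= x2[1:tt]     .- mean(x2[1:tt])
        x2_dmean[tt+1:2tt] .= x2[tt+1:2tt] .- mean(x2[tt+1:2tt])
    end
    nothing
end
```

Called with `X = ones(1000, 5)` and zeroed vectors, this leaves column 1 as the intercept, matching the preallocated setup above.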


Nice. But I would say allocation seems to be the main unresolved issue with the CPU version.

Wonder if anyone will attempt a GPU version?

Yeah, that would be interesting. As I wrote in the blog post, I’m not that interested in getting the fastest version possible, although it’s fun to try to improve the speed, of course.


I get the following timings:

```
VERSION = v"1.6.1"
n = 20000

Original version (with DataFrame, single thread):
1.836 s (2040002 allocations: 4.65 GiB)

Optimized version (with DataFrame, single thread):
918.817 ms (160062 allocations: 5.94 MiB)

168.350 ms (160786 allocations: 7.56 MiB)

714.319 ms (760130 allocations: 2.64 GiB)

869.904 ms (160017 allocations: 5.90 MiB)
```

On my laptop computer, `run_simulations5` with 12 threads is 2.5 times faster than the original.