Simulate data for logistic regression

After successfully implementing a small function to generate correlated data, I wish now to simulate data for logistic regression (with a 0/1 outcome) with specific coefficients.

I’ve tried to follow two threads (originally for R; see this and this), but encountered some problems, specifically in the equivalent of the rbinom() function.

# R version
# https://stats.stackexchange.com/questions/46523/how-to-simulate-artificial-data-for-logistic-regression
# > x1 = rnorm(1000)           # some continuous variables
# > x2 = rnorm(1000)
# > z = 1 + 2*x1 + 3*x2        # linear combination with a bias
# > pr = 1/(1+exp(-z))         # pass through an inv-logit function
# > y = rbinom(1000,1,pr)      # bernoulli response variable


# Julia attempt
# add https://github.com/neuropsychology/Psycho.jl.git

using Psycho, Statistics, Random, GLM

coefs = [0.2]
data = simulate_data_correlation(coefs; n=100)

cor(data[:y], data[:Var1]) # 0.2


data[:y] = 1 ./ (1 .+ exp.(-1 * data[:y]))  # pass through an inv-logit function
response = rand.([Distributions.Binomial.(1, x) for x in data[:y]], 1)
data[:y] = [x[1] for x in response]

glm(@formula(y ~ Var1), data, Binomial(), LogitLink())  # Doesn't return the correct coef

I am not sure where the errors are… Any advice?

PS: I am not trying to copy the R version per se, but rather just to simulate logistic data with pre-specified coefficients. There could be a better approach for this problem, but I only found these answers for R.

I think you miscoded the logistic function. But in any case, I think you are overcomplicating this, this is how I would code the R snippet in Julia:

using StatsFuns: logistic
X = randn(1000, 2)
Z = X * [2, 3] .+ 1
p = logistic.(Z)
y = Float64.(rand(length(p)) .< p)
4 Likes

:open_mouth: impressive! thanks!

See also https://github.com/mcreel/Econometrics/blob/master/Examples/MLE/EstimateLogit.jl
Econometrics/LogitDGP.jl at main · mcreel/Econometrics · GitHub

1 Like