After successfully implementing a small function to generate correlated data, I wish now to simulate data for logistic regression (with a 0/1 outcome) with specific coefficients.
I’ve tried to follow two threads (originally for R; see this and this), but encountered some problems, specifically in the equivalent of the rbinom()
function.
# R version
# https://stats.stackexchange.com/questions/46523/how-to-simulate-artificial-data-for-logistic-regression
# > x1 = rnorm(1000) # some continuous variables
# > x2 = rnorm(1000)
# > z = 1 + 2*x1 + 3*x2 # linear combination with a bias
# > pr = 1/(1+exp(-z)) # pass through an inv-logit function
# > y = rbinom(1000,1,pr) # bernoulli response variable
# Julia attempt
# add https://github.com/neuropsychology/Psycho.jl.git
using Psycho, Statistics, Random, GLM
coefs = [0.2]
data = simulate_data_correlation(coefs; n=100)
cor(data[:y], data[:Var1]) # 0.2
data[:y] = 1 ./ (1 .+ exp.(-1 * data[:y])) # pass through an inv-logit function
response = rand.([Distributions.Binomial.(1, x) for x in data[:y]], 1)
data[:y] = [x[1] for x in response]
glm(@formula(y ~ Var1), data, Binomial(), LogitLink()) # Doesn't return the correct coef
I am not sure where the errors are… Any advice?
PS: I am not trying to copy the R version per se, but rather just to simulate logistic data with pre-specified coefficients. There could be a better approach for this problem, but I only found these answers for R.