GLM.jl, error on Binomial distribution, fixed number of trials


Hello: the following gives an error

data = DataFrame(Y = [0,3,1,7,1,0,7,5,1], X1 = [1,2,3,1,2,3,1,2,3], X2 = [1,1,1,2,2,2,3,3,3])
model = glm(@formula(Y ~ X1 + X2), data, Binomial(10))

Error msg:

ERROR: LoadError: DomainError:
log will only return a complex result if called with a complex argument. Try log(complex(x)).
 [1] nan_dom_err at ./math.jl:300 [inlined]
 [2] log at ./math.jl:419 [inlined]
 [3] logit(::Float64) at /home/hungngo/.julia/v0.6/StatsFuns/src/basicfuns.jl:45
 [4] macro expansion at /home/hungngo/repos/git/GLM.jl/src/glmfit.jl:241 [inlined]
 [5] macro expansion at ./simdloop.jl:73 [inlined]
 [6] initialeta!(::Distributions.Binomial{Float64}, ::GLM.LogitLink, ::Array{Float64,1}, ::Array{Float64,1}, ::Array{Float64,1}, ::Array{Float64,1}) at /home/hungngo/repos/git/GLM.jl/src/glmfit.jl:239
 [7] #fit#16(::Bool, ::Array{Float64,1}, ::Array{Float64,1}, ::Array{Any,1}, ::Function, ::Type{GLM.GeneralizedLinearModel}, ::Array{Float64,2}, ::Array{Float64,1}, ::Distributions.Binomial{Float64}, ::GLM.LogitLink) at /home/hungngo/repos/git/GLM.jl/src/glmfit.jl:295
 [8] #fit#17(::Array{Any,1}, ::Function, ::Type{GLM.GeneralizedLinearModel}, ::Array{Float64,2}, ::Array{Int64,1}, ::Distributions.Binomial{Float64}, ::GLM.LogitLink) at /home/hungngo/repos/git/GLM.jl/src/glmfit.jl:311
 [9] fit(::Type{GLM.GeneralizedLinearModel}, ::Array{Float64,2}, ::Array{Int64,1}, ::Distributions.Binomial{Float64}) at /home/hungngo/repos/git/GLM.jl/src/glmfit.jl:311
 [10] #fit#44(::Dict{Any,Any}, ::Array{Any,1}, ::Function, ::Type{GLM.GeneralizedLinearModel}, ::StatsModels.Formula, ::DataFrames.DataFrame, ::Distributions.Binomial{Float64}, ::Vararg{Distributions.Binomial{Float64},N} where N) at /home/hungngo/.julia/v0.6/StatsModels/src/statsmodel.jl:72
 [11] fit(::Type{GLM.GeneralizedLinearModel}, ::StatsModels.Formula, ::DataFrames.DataFrame, ::Distributions.Binomial{Float64}) at /home/hungngo/.julia/v0.6/StatsModels/src/statsmodel.jl:66
 [12] #glm#18(::Array{Any,1}, ::Function, ::StatsModels.Formula, ::DataFrames.DataFrame, ::Distributions.Binomial{Float64}, ::Vararg{Distributions.Binomial{Float64},N} where N) at /home/hungngo/repos/git/GLM.jl/src/glmfit.jl:314
 [13] glm(::StatsModels.Formula, ::DataFrames.DataFrame, ::Distributions.Binomial{Float64}, ::Vararg{Distributions.Binomial{Float64},N} where N) at /home/hungngo/repos/git/GLM.jl/src/glmfit.jl:314
 [14] include_string(::String, ::String) at ./loading.jl:522
 [15] include_string(::Module, ::String, ::String) at /home/hungngo/.julia/v0.6/Compat/src/Compat.jl:71
 [16] (::Atom.##112#116{String,String})() at /home/hungngo/.julia/v0.6/Atom/src/eval.jl:109
 [17] withpath(::Atom.##112#116{String,String}, ::String) at /home/hungngo/.julia/v0.6/CodeTools/src/utils.jl:30
 [18] withpath(::Function, ::String) at /home/hungngo/.julia/v0.6/Atom/src/eval.jl:38
 [19] hideprompt(::Atom.##111#115{String,String}) at /home/hungngo/.julia/v0.6/Atom/src/repl.jl:67
 [20] macro expansion at /home/hungngo/.julia/v0.6/Atom/src/eval.jl:106 [inlined]
 [21] (::Atom.##110#114{Dict{String,Any}})() at ./task.jl:80

Was that expected, or did I do something stupid? Thanks!


Are you sure you want Binomial and LogitLink as the distribution/link for that model? Usually logistic regression takes on true and false. It seems the response vector might be counts which should probably use Poisson. If you want to model the rates (events given 10 trials), you can use the offset argument.


To get a feel for GLM, I wanted to model the situation when y counts the number of successes out of 10 trials, and the count is modeled with Binomial(10, theta) which is a special case of GLM, AFAIK. This model is different from Poisson, in part because the count is always bounded by 10.

This test would also be a very special case of multinomial GLM (of which the multinomial logistic regression is a special case, as multinomial logistic regression is categorical GLM). Sorry for being pedantic, I wanted to make sure our terminologies are sync’ed up


Logistic regression (Bernoulli, Binomial, Multinomial) can be framed as Poisson models by changing the data representation and will be equivalent except for the log-likelihood / deviance statistics (depending on the details the number of observations and degrees of freedom); depending on whether on uses off-set or weights, etc. It seems that in that particular case, GLM.jl is using a bad initial value. Best to open an issue there and I can take a look at it. Will probably end up being a quick PR to fix it.


OK great. Issue opened. thanks for the prompt reply