Two questions about GLM


I have noticed that I could not use a CategoricalArray as the dependent variable for a logistic regression. When I convert the binary dependent variable into a Float64, then I can estimate the model without any problem.

Question 1: Could GLM be modified so that it accepts a CategoricalArray as the dependent variable?

I think this would be a nice service to users, especially for models that require a binary dependent variable.

Question 2: Is there an example of how to use “contrasts” in the GLM model? I have a CategoricalArray with 4 levels and want to use level “2” as the base. I have read the StatsModels documentation on DummyCoding() but could not make it work.

  1. A categorical response leads to a multinomial model (multinomial or ordinal, depending on whether the array is ordered, i.e. isordered, and assuming the LogitLink). Distributions.jl has the multinomial distribution and StatsFuns.jl has the softmax function. However, GLM does not support multinomial or categorical models yet (except for Binomial / logit). This is currently feature request #206.
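  2. For specifying contrasts, pass a contrasts dictionary as a keyword argument:
using DataFrames, GLM
data = DataFrame(y = rand(10), x1 = rand(1:3, 10))
# Make x1 categorical. Recent DataFrames versions no longer provide categorical!;
# there, data.x1 = categorical(data.x1) with CategoricalArrays loaded does the same job.
categorical!(data, :x1)
# Uses the default base level (the first level)
glm(@formula(y ~ x1), data, Normal(), IdentityLink())
# Uses 2 as the base level
glm(@formula(y ~ x1), data, Normal(), IdentityLink(), contrasts = Dict(:x1 => DummyCoding(base = 2)))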
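
Putting both answers together for the case in the question, here is a minimal sketch of a logistic regression with a binary categorical outcome and a 4-level predictor. The data and the column names (outcome, group, y) are made up for illustration, and it assumes a setup where CategoricalArrays is loaded explicitly. Since GLM will not take the CategoricalArray directly as the response, the outcome is recoded to 0/1 first, which is the same workaround described in the question.

```julia
using DataFrames, CategoricalArrays, GLM, Random

Random.seed!(1)  # only so that the sketch is reproducible

# Hypothetical data: a binary categorical outcome and a 4-level predictor
n = 200
df = DataFrame(
    outcome = categorical(rand(["no", "yes"], n)),
    group   = categorical(rand(1:4, n)),
)

# Question 1 workaround: recode the categorical outcome as 0.0 / 1.0
df.y = Float64.(df.outcome .== "yes")

# Question 2: use level 2 of the predictor as the base (reference) level
model = glm(@formula(y ~ group), df, Binomial(), LogitLink(),
            contrasts = Dict(:group => DummyCoding(base = 2)))

# The coefficient names should list every level of group except the base
println(coefnames(model))
```

Note that base has to be given as an actual level of the column, so a column with string levels would need base = "2" rather than base = 2.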


Thank you so much.