Indicator matrix for categorical data in GLM.jl with DataFrames.jl

I am working with a large dataset and want to run a logit regression on monthly data. For this I create a DataFrame and use the GLM package in Julia. My code looks something like this:

f = glm(@formula(Y ~ Age + Duration + Gender + Nationality + MonthIn), Data2000, Binomial(), LogitLink())

My question is this: since I have monthly data, I want to create dummy variables for the 12 months, or 11 when the model includes a constant. MonthIn is just a column containing the month number (e.g. 3 for March).
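The 11-versus-12 distinction comes from collinearity: with a constant term, the 12 month dummies add up to the constant column in every row, so one month has to be dropped as the reference level. A minimal base-Julia sketch of that indicator matrix, assuming MonthIn holds integers 1 to 12; `month_dummies` is a hypothetical helper, not part of GLM.jl:

```julia
# Hypothetical helper (not part of GLM.jl): indicator matrix for month numbers 1-12.
# With an intercept, January (month 1) is dropped as the reference level,
# leaving 11 columns; without one, every month gets its own column.
function month_dummies(months::AbstractVector{<:Integer}; intercept::Bool = true)
    all(m -> 1 <= m <= 12, months) ||
        throw(ArgumentError("month numbers must lie in 1:12"))
    kept = intercept ? (2:12) : (1:12)
    # Rows are observations, columns are the kept months in order.
    return [m == l for m in months, l in kept]
end

X = month_dummies([3, 1, 12, 3])   # 4×11 Bool matrix; the January row is all zeros
```

Which month serves as the reference is arbitrary; it only changes how the month coefficients are interpreted (each one is relative to the dropped month).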

When I searched for how to do this, I only learned that in R this is built in, so R can create the monthly dummies automatically. My guess would be to use the data-pooling function built into DataFrames.jl to create an indicator matrix, but I am not sure how this, or something similar, would be done.
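In R this happens through factors. A rough equivalent, sketched under the assumption of a recent GLM.jl/StatsModels.jl release, is to tell StatsModels that MonthIn is categorical via the `contrasts` keyword; the formula is then expanded into an indicator matrix automatically, with no manual dummies. The data below are random toy values that only reuse the question's column names (Duration, Gender and Nationality are left out for brevity):

```julia
using DataFrames, GLM, Random

Random.seed!(2000)
n = 240
# Hypothetical stand-in for Data2000: random data, only the column names are real.
Data2000 = DataFrame(
    Y = Float64.(rand(Bool, n)),       # binary outcome coded 0.0 / 1.0
    Age = 20 .+ 45 .* rand(n),
    MonthIn = repeat(1:12, n ÷ 12),    # month number, e.g. 3 for March
)

# Listing MonthIn in `contrasts` makes StatsModels treat the integer column as
# categorical. DummyCoding() drops the first level (month 1) as the reference,
# so together with the intercept there are 11 month indicators.
m = glm(@formula(Y ~ Age + MonthIn), Data2000, Binomial(), LogitLink();
        contrasts = Dict(:MonthIn => DummyCoding()))

# Design matrix actually used: intercept, Age, then one column per month 2..12.
X = modelmatrix(m)
```

Dropping the intercept from the formula (`Y ~ 0 + Age + MonthIn`) should instead give all 12 month columns, which matches the "12 or 11" choice in the question.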

I would appreciate any help, and please feel free to ask if my question is not clear.



This topic is a test of a new service discussed here. Please do not reply to it; instead, use the link to StackOverflow posted at the end.

Did you ever get an answer to this? I am also having some trouble using categorical variables.

It looks like the poster found their own solution and posted it as an answer at the StackOverflow link in the original post.

Is pool!(df, [:categoricalcolumn]) still the correct solution to this? I found the same post on StackOverflow, and I’m getting UndefVarError: pool! not defined (Julia 1.1.1, DataFrames 0.19.4).
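As the UndefVarError shows, `pool!` is not available in that DataFrames version. One replacement, sketched under the assumption that CategoricalArrays.jl is installed, is to convert the column to a `CategoricalVector` with `categorical`; any `@formula` that mentions it is then dummy-coded by default. `ModelFrame` and `modelmatrix` (from StatsModels, re-exported by GLM) expose the resulting indicator matrix directly, which also shows how a formula with a categorical predictor gets expanded. The tiny data frame is made up for illustration:

```julia
using DataFrames, CategoricalArrays, GLM

# Made-up example data with an integer month column.
df = DataFrame(Y = [0.0, 1.0, 1.0, 0.0, 1.0, 0.0],
               MonthIn = [3, 1, 12, 3, 1, 12])

# Replace the integer column with a CategoricalVector (stands in for pool!).
df.MonthIn = categorical(df.MonthIn)

# Build the design matrix the formula implies without fitting anything:
# an intercept column, then indicators for every level except the first (month 1).
mf = ModelFrame(@formula(Y ~ MonthIn), df)
X = modelmatrix(mf)
```

The same converted column can be passed straight to `glm(@formula(...), df, Binomial(), LogitLink())` with no `contrasts` argument.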

Because I’m fairly new to the language, I wasn’t able to trace the logic in GLM to figure out how, or whether, a formula with a categorical predictor is expanded to an indicator matrix. My goal is to use the difference of two indicator matrices as a predictor. I wrote a function to do it, but I’d rather use an existing alternative if one is available.
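When the two categorical columns share a set of levels, the difference of their indicator matrices takes only a few lines of base Julia. The sketch below uses hypothetical helpers (`indicator` and `indicator_diff` are illustrative names, not GLM.jl or StatsModels API) and assumes both vectors use comparable level values. Every row of the full difference matrix sums to zero, so its columns are linearly dependent; by default one reference column is dropped, mirroring what DummyCoding does for a single variable:

```julia
# Hypothetical helper: rows = observations, columns = the levels in `lv`, in order.
indicator(v::AbstractVector, lv::AbstractVector) = [x == l for x in v, l in lv]

# Hypothetical helper: difference of the indicator matrices of `a` and `b`,
# built over their shared, sorted set of levels so the columns line up.
function indicator_diff(a::AbstractVector, b::AbstractVector; drop_first::Bool = true)
    lv = sort!(unique(vcat(a, b)))
    # Full difference rows sum to zero, so drop one reference level's column.
    lv = drop_first ? lv[2:end] : lv
    return Int.(indicator(a, lv)) .- Int.(indicator(b, lv))
end

D = indicator_diff([1, 2, 3], [2, 2, 1])   # columns for levels 2 and 3
```

One option for using the result is to `hcat` it onto a design matrix (for example from `modelmatrix`) and fit with the matrix method `glm(X, y, Binomial(), LogitLink())`.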