Hello everyone,
New Julia user here.
Suppose I have a vector of strings, and I’d like to generate a dummy variable for each unique value that appears in that vector.
For instance:
data = ["a", "b", "a", "a", "b"]
I want to create a matrix of dummy variables along the following lines:
mat_col1 mat_col2
1 0
0 1
1 0
1 0
0 1
where mat_col1 is a dummy for the level “a” and mat_col2 for level “b”.
I was wondering how this could be done. I’ve experimented with StatsModels.ContrastsMatrix, but that creates a dummy for every observation-level pair rather than one per level.
For instance,
using StatsModels
StatsModels.ContrastsMatrix(StatsModels.DummyCoding(), ["a", "b", "a", "a", "b"]).matrix
gives
0.0 0.0 0.0 0.0
1.0 0.0 0.0 0.0
0.0 1.0 0.0 0.0
0.0 0.0 1.0 0.0
0.0 0.0 0.0 1.0
So every column here is a row-specific dummy variable (the "a" in row 1 gets a different dummy variable from the "a" in row 3), which isn’t what I need.
I’m also looking for full dummy coding, i.e. a dummy variable for both a and b rather than just b. StatsModels.ContrastsMatrix seems to choose a base level and leave it out, although FullDummyCoding seems to get around that.
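For reference, here is a rough sketch of how FullDummyCoding might be combined with ContrastsMatrix to get one column per level. It rests on an assumption: that the second argument of ContrastsMatrix is the vector of distinct levels rather than the raw data, which would also explain the 5x4 output above. The names lvls, row_of and mat are made up for this illustration.

```julia
using StatsModels

data = ["a", "b", "a", "a", "b"]

# Assumption: ContrastsMatrix expects the distinct levels, not the observations.
lvls = unique(data)
cm = StatsModels.ContrastsMatrix(StatsModels.FullDummyCoding(), lvls)

# Map each level to its row in the contrasts matrix (hypothetical helper).
row_of = Dict(l => i for (i, l) in enumerate(cm.levels))

# Pick the contrast row for every observation: one row per observation,
# one column per level.
mat = cm.matrix[[row_of[x] for x in data], :]
```

With full dummy coding the contrasts matrix should just be an identity matrix over the levels, so the indexing step does all the real work here.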
The example I’ve given is stylized; my actual problem has many observations and levels, so manually creating a dummy variable for each unique value isn’t practical.
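To make the target concrete, the matrix described above can be sketched without StatsModels at all, by comparing each observation against each unique level. This is only one possible approach, not necessarily the recommended one; the names lvls, dense, col_of and sp are hypothetical. The sparse variant is included on the assumption that a dense matrix would be wasteful with many levels.

```julia
using SparseArrays

data = ["a", "b", "a", "a", "b"]

# Unique levels in order of first appearance: ["a", "b"]
lvls = unique(data)

# Dense version: broadcast a column of observations against a row of levels.
# Entry (i, j) is 1 when data[i] == lvls[j], else 0.
dense = Int.(data .== permutedims(lvls))

# Sparse version: each row holds exactly one nonzero, in the column of its level.
col_of = Dict(l => j for (j, l) in enumerate(lvls))
cols = [col_of[x] for x in data]
n = length(data)
sp = sparse(collect(1:n), cols, ones(Int, n), n, length(lvls))
```

Both versions scale to any number of levels without listing them by hand, since the columns are derived from unique(data).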