Hello Julians:
I am encountering the error in the subject line
while building a logistic regression model
with the GLM package.
I came across a reply from @tim.holy HERE
that covers some functions that could
potentially address this issue.
I am not sure what to consider when
attempting to apply the different
PositiveFactorizations.jl functions.
My DataFrame has the following structure/content:
Teams = ["Jazz", "Heat", "Hawks"]
Rank = ["1st", "2nd", "3rd"]
Outcome = ["Win", "Loss"]
# Make sure to pass the row count (i.e. 50) to EACH column
Season = DataFrame(Id = 1:50, Gate = rand(50:15:3000, 50),
                   Top3 = rand(Teams, 50),
                   Position = rand(Rank, 50),
                   Column = rand(Outcome .== "Win", 50))
I then applied the _onehot function from:
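begin
    function _onehot(df, symb)
        # Copy the input so the original DataFrame is not modified
        out = copy(df)
        # Add one Bool indicator column per distinct value of `symb`
        for c in unique(out[!, symb])
            out[!, Symbol(c)] = out[!, symb] .== c
        end
        return out
    end
end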
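To make clear what the encoding step produces, here is a minimal sketch on a toy table. The name `onehot_demo` and the three-row `toy` DataFrame are made up for illustration. The helper is a non-mutating version of `_onehot`, repeated so the snippet runs on its own, and it assumes only the DataFrames package.

```julia
using DataFrames

# Hypothetical helper name; same idea as _onehot, but it works on a copy
function onehot_demo(df, symb)
    out = copy(df)                                  # input DataFrame is left untouched
    for c in unique(out[!, symb])
        out[!, Symbol(c)] = out[!, symb] .== c      # one Bool column per distinct value
    end
    return out
end

toy = DataFrame(Top3 = ["Jazz", "Heat", "Jazz"])    # made-up toy data
encoded = onehot_demo(toy, :Top3)

# `encoded` keeps the original Top3 column and gains the Bool columns Jazz and Heat
names(encoded)
```

Note that the original `Top3` column is still present next to its indicator columns, so both end up available to the formula.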
Then I attempted to build the logistic regression
via:
fm = @formula(Column ~ Top3 + Position + Gate + Jazz + Heat + Hawks + 1st + 2nd + 3rd +
              Win + Loss)
logit = glm(fm, train, Binomial(), Probit())
This returns the error in the subject line.
YES! I understand that leaving out the encoded columns would
yield a result, but I am curious what I need to do to build a
logistic regression WITH the encoded column values.