Regression using Categorical variables

Can anyone tell me how to define categorical variables (with more than two levels) when performing regression in Julia?
Thanks.
Say my data is:

my_dataframe= DataFrame(class=[3,1,3,2,1,2,1],hist=[34,36,38,40,42,44,46], math=[32,65,98,78,45,12,92], scie=[45,98,56,23,12,45,95], espan= [98,65,45,75,62,35,76], engl =[84,62,74,93,62,46,83], rank =[1,2,3,4,5,6,7])

Every column except class is not categorical.
How do I declare that class is categorical?
Furthermore, would the regression then treat it as a qualitative factor?

using GLM
linearmodel = fit(LinearModel, @formula(rank ~ math + engl + scie + espan + hist + class), my_dataframe)

Thanks!
Ebby

Just do:

using CategoricalArrays
DataFrame(class=categorical([3,1,3,2,1,2,1]), ...)
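With current package versions (an assumption: DataFrames 1.x plus CategoricalArrays and GLM, whereas this thread used Julia 0.6), a minimal sketch of the whole workflow might look like this. It reuses the data from the question but fits only a subset of predictors: with seven rows, the full formula (intercept, five numeric columns, and two dummy columns for the three-level class) would need eight coefficients, more than there are observations.

```julia
using DataFrames, CategoricalArrays, GLM

# Mark `class` as categorical when building the DataFrame.
df = DataFrame(class = categorical([3, 1, 3, 2, 1, 2, 1]),
               hist  = [34, 36, 38, 40, 42, 44, 46],
               math  = [32, 65, 98, 78, 45, 12, 92],
               scie  = [45, 98, 56, 23, 12, 45, 95],
               espan = [98, 65, 45, 75, 62, 35, 76],
               engl  = [84, 62, 74, 93, 62, 46, 83],
               rank  = [1, 2, 3, 4, 5, 6, 7])

# GLM dummy-codes categorical columns automatically: level 1 is the
# reference level, and classes 2 and 3 each get their own coefficient.
model = lm(@formula(rank ~ math + engl + class), df)

println(coeftable(model))
```

The fitted model has five coefficients: the intercept, math, engl, and one each for class 2 and class 3.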

(Please use triple backquotes around code blocks so that they are rendered as such.)


Thanks!

Thanks, but when I run the following:

Pkg.add("CategoricalArrays")
using CategoricalArrays
using DataFrames
my_dataframe=DataFrame(class=categorical([3,1,3,2,1,2,1]),civics=[34,36,38,40,42,44,46], maths=[32,65,98,78,45,12,92], science=[45,98,56,23,12,45,95], social= [98,65,45,75,62,35,76], engl =[84,62,74,93,62,46,83], rank =[1,2,3,4,5,6,7])

it says

MethodError: no method matching upgrade_vector(::CategoricalArrays.CategoricalArray{Int64,1,UInt32})
Closest candidates are:

upgrade_vector(!Matched::BitArray{1}) at C:\Users\uqethom9.julia\v0.6\DataFrames\src\dataframe\dataframe.jl:349
upgrade_vector(!Matched::Array{T,1} where T) at C:\Users\uqethom9.julia\v0.6\DataFrames\src\dataframe\dataframe.jl:347
upgrade_vector(!Matched::Range) at C:\Users\uqethom9.julia\v0.6\DataFrames\src\dataframe\dataframe.jl:348

setindex!(::DataFrames.DataFrame, ::CategoricalArrays.CategoricalArray{Int64,1,UInt32}, ::Symbol) at dataframe.jl:364
#DataFrame#19(::Array{Any,1}, ::Type{T} where T) at dataframe.jl:100
(::Core.#kw#Type)(::Array{Any,1}, ::Type{DataFrames.DataFrame}) at :0
include_string(::String, ::String) at loading.jl:522
include_string(::String, ::String, ::Int64) at eval.jl:30
include_string(::Module, ::String, ::String, ::Int64, ::Vararg{Int64,N} where N) at eval.jl:34
(::Atom.##49#53{String,Int64,String})() at eval.jl:50
withpath(::Atom.##49#53{String,Int64,String}, ::String) at utils.jl:30
withpath(::Function, ::String) at eval.jl:38
macro expansion at eval.jl:49 [inlined]
(::Atom.##48#52{Dict{String,Any}})() at task.jl:80

Also, sorry, but I did not understand what you meant by "(Please use triple backquotes around code blocks so that they are rendered as such.)"

Ebby

```

code here
```
Type the backtick (the backwards apostrophe) three times on its own line to begin and end your block of code.


You can still edit your comment.


I can reproduce the error:

using CategoricalArrays, DataFrames
a = categorical([3,1,3,2,1,2,1])
DataFrame(a = a, b = 1:7)

gives

ERROR: MethodError: no method matching upgrade_vector(::CategoricalArrays.CategoricalArray{Int64,1,UInt32})
Closest candidates are:
  upgrade_vector(::BitArray{1}) at /Users/michael/.julia/v0.6/DataFrames/src/dataframe/dataframe.jl:349
  upgrade_vector(::Array{T,1} where T) at /Users/michael/.julia/v0.6/DataFrames/src/dataframe/dataframe.jl:347
  upgrade_vector(::Range) at /Users/michael/.julia/v0.6/DataFrames/src/dataframe/dataframe.jl:348
  ...
Stacktrace:
 [1] setindex!(::DataFrames.DataFrame, ::CategoricalArrays.CategoricalArray{Int64,1,UInt32}, ::Symbol) at /Users/michael/.julia/v0.6/DataFrames/src/dataframe/dataframe.jl:364
 [2] #DataFrame#19(::Array{Any,1}, ::Type{T} where T) at /Users/michael/.julia/v0.6/DataFrames/src/dataframe/dataframe.jl:100
 [3] (::Core.#kw#Type)(::Array{Any,1}, ::Type{DataFrames.DataFrame}) at ./<missing>:100

Version compatibility issue?

Pkg.status.(["DataFrames", "CategoricalArrays"])
 - DataFrames                    0.10.1
 - CategoricalArrays             0.1.6

Thanks for trying it. But:

Pkg.status.(["DataFrames","CategoricalArrays","GLM"])

 - DataFrames                    0.10.1
 - CategoricalArrays             0.1.6
 - GLM                           0.8.1

And GLM and DataFrames work perfectly together!

I don't understand; I just said that I can reproduce the error too.

Yes, you need the new DataFrames (v0.11).

Sorry to reopen this, but I am not able to run a simple logistic regression with GLM:

a = categorical([3,1,3,2,1,2,1])
df = DataFrame(a = a, b = 1:7)
glm(@formula(a ~ b), df, Binomial(), LogitLink())

gives:

MethodError: no method matching zero(::Type{CategoricalValue{Int64,UInt32}})
Closest candidates are:
  zero(!Matched::Type{LibGit2.GitHash}) at /build/julia/src/julia-1.1.0/usr/share/julia/stdlib/v1.1/LibGit2/src/oid.jl:220
  zero(!Matched::Type{Pkg.Resolve.VersionWeights.VersionWeight}) at /build/julia/src/julia-1.1.0/usr/share/julia/stdlib/v1.1/Pkg/src/resolve/VersionWeights.jl:19
  zero(!Matched::Type{Pkg.Resolve.MaxSum.FieldValues.FieldValue}) at /build/julia/src/julia-1.1.0/usr/share/julia/stdlib/v1.1/Pkg/src/resolve/FieldValues.jl:44
  ...

Stacktrace:
 [1] float(::Array{CategoricalValue{Int64,UInt32},1}) at ./float.jl:886
 [2] #fit#9(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::Type{GeneralizedLinearModel}, ::Array{Float64,2}, ::Array{CategoricalValue{Int64,UInt32},1}, ::Binomial{Float64}, ::LogitLink) at /home/alexander/.julia/packages/GLM/YjBzj/src/glmfit.jl:323
 [3] fit(::Type{GeneralizedLinearModel}, ::Array{Float64,2}, ::Array{CategoricalValue{Int64,UInt32},1}, ::Binomial{Float64}, ::LogitLink) at /home/alexander/.julia/packages/GLM/YjBzj/src/glmfit.jl:323
 [4] #fit#36(::Dict{Any,Any}, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::Type{GeneralizedLinearModel}, ::Formula, ::DataFrame, ::Binomial{Float64}, ::Vararg{Any,N} where N) at /home/alexander/.julia/packages/StatsModels/pBxdt/src/statsmodel.jl:72
 [5] fit(::Type{GeneralizedLinearModel}, ::Formula, ::DataFrame, ::Binomial{Float64}, ::LogitLink) at /home/alexander/.julia/packages/StatsModels/pBxdt/src/statsmodel.jl:66
 [6] #glm#10(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::Formula, ::DataFrame, ::Binomial{Float64}, ::Vararg{Any,N} where N) at /home/alexander/.julia/packages/GLM/YjBzj/src/glmfit.jl:326
 [7] glm(::Formula, ::DataFrame, ::Binomial{Float64}, ::Vararg{Any,N} where N) at /home/alexander/.julia/packages/GLM/YjBzj/src/glmfit.jl:326
 [8] top-level scope at In[209]:3

It looks like GLM does not support categorical variables on the left-hand side. Also, I don't think GLM does multinomial logistic regression.

Maybe Flux can do what you want.


You have more than two categories in a, but you're specifying a binomial distribution. That cannot work. Unless a is meant to be counts? If it's categories you want, you need to specify a multinomial (categorical) distribution.

Hi,
thanks for the reply. I tried it with a binary classification problem and it gives the same error. It seems that GLM can't use categorical values in the target. This was a bad example on my part. From a mathematical point of view you're absolutely right: binary logistic regression can't be applied to multiclass problems.
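For a two-class target, one workaround, sketched here assuming current GLM and DataFrames versions, is to encode the categorical response as a numeric 0/1 column before fitting, since `Binomial()` expects a numeric response. The data, the column names, and the choice of "yes" as the positive class are hypothetical.

```julia
using DataFrames, CategoricalArrays, GLM

# Hypothetical binary data: `y` is categorical with two levels.
df = DataFrame(y = categorical(["no", "yes", "no", "yes", "yes", "no", "yes", "no"]),
               b = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])

# Encode the response numerically: "yes" becomes 1.0, "no" becomes 0.0.
df.y01 = Float64.(df.y .== "yes")

# Fit the logistic regression on the numeric response instead of the categorical one.
model = glm(@formula(y01 ~ b), df, Binomial(), LogitLink())

println(coeftable(model))
```

Which level maps to 1 only flips the signs of the coefficients; for more than two classes this encoding does not apply.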

What if you want to convert certain columns of an existing DataFrame, e.g. "consume" in df, which was imported from an .xlsx file?

Can you, as usual, attempt to create an MWE to show what you’re trying to do, what you expect to happen, and what is going wrong?


I assume you want to change from numeric to categorical:

categorical!(df, :consume)
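`categorical!` was deprecated in later DataFrames releases, so on a recent version it may not be available. A minimal sketch of an alternative, assuming DataFrames 1.x and CategoricalArrays, with a small made-up table standing in for the imported .xlsx data:

```julia
using DataFrames, CategoricalArrays

# Stand-in for a table imported from .xlsx; columns and values are hypothetical.
df = DataFrame(consume = [1, 2, 2, 3, 1], income = [10.0, 12.5, 11.0, 9.5, 10.5])

# Replace the numeric `consume` column with a categorical copy, in place.
transform!(df, :consume => categorical => :consume)

# Equivalent shorthand: df.consume = categorical(df.consume)
println(levels(df.consume))
```

Only the named column changes; the other columns keep their original types.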