Regression using Categorical variables


#1

Can anyone tell me how to declare a categorical variable (with more than two levels) when fitting a regression in Julia?
Thanks.
Say my data is:

my_dataframe= DataFrame(class=[3,1,3,2,1,2,1],civics=[34,36,38,40,42,44,46], maths=[32,65,98,78,45,12,92], science=[45,98,56,23,12,45,95], social= [98,65,45,75,62,35,76], engl =[84,62,74,93,62,46,83], rank =[1,2,3,4,5,6,7])

Everything except class is NOT categorical.
How would I declare that class is categorical?
Furthermore, would the regression below then treat class as a qualitative factor when fitting?

using GLM
linearmodel= fit(LinearModel, @formula(rank ~ maths+engl+science+social+civics+class), my_dataframe)

Thanks…!!
Ebby


#2

Just do:

using CategoricalArrays
DataFrame(class=categorical([3,1,3,2,1,2,1]), ...)

(Please use triple backquotes around code blocks so that they are rendered as such.)
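
For reference, here is a minimal sketch of what `categorical` does on its own, assuming only that the CategoricalArrays package is installed. The variable name `class` is taken from the question; nothing here depends on DataFrames.

```julia
using CategoricalArrays

# Wrap the integer codes so they are treated as labels, not as numbers.
class = categorical([3, 1, 3, 2, 1, 2, 1])

# The three distinct levels, in sorted order: 1, 2, 3.
levels(class)

# The result is a CategoricalArray, which modelling packages can
# recognise as a qualitative factor.
class isa CategoricalArray
```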


#3

Thanks…!


#4

Thanks,
But when I perform the following,

Pkg.add("CategoricalArrays")
using CategoricalArrays
using DataFrames
my_dataframe=DataFrame(class=categorical([3,1,3,2,1,2,1]),civics=[34,36,38,40,42,44,46], maths=[32,65,98,78,45,12,92], science=[45,98,56,23,12,45,95], social= [98,65,45,75,62,35,76], engl =[84,62,74,93,62,46,83], rank =[1,2,3,4,5,6,7])

it says

MethodError: no method matching upgrade_vector(::CategoricalArrays.CategoricalArray{Int64,1,UInt32})
Closest candidates are:

upgrade_vector(!Matched::BitArray{1}) at C:\Users\uqethom9.julia\v0.6\DataFrames\src\dataframe\dataframe.jl:349
upgrade_vector(!Matched::Array{T,1} where T) at C:\Users\uqethom9.julia\v0.6\DataFrames\src\dataframe\dataframe.jl:347
upgrade_vector(!Matched::Range) at C:\Users\uqethom9.julia\v0.6\DataFrames\src\dataframe\dataframe.jl:348

setindex!(::DataFrames.DataFrame, ::CategoricalArrays.CategoricalArray{Int64,1,UInt32}, ::Symbol) at dataframe.jl:364
#DataFrame#19(::Array{Any,1}, ::Type{T} where T) at dataframe.jl:100
(::Core.#kw#Type)(::Array{Any,1}, ::Type{DataFrames.DataFrame}) at :0
include_string(::String, ::String) at loading.jl:522
include_string(::String, ::String, ::Int64) at eval.jl:30
include_string(::Module, ::String, ::String, ::Int64, ::Vararg{Int64,N} where N) at eval.jl:34
(::Atom.##49#53{String,Int64,String})() at eval.jl:50
withpath(::Atom.##49#53{String,Int64,String}, ::String) at utils.jl:30
withpath(::Function, ::String) at eval.jl:38
macro expansion at eval.jl:49 [inlined]
(::Atom.##48#52{Dict{String,Any}})() at task.jl:80

Also, I am sorry, but I did not understand what you meant by “(Please use triple backquotes around code blocks so that they are rendered as such.)”

Ebby


#5
```

code here
```
Put three backticks (the backwards apostrophe, `) on a line of their own to begin your block of code, and three more to end it.


#6

You can still edit your comment.


#7

I can reproduce the error:

using CategoricalArrays, DataFrames
a = categorical([3,1,3,2,1,2,1])
DataFrame(a = a, b = 1:7)

gives

ERROR: MethodError: no method matching upgrade_vector(::CategoricalArrays.CategoricalArray{Int64,1,UInt32})
Closest candidates are:
  upgrade_vector(::BitArray{1}) at /Users/michael/.julia/v0.6/DataFrames/src/dataframe/dataframe.jl:349
  upgrade_vector(::Array{T,1} where T) at /Users/michael/.julia/v0.6/DataFrames/src/dataframe/dataframe.jl:347
  upgrade_vector(::Range) at /Users/michael/.julia/v0.6/DataFrames/src/dataframe/dataframe.jl:348
  ...
Stacktrace:
 [1] setindex!(::DataFrames.DataFrame, ::CategoricalArrays.CategoricalArray{Int64,1,UInt32}, ::Symbol) at /Users/michael/.julia/v0.6/DataFrames/src/dataframe/dataframe.jl:364
 [2] #DataFrame#19(::Array{Any,1}, ::Type{T} where T) at /Users/michael/.julia/v0.6/DataFrames/src/dataframe/dataframe.jl:100
 [3] (::Core.#kw#Type)(::Array{Any,1}, ::Type{DataFrames.DataFrame}) at ./<missing>:100

Version compatibility issue?

Pkg.status.(["DataFrames", "CategoricalArrays"])
 - DataFrames                    0.10.1
 - CategoricalArrays             0.1.6

#8

Thanks for trying it. But I have the same versions:

Pkg.status.(["DataFrames","CategoricalArrays","GLM"])

  • DataFrames 0.10.1
  • CategoricalArrays 0.1.6
  • GLM 0.8.1

And GLM and DataFrames work perfectly together!


#10

I don’t understand; I just said that I can indeed reproduce the error.


#11

Yes, you need the new DataFrames (v0.11).
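
Once the packages are updated (for example with `Pkg.update()`), something along these lines should work. This is only a sketch, assuming DataFrames 0.11 or later together with compatible CategoricalArrays and GLM versions. It deliberately uses a reduced formula (an assumption of this example, not the original one): with 7 rows, the full formula would need 8 coefficients (intercept, five numeric columns, and two dummies for the three-level `class`), which the data cannot identify.

```julia
using DataFrames, CategoricalArrays, GLM

# Same data as in the question, restricted to three predictors.
df = DataFrame(
    class = categorical([3, 1, 3, 2, 1, 2, 1]),  # qualitative factor
    maths = [32, 65, 98, 78, 45, 12, 92],
    engl  = [84, 62, 74, 93, 62, 46, 83],
    rank  = [1, 2, 3, 4, 5, 6, 7],
)

# A categorical column in the formula is expanded into dummy variables,
# one per level except the reference level.
model = lm(@formula(rank ~ maths + engl + class), df)

# Intercept + maths + engl + 2 dummies for class = 5 coefficients.
coeftable(model)
```

So the answer to the original question is yes: once `class` is a categorical column, the fit treats it as a factor rather than as a number.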