All the ways to do one-hot encoding

julia> using StatsBase

julia> x = ["a", "b", "c", "a", "b", "d"]
6-element Vector{String}:
 "a"
 "b"
 "c"
 "a"
 "b"
 "d"

julia> indicatormat(x)
4×6 Matrix{Bool}:
 1  0  0  1  0  0
 0  1  0  0  1  0
 0  0  1  0  0  0
 0  0  0  0  0  1

but normally I do your option 1 :smiley: :

julia> using DataFrames

julia> df = DataFrame(x=x)
6×1 DataFrame
 Row │ x
     │ String
─────┼────────
   1 │ a
   2 │ b
   3 │ c
   4 │ a
   5 │ b
   6 │ d

julia> select(df, [:x => ByRow(isequal(v)) => Symbol(v) for v in unique(df.x)])
6×4 DataFrame
 Row │ a      b      c      d
     │ Bool   Bool   Bool   Bool
─────┼────────────────────────────
   1 │  true  false  false  false
   2 │ false   true  false  false
   3 │ false  false   true  false
   4 │  true  false  false  false
   5 │ false   true  false  false
   6 │ false  false  false   true
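
For completeness, here is a rough sketch of the same idea with no packages at all, just broadcasting in Base. The names `levels` and `onehot` are my own, not from any library. It compares every element of `x` against every distinct value, so rows are observations and columns are levels (the same layout as the `select` result, and the transpose of what `indicatormat` returned above).

```julia
x = ["a", "b", "c", "a", "b", "d"]

# Distinct values in order of first appearance (hypothetical helper name).
levels = unique(x)

# Broadcasting a length-6 vector against a 1×4 row gives a 6×4 BitMatrix:
# onehot[i, j] is true exactly when x[i] == levels[j].
onehot = x .== permutedims(levels)
```

This allocates a dense matrix, so it is probably only sensible when the number of distinct values is small.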

(DataConvenience.jl is nice :+1:)
