Here’s the one-hot encoder function I wrote. It now fails with an error saying that the return value “onehot” is not defined.
function onehotenc(df::DataFrame)
    # Loop to develop onehot-encoding
    for col in eachcol(df)
        # Can we determine if the column is a String and only encode it if it is?
        if col isa String
            # How long is the current column we're going to onehot-encode?
            len = length(col)
            # Save the unique values (or Set); there will be one output column per value
            vals = unique(col)
            # Set up the Dict key-value pairs to save the Boolean operation results
            # For each val in vals, make that val a key (:key)
            # and fill that column with Boolean falses to begin with
            dict = Dict(Symbol(val) => falses(len) for val in vals)
            # Once the Dict is populated, convert it to a DataFrame
            onehot = DataFrame(dict)
            # Change the value from false to true in the all-false DataFrame wherever the unique value matches the original DF
            for (i, v) in enumerate(col)
                # At the row and col of the current enumerated col value, set the value to true
                # EX: In the onehot DF at row 1, column :A set equal to true
                # EX: In the onehot DF at row 2, column :B set equal to true
                onehot[i, Symbol(v)] = true
            end
        end
    end
    return onehot
end
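
Two things in the function above probably explain the error. First, `col isa String` tests the whole column, which is a vector and never a `String`, so the `if` branch never runs. Second, `onehot` is assigned inside the `for` loop, and a `for` loop body has its own local scope in Julia, so the name is not visible at `return`. The following is a minimal sketch of one way to restructure it, not a definitive fix. It assumes plain string columns (no `missing` values). The function name `onehotenc_sketch`, the `result` variable, and the `<column>_<value>` naming of the new columns are choices made here for illustration.

```julia
using DataFrames

# Hypothetical name: onehotenc_sketch is introduced here for illustration only.
function onehotenc_sketch(df::DataFrame)
    # Created before the loop so it is still in scope at `return`
    result = DataFrame()
    for (name, col) in pairs(eachcol(df))
        # Test the element type; `col isa String` is false for a column vector
        if eltype(col) <: AbstractString
            for val in unique(col)
                # One Boolean column per unique value, true where the row matches
                result[!, Symbol(name, "_", val)] = col .== val
            end
        end
    end
    return result
end

# Small example input (made up for this sketch)
df = DataFrame(letter = ["A", "B", "A"], n = [1, 2, 3])
encoded = onehotenc_sketch(df)
```

Prefixing each new column with the source column name keeps values from different columns from colliding. A column whose element type is `Union{Missing, String}` would be skipped by this check and would need separate handling.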