Replacing one column with three (DataFrames.jl)

Hi everyone! This is my first post here. I have to say, I'm loving the Julia language; it is quickly becoming my favourite. Also, fantastic work with Juno. It is awesome.

I am trying to implement a “one-hot” transform on some data and would like to know if there is a way in DataFrames.jl to replace one column with three, specifically using select!() or transform!().

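The data (originally posted as a screenshot) has a Country column, numeric Age and Salary columns with some missing values, and the Purchased target as the last column.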

Some code:

# Load packages (mean comes from the Statistics standard library).
using CSV, DataFrames, Statistics, Flux

# Create dataframe (joinpath builds a portable path instead of hard-coded "\\").
filePath = joinpath(pwd(), "UdemyCourseML", "(1)DataPreprocessing", "data.csv")
df = CSV.read(filePath, DataFrame)

# Define variables: all columns except the last are features, Purchased is the target.
featX = df[:, 1:end-1]
featY = df.Purchased

# Replace missing values with averages.
featX.Salary = coalesce.(featX.Salary, mean(skipmissing(featX.Salary)))
featX.Age = coalesce.(featX.Age, mean(skipmissing(featX.Age)))

# Encode Country data with one-hot encoding (rows = observations, columns = countries).
onehotCountries = transpose(Flux.onehotbatch(featX.Country, unique(featX.Country)))

#select!()         ?
#transform!()      ?
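For reference, a matrix with the same layout as the transposed Flux result (one row per observation, one column per country) can be built in plain Julia by broadcasting `==` between the column vector and a row of its unique levels. This is a minimal sketch without Flux; the country names are hypothetical sample values and `onehot_mat` is a made-up name for illustration.

```julia
# Hypothetical sample values standing in for featX.Country.
countries = ["France", "Spain", "Germany", "Spain", "France"]

# Unique levels in order of first appearance.
levels = unique(countries)

# Broadcasting a length-5 column vector against a 1×3 row gives a 5×3 BitMatrix:
# entry (i, j) is true exactly when countries[i] == levels[j].
onehot_mat = countries .== permutedims(levels)

# Print each observation next to its one-hot row.
for (i, c) in enumerate(countries)
    println(rpad(c, 8), " => ", Int.(onehot_mat[i, :]))
end
```

With DataFrames.jl loaded, such a matrix can be given column names with `DataFrame(onehot_mat, Symbol.(levels))` and then joined to the remaining features with `hcat`.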

Thanks!

You want transform! with AsTable as the output destination. Here is an MWE that doesn’t use Flux.jl, but it should work basically the same way:

julia> function onehot(x)
       u = unique(x)
       df = DataFrame()
       for ui in u
           df[!, ui] = x .== ui
       end
       df
       end;

julia> df = DataFrame(s = rand(["up", "down", "left"], 10))
10×1 DataFrame
 Row │ s
     │ String
─────┼────────
   1 │ left
   2 │ left
   3 │ down
   4 │ left
   5 │ down
   6 │ up
   7 │ down
   8 │ up
   9 │ up
  10 │ left

julia> transform!(df, :s => onehot => AsTable)
10×4 DataFrame
 Row │ s       left   down   up
     │ String  Bool   Bool   Bool
─────┼─────────────────────────────
   1 │ left     true  false  false
   2 │ left     true  false  false
   3 │ down    false   true  false
   4 │ left     true  false  false
   5 │ down    false   true  false
   6 │ up      false  false   true
   7 │ down    false   true  false
   8 │ up      false  false   true
   9 │ up      false  false   true
  10 │ left     true  false  false
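Since the question asks to *replace* the column, the same kind of pair can also be passed to `select!` together with `Not(:s)`, which drops the original column and appends the indicator columns after the kept ones in a single call. A minimal sketch, assuming DataFrames.jl 1.x; `onehot_nt` is a hypothetical helper that returns a NamedTuple of vectors instead of a DataFrame (`AsTable` accepts either kind of table), and the data are made up.

```julia
using DataFrames

# Hypothetical helper: one Bool column per unique value of `x`,
# returned as a NamedTuple of vectors keyed by the values themselves.
function onehot_nt(x::AbstractVector)
    lv = unique(x)
    return NamedTuple{Tuple(Symbol.(lv))}(Tuple(x .== l for l in lv))
end

df = DataFrame(id = 1:6, s = ["up", "down", "left", "up", "left", "down"])

# Keep every column except :s, then add the one-hot columns built from :s.
select!(df, Not(:s), :s => onehot_nt => AsTable)
println(df)
```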

I have listed some ways of doing exactly this in “All the ways to do one-hot encoding”.
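One more DataFrames.jl-only variant (not necessarily one of those in that post) skips the helper function: build one `source => ByRow(isequal(v)) => Symbol(v)` pair per level and pass the whole vector to `select!`. A minimal sketch, assuming DataFrames.jl 1.x and made-up data; `ops` is an illustrative name.

```julia
using DataFrames

df = DataFrame(id = 1:5, country = ["France", "Spain", "Germany", "Spain", "France"])

# One transformation per level: compare each row's country with that level.
ops = [:country => ByRow(isequal(v)) => Symbol(v) for v in unique(df.country)]

# Drop :country and append one Bool column per level, in a single call.
select!(df, Not(:country), ops)
println(df)
```

This works row by row, so it is mainly a readability choice; the `AsTable` version computes each column with a single vectorized comparison.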

FYI, Juno is more or less no longer getting new features. IDE development work is now concentrated on VS Code (the Julia extension).


I ended up using DataConvenience.jl, thank you!

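using DataConvenience

# Encode Country data with one-hot encoding, naming each new column after its country.
onehot!(featX, :Country, outnames=Symbol.(unique(featX.Country)))
# Drop the original Country column, keeping the other features and the new indicator columns.
select!(featX, Not(:Country))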
