Hi everyone! My first post here. I have to say, loving the Julia language. It is quickly becoming my favourite. Also, fantastic work with Juno. It is awesome.
I am trying to implement a one-hot transform on some data and would like to know whether there is a way (in DataFrames.jl) to replace one column with three, specifically by using select!() or transform!().
The data:
Some code:
# Load packages and create the dataframe.
using CSV, DataFrames, Statistics, Flux
filePath = joinpath(pwd(), "UdemyCourseML", "(1)DataPreprocessing", "data.csv")
df = DataFrame(CSV.File(filePath))
# Define variables.
featX = df[:, 1:end-1]
featY = df.Purchased;
# Replace missing values with averages.
featX.Salary = coalesce.(featX.Salary, mean(skipmissing(featX.Salary)))
featX.Age = coalesce.(featX.Age, mean(skipmissing(featX.Age)))
# Encode Country data with one hot encoding.
onehotCountries = transpose(Flux.onehotbatch(featX.Country, unique(featX.Country)))
#select!() ?
#transform!() ?
You want transform with AsTable. Here is an MWE that doesn’t use Flux.jl, but the approach should be basically the same:
julia> using DataFrames

julia> function onehot(x)
           u = unique(x)
           df = DataFrame()
           for ui in u
               df[!, ui] = x .== ui
           end
           df
       end;
julia> df = DataFrame(s = rand(["up", "down", "left"], 10))
10×1 DataFrame
 Row │ s
     │ String
─────┼────────
   1 │ left
   2 │ left
   3 │ down
   4 │ left
   5 │ down
   6 │ up
   7 │ down
   8 │ up
   9 │ up
  10 │ left
julia> transform!(df, :s => onehot => AsTable)
10×4 DataFrame
 Row │ s       left   down   up
     │ String  Bool   Bool   Bool
─────┼─────────────────────────────
   1 │ left     true  false  false
   2 │ left     true  false  false
   3 │ down    false   true  false
   4 │ left     true  false  false
   5 │ down    false   true  false
   6 │ up      false  false   true
   7 │ down    false   true  false
   8 │ up      false  false   true
   9 │ up      false  false   true
  10 │ left     true  false  false
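Note that transform! keeps the original column next to the new ones. If the goal is to replace the column, as the question asks, select! should do it, since it keeps only the columns you list. Below is a minimal sketch using the same onehot helper; the featX values are made up for illustration (only the Country, Age and Salary column names come from the question), so treat it as a starting point rather than a drop-in solution.

```julia
using DataFrames

# Same helper as in the MWE above: one Bool column per unique value.
function onehot(x)
    out = DataFrame()
    for u in unique(x)
        out[!, u] = x .== u
    end
    return out
end

# Hypothetical stand-in for the feature table in the question.
featX = DataFrame(Country = ["France", "Spain", "Germany", "Spain"],
                  Age     = [44.0, 27.0, 30.0, 38.0],
                  Salary  = [72000.0, 48000.0, 54000.0, 61000.0])

# select! keeps only what is listed: the one-hot columns first,
# then every column except :Country, which is therefore dropped.
select!(featX, :Country => onehot => AsTable, Not(:Country))
```

After this call featX should contain one Bool column per country followed by Age and Salary. Swapping the order of the two arguments would put the one-hot columns at the end instead.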