I’m working on some code for the Kaggle Titanic dataset. For context, this is how I load the data and what it looks like:
using CSV, DataFrames  # CSV.read comes from CSV, describe from DataFrames

datadirectory = "../data/titanic/"
train_df = CSV.read(datadirectory * "train.csv")
describe(train_df)
Row │ variable    │ mean     │ min                 │ median  │ max                         │ nunique │ nmissing │ eltype
    │ Symbol      │ Union…   │ Any                 │ Union…  │ Any                         │ Union…  │ Union…   │ Type
1   │ PassengerId │ 446.0    │ 1                   │ 446.0   │ 891                         │         │          │ Int64
2   │ Survived    │ 0.383838 │ 0                   │ 0.0     │ 1                           │         │          │ Int64
3   │ Pclass      │ 2.30864  │ 1                   │ 3.0     │ 3                           │         │          │ Int64
4   │ Name        │          │ Abbing, Mr. Anthony │         │ van Melkebeke, Mr. Philemon │ 891     │          │ String
5   │ Sex         │          │ female              │         │ male                        │ 2       │          │ String
6   │ Age         │ 29.6991  │ 0.42                │ 28.0    │ 80.0                        │         │ 177      │ Union{Missing, Float64}
7   │ SibSp       │ 0.523008 │ 0                   │ 0.0     │ 8                           │         │          │ Int64
8   │ Parch       │ 0.381594 │ 0                   │ 0.0     │ 6                           │         │          │ Int64
9   │ Ticket      │          │ 110152              │         │ WE/P 5735                   │ 681     │          │ String
10  │ Fare        │ 32.2042  │ 0.0                 │ 14.4542 │ 512.329                     │         │          │ Float64
11  │ Cabin       │          │ A10                 │         │ T                           │ 147     │ 687      │ Union{Missing, String}
12  │ Embarked    │          │ C                   │         │ S                           │ 3       │ 2        │ Union{Missing, String}
Now I’m trying to apply some transformations to this data frame, but I’m not sure how this should be written, and the results look slightly odd to me. I doubt this is good Julia:
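using DataFramesMeta, Pipe  # @select/@transform and @pipe

# Keep five columns, make Survived a Bool, make Pclass and Sex categorical.
train_df = @pipe train_df |>
    @select(_, :PassengerId, :Survived, :Pclass, :Sex, :Age) |>
    @transform(_, Survived = convert.(Bool, :Survived)) |>
    categorical(_, :Pclass) |>
    categorical(_, :Sex)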
describe(train_df)
Row │ variable    │ mean     │ min    │ median │ max  │ nunique │ nmissing │ eltype
    │ Symbol      │ Union…   │ Any    │ Union… │ Any  │ Union…  │ Union…   │ Type
1   │ PassengerId │ 446.0    │ 1      │ 446.0  │ 891  │         │          │ Int64
2   │ Survived    │ 0.383838 │ 0      │ 0.0    │ 1    │         │          │ Bool
3   │ Pclass      │          │ 1      │        │ 3    │ 3       │          │ CategoricalValue{Int64,UInt32}
4   │ Sex         │          │ female │        │ male │ 2       │          │ CategoricalValue{String,UInt32}
5   │ Age         │ 29.6991  │ 0.42   │ 28.0   │ 80.0 │         │ 177      │ Union{Missing, Float64}
Why are the element types now CategoricalValue{Int64,UInt32} and CategoricalValue{String,UInt32}? I expected the Sex column to hold just female or male, and the Pclass column just 1, 2, or 3.
What would be good Julia style to write that piece of code?
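To make the question concrete, here is a minimal sketch that reproduces the same element type on a hypothetical four-row table (`toy` is made up for illustration, not the Titanic data). It assumes a recent DataFrames.jl together with CategoricalArrays.jl, where the conversion is written with `transform!` rather than the `categorical(df, :col)` call used above, so it may not match the package versions in this post. If those assumptions hold, it suggests the columns still contain only the expected categories, and that `CategoricalValue` is just the wrapper type of each element.

```julia
using DataFrames, CategoricalArrays

# Hypothetical toy data standing in for two of the Titanic columns.
toy = DataFrame(Pclass = [3, 1, 2, 3],
                Sex = ["male", "female", "female", "male"])

# Replace both columns with categorical vectors, in place.
transform!(toy, :Pclass => categorical => :Pclass,
                :Sex => categorical => :Sex)

# The element type is a CategoricalValue wrapper, as in the describe output.
pclass_eltype = eltype(toy.Pclass)

# levels lists the distinct categories each column can hold.
pclass_levels = levels(toy.Pclass)
sex_levels = levels(toy.Sex)

# unwrap recovers the plain value behind a single categorical element.
first_class = unwrap(toy.Pclass[1])
```

Comparing `pclass_levels` against `[1, 2, 3]` and `sex_levels` against `["female", "male"]` is one way to check that no unexpected categories crept in.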