Discretize/Binning of Continuous Variable in Dataframe

Nelson_Chow · August 9, 2021, 2:35am

Hello There,

I am just wondering what is the best way to add a new Categorical/String column to an existing DataFrame based on a continuous value column?

using DataFrames
df = DataFrame(Types = ["SUV", "SUV", "SUV", "SUV", "sedan", "sedan"], models = ["Q3", "Q5", "Kluger", "Land Cruiser", "Corolla", "F40"], acceleration = [11, 8, 8, 19, 5.5, 3.3])

6×3 DataFrame
 Row │ Types   models        acceleration 
     │ String  String        Float64      
─────┼────────────────────────────────────
   1 │ SUV     Q3                    11.0
   2 │ SUV     Q5                     8.0
   3 │ SUV     Kluger                 8.0
   4 │ SUV     Land Cruiser          19.0
   5 │ sedan   Corolla                5.5
   6 │ sedan   F40                    3.3

Want:
Acceleration less than 6: “Fast”
Acceleration greater than 10: “Slow”
Otherwise: “Normal”

Sorry, I know this is very basic. In Python I would have used .apply with a custom function. I did a 20-minute search on Google but couldn’t find anything.

Thanks!

jzr · August 9, 2021, 2:41am

CategoricalArrays.cut (see “Using CategoricalArrays” in the CategoricalArrays documentation)
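
For reference, here is a minimal sketch of how `cut` could be applied to the example from the question. It assumes a recent CategoricalArrays.jl, where bins are left-closed (`[lower, upper)`). Because of that, a plain break at 10 would put a value of exactly 10 into the upper bin, so the sketch uses `nextfloat(10.0)` as the break to keep "greater than 10" as the rule for "Slow". The infinite outer breaks are my own choice so that any future value falls into some bin.

```julia
using DataFrames, CategoricalArrays

df = DataFrame(acceleration = [11, 8, 8, 19, 5.5, 3.3])

# Bins are left-closed: [-Inf, 6) => "Fast", [6, nextfloat(10.0)) => "Normal",
# and everything above that => "Slow". nextfloat(10.0) keeps exactly 10.0 in "Normal".
breaks = [-Inf, 6, nextfloat(10.0), Inf]

# One label per bin, i.e. length(breaks) - 1 labels.
df.speed = cut(df.acceleration, breaks; labels = ["Fast", "Normal", "Slow"])
```

The result is an ordered categorical column, so the levels keep the order Fast < Normal < Slow, which can be handy for sorting or grouping later.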

viraltux · August 9, 2021, 6:05am

Hi @Nelson_Chow

I would typically use map

julia> c = map(x -> x<6 ? "Fast" : (x>10 ? "Slow" : "Normal"), 
               df[:,"acceleration"])
julia> insertcols!(df,ncol(df)+1,:speed=>c)
6×4 DataFrame
 Row │ Types   models        acceleration  speed  
     │ String  String        Float64       String 
─────┼────────────────────────────────────────────
   1 │ SUV     Q3                    11.0  Slow
   2 │ SUV     Q5                     8.0  Normal
   3 │ SUV     Kluger                 8.0  Normal
   4 │ SUV     Land Cruiser          19.0  Slow
   5 │ sedan   Corolla                5.5  Fast
   6 │ sedan   F40                    3.3  Fast

Also, pre-formatted text is very helpful when sharing code
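
If you are coming from pandas `.apply`, a closer analogue might be a named function combined with `ByRow`. This is only a sketch: `speed_label` is a hypothetical helper name, and the thresholds are the ones from the original question.

```julia
using DataFrames

# Hypothetical helper: maps one acceleration value to a label.
speed_label(x) = x < 6 ? "Fast" : x > 10 ? "Slow" : "Normal"

df = DataFrame(models = ["Q3", "Q5", "Kluger", "Land Cruiser", "Corolla", "F40"],
               acceleration = [11, 8, 8, 19, 5.5, 3.3])

# ByRow wraps the scalar function so it is applied to each element of :acceleration;
# the result is stored in a new :speed column.
transform!(df, :acceleration => ByRow(speed_label) => :speed)
```

Broadcasting gives the same column without `transform!`: `df.speed = speed_label.(df.acceleration)`.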
