The `cut` function of CategoricalArrays returns a `CategoricalArray` whose elements are `CategoricalValue` objects like `[0,1)`. However, I don’t understand how to get at the numeric bounds of a bin. For example, if I wanted the midpoint of a bin, I would take the mean of its two bounds, but since the labels are essentially strings, I can’t do that. Is there some way to do this that I don’t know about?
Not by default. You would have to encode that information yourself with a custom label formatter (the `labels` keyword argument of `cut`). Maybe for your use case you want to use a histogram instead?
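A minimal sketch of the custom-formatter idea, assuming your installed CategoricalArrays lets `cut` take a `labels` function that returns numbers rather than strings. `midpoint_label`, the toy data and the breaks are hypothetical, chosen only for illustration. If your version only accepts string labels, return `string((from + to) / 2)` and `parse` it back instead.

```julia
using CategoricalArrays

# Hypothetical label formatter: `cut` calls it once per interval with the
# lower bound, the upper bound and the interval index; the `leftclosed` /
# `rightclosed` keywords it also passes are swallowed by `kwargs...`.
# Returning a number (instead of a string) makes each level the bin midpoint.
midpoint_label(from, to, i; kwargs...) = (from + to) / 2

x = [0.1, 0.2, 0.4, 0.5, 0.7, 0.9]            # toy data
bins = cut(x, [0.0, 0.5, 1.0]; labels = midpoint_label)

# `unwrap` turns each CategoricalValue back into a plain Float64,
# so the midpoints can be used in arithmetic directly.
mids = unwrap.(bins)
```

With real data the same formatter should also work with quantile bins, e.g. `cut(df.ref, 3; labels = midpoint_label)`, after which `unwrap.(df.bin)` is a numeric midpoint column that can still be used for grouping.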
I don’t see how a histogram would work, because I need to create the bins and then compute the mean and standard deviation of another variable within each bin.
If you are not sure how to use a histogram for this, the simplest approach is to group by the bin column and compute the statistics per group. Here is an example:
```julia
julia> using DataFrames, CategoricalArrays, Statistics

julia> df = DataFrame(rand(10, 2), [:ref, :other])
10×2 DataFrame
 Row │ ref       other
     │ Float64   Float64
─────┼─────────────────────
   1 │ 0.153328  0.464341
   2 │ 0.794552  0.687084
   3 │ 0.860548  0.66624
   4 │ 0.252829  0.199308
   5 │ 0.709457  0.981467
   6 │ 0.2814    0.355272
   7 │ 0.819591  0.413013
   8 │ 0.575109  0.169053
   9 │ 0.551803  0.0971433
  10 │ 0.540336  0.64679

julia> df.bin = cut(df.ref, 3);

julia> df
10×3 DataFrame
 Row │ ref       other      bin
     │ Float64   Float64    Cat…
─────┼────────────────────────────────────────────────────────
   1 │ 0.153328  0.464341   Q1: [0.15332816708897712, 0.5403…
   2 │ 0.794552  0.687084   Q3: [0.7094574019643611, 0.86054…
   3 │ 0.860548  0.66624    Q3: [0.7094574019643611, 0.86054…
   4 │ 0.252829  0.199308   Q1: [0.15332816708897712, 0.5403…
   5 │ 0.709457  0.981467   Q3: [0.7094574019643611, 0.86054…
   6 │ 0.2814    0.355272   Q1: [0.15332816708897712, 0.5403…
   7 │ 0.819591  0.413013   Q3: [0.7094574019643611, 0.86054…
   8 │ 0.575109  0.169053   Q2: [0.5403359070225858, 0.70945…
   9 │ 0.551803  0.0971433  Q2: [0.5403359070225858, 0.70945…
  10 │ 0.540336  0.64679    Q2: [0.5403359070225858, 0.70945…

julia> combine(groupby(df, :bin), :ref .=> [minimum, maximum, mean], :other .=> [mean, std])
3×6 DataFrame
 Row │ bin                                ref_minimum  ref_maximum  ref_mean  other_mean  other_std
     │ Cat…                               Float64      Float64      Float64   Float64     Float64
─────┼──────────────────────────────────────────────────────────────────────────────────────────────
   1 │ Q1: [0.15332816708897712, 0.5403…     0.153328     0.2814    0.229186    0.339641   0.133206
   2 │ Q2: [0.5403359070225858, 0.70945…     0.540336     0.575109  0.555749    0.304329   0.298752
   3 │ Q3: [0.7094574019643611, 0.86054…     0.709457     0.860548  0.796037    0.686951   0.23253
```
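If you also want each bin’s midpoint next to these statistics without changing how `cut` is called, one option is to parse the two bounds back out of the default label strings. This is a sketch that assumes labels keep the `[lower, upper)` / `[lower, upper]` shape shown above (optionally with a `Qi: ` prefix). `label_bounds` and `label_midpoint` are hypothetical helpers. The parsed bounds are only as precise as the printed labels, so if your version rounds them, the midpoints are approximate.

```julia
using DataFrames, CategoricalArrays, Statistics

# Hypothetical helper: pull the two numbers out of a default `cut` label such
# as "Q1: [0.153, 0.540)" or "[0.0, 0.5]" and return them as (lower, upper).
function label_bounds(label::AbstractString)
    m = match(r"[\[(]\s*([^,]+),\s*([^\])]+)[\])]", label)
    m === nothing && error("unrecognized bin label: $label")
    return parse(Float64, m.captures[1]), parse(Float64, m.captures[2])
end

# Hypothetical helper: midpoint of the interval described by a label.
label_midpoint(label::AbstractString) = sum(label_bounds(label)) / 2

df = DataFrame(ref = rand(10), other = rand(10))
df.bin = cut(df.ref, 3)

res = combine(groupby(df, :bin), :other .=> [mean, std])
# `unwrap` gives the label as a String; parse it into a numeric midpoint column.
res.bin_mid = label_midpoint.(unwrap.(res.bin))
res
```

Encoding the midpoint directly through a `labels` function avoids the string round trip. Parsing is mainly useful when the binned data already exists.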