using DataFrames
using CategoricalArrays
x = [1.2, 2.5, 3.1, 4.8, 5.2, 6.3, 7.7, 2.0, 3.8, 4.2, 5.7, 6.1, 7.4, 8.9]
y = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 15.0, 25.0, 35.0, 45.0, 55.0, 65.0, 75.0]
df = DataFrame(x=x, y=y)
# Define bin size and bin edges
bin_size = 2.0
min_x, max_x = extrema(df.x)
bins = min_x:bin_size:max_x
# Create a new column with the bin labels
df.bin_labels = cut(df.x, bins, extend=true)
# Group by the bin labels and calculate the sum of y-coordinates
grouped_df = combine(groupby(df, :bin_labels), :y => sum, renamecols=false)
The above yields the expected result:
4×2 DataFrame
 Row │ bin_labels  y
     │ Cat…        Float64
─────┼─────────────────────
   1 │ [1.2, 3.2)     75.0
   2 │ [3.2, 5.2)    100.0
   3 │ [5.2, 7.2)    210.0
   4 │ [7.2, 8.9]    210.0
A follow-up question: how can I now plot, for example, the binned x column against the binned y?
I guess one has to choose a representative value within each x bin, for example the midpoint of the bin: (1.2 + 3.2)/2 = 2.2 for the first point, and so on.
Is there a programmatic way of doing this that I've missed in my 30-minute introduction to CategoricalArrays.jl?
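For reference, here is a minimal sketch of the midpoint idea described above. It is one possible approach, not necessarily the intended CategoricalArrays.jl way. It assumes the bin edges can be rebuilt by hand from `bins` (appending `max_x` when the last break falls short of it, which mirrors what `extend=true` does for this data), and it assumes `levelcode` returns the bin index because `cut` orders its levels by interval. The names `edges`, `midpoints`, `bin_mid` and `binned` are introduced here for illustration only.

```julia
using DataFrames
using CategoricalArrays

x = [1.2, 2.5, 3.1, 4.8, 5.2, 6.3, 7.7, 2.0, 3.8, 4.2, 5.7, 6.1, 7.4, 8.9]
y = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 15.0, 25.0, 35.0, 45.0, 55.0, 65.0, 75.0]
df = DataFrame(x=x, y=y)

bin_size = 2.0
min_x, max_x = extrema(df.x)
bins = min_x:bin_size:max_x

# Assumption: rebuild the edges that `extend=true` ends up using.
# The range stops at the last full step, so append max_x if it is not covered.
edges = collect(bins)
if last(edges) < max_x
    push!(edges, max_x)
end

# Midpoint of each bin: (left edge + right edge) / 2
midpoints = (edges[1:end-1] .+ edges[2:end]) ./ 2

df.bin_labels = cut(df.x, bins, extend=true)

# levelcode gives the integer position of each row's level;
# use it to look up the midpoint of the bin the row falls into.
df.bin_mid = midpoints[levelcode.(df.bin_labels)]

# Group by label and midpoint together so both survive the aggregation.
binned = combine(groupby(df, [:bin_labels, :bin_mid], sort=true), :y => sum, renamecols=false)
```

The columns `binned.bin_mid` and `binned.y` are then plain numeric vectors, so they can be passed to whatever plotting package is in use (for example `plot(binned.bin_mid, binned.y)` with Plots.jl). If `x` contained missing values, or values below the first break, the edge reconstruction would need to be adjusted accordingly.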