Best way to bin data from a DataFrame?

I have the classic problem: the data in my DataFrame needs to be split into multiple DataFrames according to the value of `x`, in particular `x < a1` => group1, `a1 <= x <= a2` => group2, `a3 <= x <= a4` => group3, etc.

Normally I would create empty DataFrames and loop over the rows once, appending each row to the appropriate one (rather than running a separate filter for each group).
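Here is a minimal sketch of that single-pass loop using DataFrames.jl. The boundaries `a1` and `a2` and the example data are hypothetical. For simplicity it uses only two boundaries, so group3 here is taken to be everything above `a2`.

```julia
using DataFrames

# Hypothetical boundaries and example data (not from the original post)
a1, a2 = 10.0, 20.0
df = DataFrame(id = 1:6, x = [3.0, 12.5, 25.0, 10.0, 19.9, 7.1])

# Empty DataFrames with the same column names and types as df
group1 = similar(df, 0)
group2 = similar(df, 0)
group3 = similar(df, 0)   # assumption: everything with x > a2

# One pass over the rows; push! copies each row into the matching DataFrame
for row in eachrow(df)
    if row.x < a1
        push!(group1, row)
    elseif row.x <= a2    # a1 <= x <= a2
        push!(group2, row)
    else
        push!(group3, row)
    end
end
```

This scans `df` only once, but every row is copied into a new DataFrame.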

However, I'm wondering if there's a better way to do that. In particular, it seems like I might be able to use the `by` or `groupby` function to do what I want, but I can't quite figure it out…

You could have a look at `CategoricalArrays.cut`.
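Here is a sketch of how `cut` could be applied. It assumes DataFrames.jl and CategoricalArrays.jl are installed, and the boundaries, data and labels are hypothetical. `cut` makes one pass over the column and returns an ordered `CategoricalArray` naming the interval each value falls into. Its intervals are closed on the left and open on the right (`[lower, upper)`), except for the last one, which is closed on both ends. As a result, a value exactly equal to `a2` lands in group3 here rather than in group2 as specified in the question.

```julia
using DataFrames, CategoricalArrays

# Hypothetical boundaries and example data
a1, a2 = 10.0, 20.0
df = DataFrame(id = 1:6, x = [3.0, 12.5, 25.0, 10.0, 19.9, 7.1])

# -Inf and Inf as outer breaks guarantee every finite x falls into some bin,
# giving the intervals [-Inf, a1), [a1, a2) and [a2, Inf]
df.bin = cut(df.x, [-Inf, a1, a2, Inf]; labels = ["group1", "group2", "group3"])
```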

Look at `groupby`, which creates a grouped DataFrame (`GroupedDataFrame`). IIRC the groups are actually views (`SubDataFrame`s) into the original DataFrame, one per group, so you don't have to copy the data but can still access each group as if it were a separate DataFrame.
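Here is a sketch combining both suggestions. It assumes DataFrames.jl and CategoricalArrays.jl, with hypothetical data and boundaries. The bin label is computed once with `cut`, and `groupby` then splits the rows by that label. Each group it yields is a `SubDataFrame`, a view into `df`.

```julia
using DataFrames, CategoricalArrays

# Hypothetical boundaries and example data
a1, a2 = 10.0, 20.0
df = DataFrame(id = 1:6, x = [3.0, 12.5, 25.0, 10.0, 19.9, 7.1])

# One pass to label every row, then group by the label
df.bin = cut(df.x, [-Inf, a1, a2, Inf]; labels = ["group1", "group2", "group3"])
gdf = groupby(df, :bin)

# Each group is a SubDataFrame: a view into df, so no row data is copied
for (key, sdf) in pairs(gdf)
    println(key.bin, ": ", nrow(sdf), " rows")
end
```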

Yes, I did look at `groupby`, and it's very straightforward to do it in multiple passes.

I should have been clearer: I was trying to figure out how I could do it in one shot, i.e. categorize the rows without scanning the DataFrame repeatedly.

However, knowing that the groups are views is important. It means no data is copied when the per-group DataFrames are created, which matters because my dataset keeps growing! So it should be much more efficient than repeated filters.
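Here is a small sketch of what "the groups are views" means in practice. It assumes DataFrames.jl, with hypothetical data and a plain `String` column as the group key. A group can be looked up by its key value. Writing into the group writes through to `df`, and `DataFrame(sdf)` makes an independent copy when one is actually needed.

```julia
using DataFrames

# Hypothetical example data with a precomputed group label
df = DataFrame(id = 1:6, x = [3.0, 12.5, 25.0, 10.0, 19.9, 7.1],
               bin = ["group1", "group2", "group3", "group2", "group2", "group1"])
gdf = groupby(df, :bin)

g2 = gdf[(bin = "group2",)]    # look up a group by key value; returns a SubDataFrame
target = g2[1, :id]            # id of the first row in this group

g2[1, :x] = 0.0                # writes through the view into df
println(df[df.id .== target, :x])

g2_copy = DataFrame(g2)        # materialize an independent copy
g2_copy[1, :x] = -1.0          # changes only the copy, not df
```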