How to identify clusters of values in a dataframe (or grouped dataframe)

rj3838 · March 31, 2025, 3:57pm

I am trying to identify clusters of values within a DataFrame (it’s actually grouped, but I’m handling it one section at a time). The data describes road defects, so the real dataset is large. That isn’t relevant to the question, but it gives some idea of the volume involved.

using DataFrames

# Create a DataFrame containing several clusters of the target value (7)
data = Dict(
    "A" => [1, 7, 3, 4, 7, 7],
    "B" => [5, 6, 1, 8, 9, 9],
    "C" => [7, 10, 10, 11, 7, 7],
    "D" => [13, 13, 14, 15, 7, 7],
    "E" => [7, 10, 10, 11, 7, 7],
)
df = DataFrame(data)

In this case I need to identify the clusters of the value 7: column A in rows 5 and 6, and columns C, D and E in rows 5 and 6.

6×5 DataFrame
 Row │ A      B      C      D      E
     │ Int64  Int64  Int64  Int64  Int64
─────┼───────────────────────────────────
   1 │     1      5      7     13      7
   2 │     7      6     10     13     10
   3 │     3      1     10     14     10
   4 │     4      8     11     15     11
   5 │     7      9      7      7      7
   6 │     7      9      7      7      7

The target value will change, and there may be any number of clusters.

I can find where the 7s are by enumerating over the rows and columns, but the result only gives row and column positions, not the cluster each position belongs to, so I’m looking for something simpler.
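For comparison, Base Julia’s `findall` returns the same positions as an enumerate loop in a single call. This is a minimal sketch assuming the data has first been turned into a plain matrix (with DataFrames.jl, `Matrix(df)` does this when all columns are integers); the matrix is written out literally here so the snippet runs on its own. It illustrates the limitation described above: every 7 is found, including the isolated ones in rows 1 and 2, but nothing indicates which cluster a position belongs to.

```julia
# The example data as a plain matrix (columns A to E), as Matrix(df) would give
M = [1 5  7 13  7
     7 6 10 13 10
     3 1 10 14 10
     4 8 11 15 11
     7 9  7  7  7
     7 9  7  7  7]
colnames = ["A", "B", "C", "D", "E"]

target = 7

# findall returns a CartesianIndex for every matching cell, in column-major order
positions = findall(==(target), M)

for p in positions
    # p[1] is the row, p[2] the column
    println("row ", p[1], ", column ", colnames[p[2]])
end
```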

The output should be a cluster number and its positions, as I need to calculate the maximum width and height of each cluster.
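One way to get cluster numbers is connected-component labelling: flood-fill outward from each unvisited target cell and give every cell reached the same label. Below is a minimal Base-Julia sketch built on assumptions the question does not state. First, two cells count as part of the same cluster only if they share an edge (4-connectivity, so diagonal neighbours are not joined). Second, a cluster needs at least 2 cells, which excludes the isolated 7s in rows 1 and 2. `label_clusters` and `cluster_extents` are hypothetical helper names. Height is taken as the number of rows and width as the number of columns spanned by each cluster’s bounding box.

```julia
# Label 4-connected clusters of `target` in M. Returns a matrix of the same size:
# 0 for cells that are not `target`, and 1, 2, ... for the cluster each cell is in.
function label_clusters(M::AbstractMatrix, target)
    nrows, ncols = size(M)
    labels = zeros(Int, nrows, ncols)
    offsets = ((1, 0), (-1, 0), (0, 1), (0, -1))  # edge neighbours only
    current = 0
    for j in 1:ncols, i in 1:nrows          # scan in column-major order
        if M[i, j] != target || labels[i, j] != 0
            continue                        # not a target, or already labelled
        end
        current += 1                        # start a new cluster here
        labels[i, j] = current
        queue = [(i, j)]                    # breadth-first flood fill
        while !isempty(queue)
            r, c = popfirst!(queue)
            for (dr, dc) in offsets
                rr, cc = r + dr, c + dc
                if 1 <= rr <= nrows && 1 <= cc <= ncols &&
                   M[rr, cc] == target && labels[rr, cc] == 0
                    labels[rr, cc] = current
                    push!(queue, (rr, cc))
                end
            end
        end
    end
    return labels
end

# For each label, collect its cells and the height/width of its bounding box.
function cluster_extents(labels::AbstractMatrix{<:Integer})
    map(1:maximum(labels; init = 0)) do k
        cells = findall(==(k), labels)
        rows = [c[1] for c in cells]
        cols = [c[2] for c in cells]
        (cluster = k, cells = cells,
         height = maximum(rows) - minimum(rows) + 1,
         width = maximum(cols) - minimum(cols) + 1)
    end
end

M = [1 5  7 13  7
     7 6 10 13 10
     3 1 10 14 10
     4 8 11 15 11
     7 9  7  7  7
     7 9  7  7  7]

labels = label_clusters(M, 7)
# Assumed minimum cluster size of 2 cells
big = filter(e -> length(e.cells) >= 2, cluster_extents(labels))
for e in big
    println("cluster ", e.cluster, ": ", length(e.cells), " cells, height ",
            e.height, ", width ", e.width)
end
# prints:
# cluster 2: 2 cells, height 2, width 1
# cluster 4: 6 cells, height 2, width 3
```

The cluster numbers are not consecutive because the singletons also receive labels before being filtered out. For a grouped DataFrame, the same functions could be applied to each group’s matrix in turn. Image-processing packages also offer connected-component labelling (for example ImageMorphology.jl’s `label_components`), which may be worth comparing on large data.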

juliohm · March 31, 2025, 4:43pm

Can you elaborate on why Clustering.jl algorithms can’t be used directly? Or geostatistical clustering algorithms?

rocco_sprmnt21 · April 3, 2025, 8:31pm

What outcome would you expect in this case?
