How to Collapse Dendrogram Branches to n Number of Leaves

ZVerdup · February 26, 2025, 6:01am

Hi, I need some guidance on making dendrograms readable when they are plotted from thousands of observations. I want to use the dendrogram to decide how many clusters I should use for further analysis.

My current strategy is to collapse the branches so that only a limited number of leaves remain, say 10. Ideally, the label under each leaf would be the number of observations it contains. I would then want to print out the indices of the observations in each leaf. I have seen similar functionality in MATLAB and in R's truncation options, but I can't find anything equivalent in Julia. Can it be done by a novice such as myself? Here's a simple working example:

using Clustering, Random, Distances, StatsPlots

# Set seed for reproducibility
Random.seed!(42)

# Generate a random dataset with 1000 observations and 8 attributes
num_observations = 1000
num_attributes = 8
data = rand(num_observations, num_attributes)  # Random values between 0 and 1

# Compute the pairwise distance matrix (Euclidean distance)
distance_matrix = pairwise(Euclidean(), data, dims=1)

# Perform hierarchical clustering
hclust_result = hclust(distance_matrix, linkage=:ward)

plot(hclust_result)

The resulting plot is shown below. Could anyone help me implement my plan, or convince me that there's a better way? Yes, I'm new to Julia and new to discourse.julialang. Thank you so much for your help. Cheers!
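The plan described in the question (collapse to about 10 leaves, show how many observations sit under each leaf, list their indices) can be sketched directly from the tree that `hclust` returns. This is a minimal sketch, not a Clustering.jl API: `subtree_leaves` and `collapse_tree` are hypothetical helper names. It assumes the Clustering.jl convention for the `merges` field (an (n - 1) × 2 matrix in which a negative entry `-i` is observation `i` and a positive entry `j` is the subtree formed at row `j`), and it assumes the rows are ordered by non-decreasing merge height, so that the last k - 1 rows are the top of the tree. It returns the data for a collapsed dendrogram, not a plot.

```julia
# Hypothetical helper: indices of the observations under subtree `id`.
# Assumed convention: negative id = single observation, positive id = row of `merges`.
function subtree_leaves(merges::AbstractMatrix{<:Integer}, id::Integer)
    leaves = Int[]
    stack = [Int(id)]
    while !isempty(stack)
        node = pop!(stack)
        if node < 0
            push!(leaves, -node)              # a single observation
        else
            push!(stack, merges[node, 2])     # right child
            push!(stack, merges[node, 1])     # left child (popped first)
        end
    end
    return leaves
end

# Hypothetical helper: collapse the tree to k leaves, each a vector of observation indices.
# Assumes rows of `merges` are sorted by non-decreasing height, so the last
# k - 1 rows are the merges that stay visible above the cut.
function collapse_tree(merges::AbstractMatrix{<:Integer}, k::Integer)
    nmerges = size(merges, 1)
    nmerges >= 1 || throw(ArgumentError("need at least two observations"))
    1 <= k <= nmerges + 1 || throw(ArgumentError("k must be in 1:$(nmerges + 1)"))
    k == 1 && return [subtree_leaves(merges, nmerges)]   # whole tree is one leaf
    first_kept = nmerges - k + 2                          # first of the k - 1 kept merges
    clusters = Vector{Vector{Int}}()
    for row in first_kept:nmerges, col in 1:2
        child = merges[row, col]
        # A child becomes a collapsed leaf unless it is itself one of the kept merges
        if child < first_kept
            push!(clusters, subtree_leaves(merges, child))
        end
    end
    return clusters
end

# Toy tree for 4 observations: merge (1, 2), then (3, 4), then the two subtrees
toy_merges = [-1 -2; -3 -4; 1 2]
for leaf in collapse_tree(toy_merges, 2)
    println(length(leaf), " observations: ", leaf)
end
```

With the question's example, `clusters = collapse_tree(hclust_result.merges, 10)` should give ten index vectors and `length.(clusters)` the size of each collapsed leaf; `propertynames(hclust_result)` is a quick way to confirm the field name in your installed version. If the ordering assumption holds, the groups should match `cutree(hclust_result; k = 10)` up to relabeling, which makes a useful sanity check.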

bertschi · February 26, 2025, 5:10pm

Ok, this seems to work … and here is how I found it:

julia> hclust_result |> typeof
Hclust{Float64}

julia> methodswith(Hclust)
[1] getproperty(hclu::Hclust, prop::Symbol) @ Clustering ~/.julia/packages/Clustering/M6mjF/src/deprecate.jl:25
[2] propertynames(hclu::Hclust) @ Clustering ~/.julia/packages/Clustering/M6mjF/src/deprecate.jl:20
[3] propertynames(hclu::Hclust, private::Bool) @ Clustering ~/.julia/packages/Clustering/M6mjF/src/deprecate.jl:20
[4] cutree(hclu::Hclust; k, h) @ Clustering ~/.julia/packages/Clustering/M6mjF/src/hclust.jl:810

help?> cutree
search: cutree hclust_result ClusteringResult AbstractUnitRange

  cutree(hclu::Hclust; [k], [h]) -> Vector{Int}

...

julia> foo = cutree(hclust_result; k = 5)
1000-element Vector{Int64}:
 1
 2
 1
 1
 3

# Seems to return the cluster ID of each sample ...
# Fun (but inefficient) APL-like one-liner to get the set of indices for each cluster
julia> getindex.(Ref(eachindex(foo)), eachrow(unique(foo) .== foo'))
5-element Vector{Vector{Int64}}:
 [1, 3, 4, 10, 11, 13, 14, 20, 22, 26  …  939, 944, 951, 958, 965, 972, 981, 982, 989, 1000]
 [2, 7, 9, 18, 21, 25, 27, 29, 32, 34  …  975, 977, 979, 980, 983, 985, 987, 996, 997, 998]
...
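The one-liner builds a full clusters × observations comparison matrix. A single pass over the `cutree` output does the same grouping and also yields the per-cluster counts that the question wanted to show under each leaf. This is a Base-only sketch; `group_by_cluster` and `print_cluster_sizes` are hypothetical helper names, not part of Clustering.jl.

```julia
# Map each cluster ID to the indices of the observations assigned to it.
# `assignments` is assumed to be what cutree returns: one cluster ID per observation.
function group_by_cluster(assignments::AbstractVector{<:Integer})
    groups = Dict{Int, Vector{Int}}()
    for (i, c) in pairs(assignments)
        push!(get!(() -> Int[], groups, c), i)   # create the group on first use
    end
    return groups
end

# Print one line per cluster, in order of cluster ID, with its size
function print_cluster_sizes(groups::Dict{Int, Vector{Int}})
    for c in sort!(collect(keys(groups)))
        println("cluster ", c, ": ", length(groups[c]), " observations")
    end
end

# Toy assignments standing in for cutree(hclust_result; k = 5)
groups = group_by_cluster([1, 2, 1, 1, 3])
print_cluster_sizes(groups)
```

In the thread's session this would be `group_by_cluster(foo)`. If only the sizes are needed, `countmap` from StatsBase.jl returns them directly.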

ZVerdup · February 26, 2025, 5:16pm

I was aware of cutree, but you helped me understand it better. Thanks! What I'm really after, though, is plotting the resulting "cut" dendrogram. I'm afraid there isn't an easy way to do that yet?

bertschi · February 26, 2025, 5:36pm

Ok, I'm not sure there is a built-in way. What would you like the plot to look like, i.e., how would you do it in R?
A quick and dirty way might be to just limit the y-scale, i.e., sort of like cutting at that height: plot(hclust_result; ylim = (4.3, Inf))
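Rather than guessing the `ylim` value, the cut height can be derived from the merge heights so that the cropped plot matches `cutree(hclust_result; k = ...)`. A minimal sketch with a hypothetical helper `cut_height_range`: with n observations there are n - 1 merges, and k clusters remain once the lowest n - k merges have been performed, so any horizontal cut strictly between the (n - k)-th and (n - k + 1)-th smallest merge heights leaves exactly k clusters. If those two heights tie, no cut gives exactly k clusters.

```julia
# Hypothetical helper: return (lo, hi) such that any cut strictly between
# lo and hi leaves exactly k clusters, given the n - 1 merge heights.
function cut_height_range(heights::AbstractVector{<:Real}, k::Integer)
    h = sort(heights)                # defensive; merge heights are assumed non-decreasing anyway
    m = length(h)                    # number of merges = n - 1
    1 <= k <= m + 1 || throw(ArgumentError("k must be in 1:$(m + 1)"))
    # k clusters remain after the lowest m - k + 1 merges have been performed
    lo = k == m + 1 ? -Inf : Float64(h[m - k + 1])   # last merge still performed
    hi = k == 1 ? Inf : Float64(h[m - k + 2])        # first merge that is cut away
    return lo, hi
end

# Toy heights for 4 observations (3 merges)
lo, hi = cut_height_range([1.0, 2.0, 3.0], 2)
println("cut strictly between ", lo, " and ", hi)
```

Assuming the heights are stored in the `heights` field, `lo, hi = cut_height_range(hclust_result.heights, 10)` followed by `plot(hclust_result; ylim = ((lo + hi) / 2, Inf))` should show ten branches running off the bottom of the plot. This only crops the view; it does not relabel anything with cluster sizes.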
