I’m new to Julia, but I'm already impressed by its syntax and by StatsPlots. However, there is still some functionality from Python/R that I miss.
Specifically, for this post I would like to color-code the dendrogram produced from `Clustering.hclust` by cluster, as is done in Python/R, so that it looks more like this:
Here is a minimal working example that builds a distance matrix and a dendrogram from data with group structure:
```julia
using Clustering
using StatsPlots
using Distances

# Generate a random dataset with 10 groups of correlated columns:
# pass i adds 0.25 to the first i*100 columns, so each block of 100 columns gets its own offset
data_matrix = rand(50, 1000)
for i = 1:10
    data_matrix[:, 1:i*100] .+= 0.25
end

# Generate distance matrix, perform hierarchical clustering, plot
dist_mat = pairwise(Euclidean(1e-12), data_matrix, dims=2)
hcl1 = hclust(dist_mat, linkage=:ward)
dg = plot(hcl1, xticks=false, yticks=true)
plot!(size=(800, 200))
```
I am using the GR backend.
Since I’m just starting out with Julia, is this problem worth solving in pure Julia? Is there a better approach, e.g., using a different plotting package or backend, or calling the Python/R functions directly instead of spending time on this?
Here is how far I’ve gotten in the “pure Julia” direction. However, accessing the individual line elements of the plot feels unintuitive, and it was difficult to even find out how to do it.
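```julia
# Recolor every line series in the first subplot of `dg` orange.
# Series are accessed as dg[1][i]; the loop stops at the first index past the last series.
function dendrocolor(dg)
    for i = 1:10000
        try
            dg[1][i].plotattributes[:linecolor] = RGBA{Float64}(0.8888735002725198, 0.43564919034818994, 0.2781229361419438, 1.0)
        catch e
            e isa BoundsError || rethrow()  # stop only when we run out of series, not on other errors
            break
        end
    end
    return dg
end
```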
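If you want to keep modifying the plot after the fact, a slightly sturdier sketch iterates over the subplot's series list instead of probing indices inside `try`/`catch`. It assumes that `dg[1].series_list` (an internal field of Plots' `Subplot` type, not a documented public API, so it may change between versions) holds one `Series` per drawn line. `dendrocolor!` is a hypothetical name, and colors are passed as `RGBA` values as in the original function.

```julia
using StatsPlots  # re-exports Plots, including RGBA

# Hypothetical helper: set the line color of each series in the first subplot of `plt`.
# `colors` is a single color, or a vector with exactly one color per series.
# Relies on the internal field `Subplot.series_list` (assumption: may change between Plots versions).
function dendrocolor!(plt, colors)
    series = plt[1].series_list
    cols = colors isa AbstractVector ? colors : fill(colors, length(series))
    length(cols) == length(series) ||
        throw(ArgumentError("expected $(length(series)) colors, got $(length(cols))"))
    for (s, c) in zip(series, cols)
        s.plotattributes[:linecolor] = c  # same attribute the original function sets
    end
    return plt
end
```

Calling `dendrocolor!(dg, RGBA{Float64}(0.89, 0.44, 0.28, 1.0))` reproduces the all-orange result, while passing a vector with one color per series colors each branch individually.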
This turns the entire dendrogram orange. My ideas for going further are to:
- call this function on each line of the dendrogram separately
- leverage the `hcl1` structure, which contains `merges` and `heights`
- use the `Clustering.cutree` function
However, I don’t know how to combine all of this, and it seems like a lot of work. Is there a better way?
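One possible way to combine the `merges` and `cutree` ideas is sketched below, under two assumptions: that `hcl1.merges` follows Clustering.jl's convention (negative entries are observation indices, positive entries refer to earlier merges), and that the StatsPlots recipe draws one line series per merge, in merge order, so a 1×n row of colors is applied series by series. `merge_labels` is a hypothetical helper, not a Clustering.jl function; `k = 10` and the color list are arbitrary choices.

```julia
using Clustering
using Distances
using StatsPlots

# Hypothetical helper: label each merge (row of `merges`) with the cluster it lies inside,
# or 0 if it joins observations from different clusters (i.e. it sits above the cut).
function merge_labels(merges::AbstractMatrix{<:Integer}, assign::AbstractVector{<:Integer})
    labels = zeros(Int, size(merges, 1))
    for i in axes(merges, 1)
        a, b = merges[i, 1], merges[i, 2]
        la = a < 0 ? assign[-a] : labels[a]  # leaf: its cluster; subtree: label of earlier merge
        lb = b < 0 ? assign[-b] : labels[b]
        labels[i] = la == lb ? la : 0
    end
    return labels
end

# Same synthetic data as in the MWE
data_matrix = rand(50, 1000)
for i = 1:10
    data_matrix[:, 1:i*100] .+= 0.25
end
dist_mat = pairwise(Euclidean(1e-12), data_matrix, dims=2)
hcl1 = hclust(dist_mat, linkage=:ward)

k = 10                              # number of clusters to color (arbitrary choice)
assign = cutree(hcl1; k=k)          # cluster id for each observation
labels = merge_labels(hcl1.merges, assign)

cluster_colors = [:red, :blue, :green, :orange, :purple, :brown, :magenta, :cyan, :olive, :gold]
linecols = [l == 0 ? :black : cluster_colors[mod1(l, length(cluster_colors))] for l in labels]

# Assumption: one series per merge, in merge order, so each column of this row gets one color
dg = plot(hcl1, xticks=false, yticks=true, linecolor=permutedims(linecols))
plot!(size=(800, 200))
```

Merges that join two different clusters get label 0 and stay black, so only the part of the tree above the cut is uncolored. Cutting at a height instead (`cutree(hcl1; h=...)`) works the same way.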
All advice and help is appreciated!
Clay