cluster analysis - R: plot upper dendrogram based on k
I am clustering a distance matrix based on a 20,000 row x 169 column data set in R using hclust(). When I convert the cluster object to a dendrogram and plot the entire dendrogram, it is difficult to read because it is so large, even if I output it to a large PDF.
    library(vegan)  # provides vegdist()

    df <- as.data.frame(matrix(abs(rnorm(3380000)), nrow = 20000))
    mydist <- vegdist(df)
    my.hc <- hclust(mydist, method = "average")
    hcd <- as.dendrogram(my.hc)

    pdf("hclust_plot.pdf", width = 40, height = 15)
    plot(hcd)
    dev.off()
I would like to specify the number of clusters (k) at which to truncate the dendrogram, and then plot only the upper portion of the dendrogram above the k split points. I know I can plot the upper portion by specifying a height (h) using the function cut().
    pdf("hclust_plot2.pdf", width = 40, height = 15)
    plot(cut(hcd, h = 0.99)$upper)
    dev.off()
I also know I can use the dendextend package to color the dendrogram plot by k groups.
    library(dendextend)

    pdf("hclust_plot3.pdf", width = 40, height = 15)
    plot(color_branches(hcd, k = 44))
    dev.off()
But for my data set, the dendrogram is too dense to read the group colors. Is there a way to plot just the upper portion of the dendrogram above a cut point by specifying k, not h? Or is there a way to get the h value for a dendrogram, given k?
You can use the heights_per_k.dendrogram function from the dendextend package to get the heights for various k cuts.

For example:
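    library(dendextend)

    hc <- hclust(dist(USArrests[1:4, ]), method = "average")
    dend <- as.dendrogram(hc)

    # Named vector: names are k, values are heights that cut the tree into k clusters
    dend_h <- heights_per_k.dendrogram(dend)

    par(mfrow = c(1, 2))
    plot(dend)                                         # full tree
    plot(dend, ylim = c(dend_h["3"], dend_h["1"]))     # only the part above the k = 3 cut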
And in your case:
    set.seed(2016-01-16)  # note: R evaluates this as arithmetic, i.e. set.seed(1999)
    df <- as.data.frame(matrix(abs(rnorm(2 * 20000)), nrow = 20000))
    mydist <- dist(df)
    my.hc <- hclust(mydist, method = "average")
    hcd <- as.dendrogram(my.hc)

    library(dendextend)
    library(dendextendRcpp)

    dend_h <- heights_per_k.dendrogram(hcd)  # this can take some time
    plot(hcd, ylim = c(dend_h["43"], dend_h["1"]))
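If you would rather truncate the tree with cut() than zoom in with ylim, the same idea can be sketched in base R alone. This is only a sketch under some assumptions: height_for_k() is a hypothetical helper written for this illustration (it is not part of dendextend or stats), it assumes a monotone linkage such as "average" so that the merge heights in the hclust object are increasing, and it assumes there are no tied merge heights around the requested k.

```r
# Hypothetical helper (not from any package): return a height at which
# cutting the tree yields k clusters.
height_for_k <- function(hc, k) {
  n <- length(hc$order)            # number of observations
  stopifnot(k >= 2, k <= n - 1)
  # hc$height holds the n - 1 merge heights in increasing order
  # (assumed: monotone linkage such as "average" or "complete").
  # Any height strictly between merge n - k and merge n - k + 1
  # gives k clusters; the midpoint is used here.
  mean(hc$height[c(n - k, n - k + 1)])
}

hc  <- hclust(dist(USArrests), method = "average")
k   <- 5
h_k <- height_for_k(hc, k)

# Sanity check: cutting at h_k should give exactly k clusters.
stopifnot(length(unique(cutree(hc, h = h_k))) == k)

# Keep only the upper part of the dendrogram. Each leaf of the truncated
# tree ("Branch 1", "Branch 2", ...) stands for one of the k clusters.
upper <- cut(as.dendrogram(hc), h = h_k)$upper
plot(upper, main = paste("Upper dendrogram for k =", k))
```

For the large data set in the question, the equivalent call would be along the lines of plot(cut(hcd, h = height_for_k(my.hc, 44))$upper), which avoids computing the heights for every possible k.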