Hierarchical clustering

Hierarchical clustering (also known as numerical taxonomy) is a branch of cluster analysis which treats clusters hierarchically, i.e., as a set of levels. The construction of the hierarchy can be performed using two major approaches, or combinations thereof: In agglomerative hierarchical clustering (a bottom-up approach), existing clusters are merged iteratively, while divisive hierarchical clustering (a top-down approach) starts out with all data in one cluster that is then split iteratively. At each step of the process, a mathematical measure of distance or similarity between (agglomerative) or within clusters (divisive) is being computed to determine how to split or merge.

Several different distance and similarity measures can be used, which generally result in different hierarchies (especially for agglomerative ones which start out based on local information only), thus complicating their interpretation. Nonetheless, hierarchical clustering is more intuitively understandable than flat clustering, and so it enjoys considerable popularity for multivariate analysis of data, e.g. of gene or protein sequences.