Hierarchical clustering of samples
A hierarchical clustering of samples is a tree representation of their relative similarity. The tree structure is generated by:
- letting each sample be a cluster
- calculating pairwise distances between all clusters
- joining the two closest clusters into one new cluster
- repeating the second and third steps until only one cluster is left (which will contain all samples).
The tree is drawn so that the distances between clusters are reflected by the lengths of the branches in the tree. Thus, samples with expression profiles that closely resemble each other are separated by short distances, while those that are more different are placed further apart.
(See [Eisen et al., 1998] for a classical example of the application of a hierarchical clustering algorithm in microarray analysis. That example clusters features rather than samples.)
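The steps above can be illustrated with a minimal Python sketch. It is not the Workbench's implementation: it assumes Euclidean distance between samples and average linkage between clusters (both chosen only for illustration), and the sample names and expression values are made up. Each merge is recorded together with the distance at which it happens; these merge distances are what the branch lengths of the tree reflect.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two expression vectors (illustrative choice)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_linkage(a, b, samples, dist):
    """Average distance over all sample pairs (x, y) with x in cluster a, y in cluster b."""
    total = sum(dist(samples[x], samples[y]) for x in a for y in b)
    return total / (len(a) * len(b))


def agglomerate(samples, dist=euclidean):
    """Naive agglomerative clustering (average linkage assumed).

    samples: dict mapping a sample name to its list of expression values.
    Returns the merges as (cluster_a, cluster_b, merge_distance) in merge order.
    """
    # Step 1: every sample starts as its own cluster (a tuple of sample names).
    clusters = [(name,) for name in samples]
    merges = []
    # Repeat until a single cluster containing all samples is left.
    while len(clusters) > 1:
        # Step 2: compare all pairs of clusters and pick the closest pair.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_linkage(pair[0], pair[1], samples, dist))
        merges.append((a, b, average_linkage(a, b, samples, dist)))
        # Step 3: replace the two closest clusters by their union.
        clusters = [c for c in clusters if c != a and c != b] + [a + b]
    return merges


# Hypothetical expression values for four samples over three features.
samples = {
    "S1": [1.0, 2.0, 3.0],
    "S2": [1.1, 2.1, 2.9],
    "S3": [5.0, 6.0, 7.0],
    "S4": [5.2, 5.9, 7.1],
}
for left, right, height in agglomerate(samples):
    print(left, right, round(height, 3))
```

With these toy values, S1 and S2 are joined first, then S3 and S4, and finally the two pairs. The naive search recomputes every cluster-to-cluster distance in each iteration, which is fine for a handful of samples but scales poorly; practical implementations update a distance matrix instead.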
To start the clustering:
Toolbox | Transcriptomics Analysis | Quality Control | Hierarchical Clustering of Samples
Select a number of samples or an experiment and click Next.
This will display a dialog as shown in figure 27.76. The hierarchical clustering algorithm requires that you specify a distance measure and a cluster linkage. The distance measure specifies how the distance between two samples is calculated. The cluster linkage specifies how the distance between two clusters, each consisting of a number of samples, is calculated.
Figure 27.76: Parameters for hierarchical clustering of samples.
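The distance measure compares two individual samples, each treated as a vector of expression values over the same features. The sketch below computes all pairwise sample distances for two measures, Euclidean distance and 1 - Pearson correlation. These are common choices picked for illustration and are not necessarily the exact options offered in the dialog; the function names and data are hypothetical.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Straight-line distance between two expression vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def one_minus_pearson(u, v):
    """1 - Pearson correlation: 0 for perfectly correlated profiles, 2 for anti-correlated ones."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return 1.0 - cov / (su * sv)


def distance_matrix(samples, measure):
    """Sample-to-sample distances for every unordered pair, keyed by (name, name)."""
    return {(x, y): measure(samples[x], samples[y])
            for x, y in combinations(samples, 2)}


# Hypothetical samples: S2 is S1 scaled by 2, S3 is S1 reversed.
samples = {"S1": [1.0, 2.0, 3.0], "S2": [2.0, 4.0, 6.0], "S3": [3.0, 2.0, 1.0]}
print(distance_matrix(samples, euclidean))
print(distance_matrix(samples, one_minus_pearson))
```

Here S1 and S2 are far apart by Euclidean distance but have a correlation distance of 0, while S3 is anti-correlated with S1 and gets a correlation distance of 2. Which measure is appropriate therefore depends on whether absolute expression levels or the shape of the profiles should drive the clustering.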
At the top, you can choose three kinds of Distance measures:
Next, you can select the cluster linkage to be used:
- Single linkage. The distance between two clusters is computed as the distance between the two closest elements in the two clusters.
- Average linkage. The distance between two clusters is computed as the average distance between objects from the first cluster and objects from the second cluster. The averaging is performed over all pairs $(x, y)$, where $x$ is an object from the first cluster and $y$ is an object from the second cluster.
- Complete linkage. The distance between two clusters is computed as the maximal object-to-object distance $d(x, y)$, where $x$ comes from the first cluster and $y$ comes from the second cluster. In other words, the distance between two clusters is computed as the distance between the two farthest objects in the two clusters.
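The three linkages differ only in how they summarize the object-to-object distances $d(x, y)$, with $x$ from the first cluster and $y$ from the second: single linkage takes the minimum, average linkage the mean, and complete linkage the maximum. A minimal Python sketch, assuming Euclidean distance between objects (an illustrative choice) and made-up one-dimensional data so the arithmetic is easy to check:

```python
import math


def euclidean(u, v):
    """Euclidean distance between two objects (illustrative choice)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pairwise(X, Y, dist):
    """All object-to-object distances d(x, y) with x in X and y in Y."""
    return [dist(x, y) for x in X for y in Y]


def single_linkage(X, Y, dist=euclidean):
    # Distance between the two closest objects.
    return min(pairwise(X, Y, dist))


def average_linkage(X, Y, dist=euclidean):
    # Mean over all |X| * |Y| pairs.
    d = pairwise(X, Y, dist)
    return sum(d) / len(d)


def complete_linkage(X, Y, dist=euclidean):
    # Distance between the two farthest objects.
    return max(pairwise(X, Y, dist))


# Two hypothetical clusters of one-dimensional objects.
X = [[0.0], [1.0]]
Y = [[3.0], [7.0]]
print(single_linkage(X, Y), average_linkage(X, Y), complete_linkage(X, Y))
# prints 2.0 4.5 7.0
```

The pairwise distances here are 3, 7, 2 and 6, so single, average and complete linkage give 2, 4.5 and 7 respectively.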
At the bottom, you can select which values to cluster (see Selecting transformed and normalized values for analysis).
Click Next if you wish to adjust how to handle the results. If not, click Finish.