Hierarchical Clustering of Samples

A hierarchical clustering of samples is a tree representation of their relative similarity.

The tree structure is generated by

  1. letting each sample be a cluster
  2. calculating pairwise distances between all clusters
  3. joining the two closest clusters into one new cluster
  4. iterating steps 2-3 until there is only one cluster left (which will contain all samples).

The tree is drawn so that the distances between clusters are reflected by the lengths of the branches in the tree. Thus, samples with expression profiles that closely resemble each other have short distances between them, while those that are more different are placed further apart.
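
The four steps above can be sketched in a few lines of Python. This is only an illustration of the general procedure, not the tool's implementation: the function names, the sample names and values, and the choice of Euclidean distance with average linkage are all assumptions made for the example.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_distance(c1, c2, profiles):
    """Average pairwise distance between the members of two clusters
    (average linkage, chosen here purely for illustration)."""
    total = sum(euclidean(profiles[i], profiles[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def hierarchical_clustering(profiles):
    """Return the joins as (cluster_a, cluster_b, distance), in join order."""
    # Step 1: every sample starts as its own cluster.
    clusters = [(name,) for name in profiles]
    merges = []
    # Step 4: repeat until a single cluster holds all samples.
    while len(clusters) > 1:
        # Step 2: evaluate the distance between every pair of clusters
        # and pick the closest pair.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_distance(pair[0], pair[1], profiles),
        )
        d = cluster_distance(a, b, profiles)
        # Step 3: join the two closest clusters into one new cluster.
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
        merges.append((a, b, d))
    return merges


# Hypothetical samples, each with two expression values.
samples = {
    "S1": [1.0, 2.0],
    "S2": [1.0, 3.0],
    "S3": [5.0, 5.0],
    "S4": [7.0, 5.0],
}
merges = hierarchical_clustering(samples)
for a, b, d in merges:
    print(a, "+", b, "at distance", round(d, 3))
```

The distance recorded at each join is what a tree drawing would use as the branch height: S1 and S2 are joined first because they are closest, and the final join connects the two pairs at a much larger distance.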

(See [Eisen et al., 1998] for a classic example of hierarchical clustering applied to microarray analysis. That example clusters features rather than samples.)

To start the clustering:

        Toolbox | Microarray Analysis | Quality Control | Hierarchical Clustering of Samples

Select a number of samples or an experiment and click Next.

This will display a dialog as shown in figure 31.32. The hierarchical clustering algorithm requires that you specify a distance measure and a cluster linkage. The distance measure specifies how the distance between two samples should be calculated. The cluster linkage specifies how the distance between two clusters, each consisting of a number of samples, should be calculated.

Image sample_clustering_step2
Figure 31.32: Parameters for hierarchical clustering of samples.
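
To make the difference between the two parameters concrete, the sketch below keeps the distance measure fixed and varies only the cluster linkage. Euclidean distance and the three linkage rules shown (single, complete, average) are common textbook choices assumed for this example; they are not taken from the dialog, and the function names and data are hypothetical.

```python
import math


def euclidean(a, b):
    """Distance measure: how far apart two individual samples are."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def linkage_distance(cluster_a, cluster_b, linkage):
    """Cluster linkage: how the sample-to-sample distances are combined
    into one distance between two clusters."""
    pairwise = [euclidean(p, q) for p in cluster_a for q in cluster_b]
    if linkage == "single":      # closest pair of samples
        return min(pairwise)
    if linkage == "complete":    # most distant pair of samples
        return max(pairwise)
    if linkage == "average":     # mean over all pairs of samples
        return sum(pairwise) / len(pairwise)
    raise ValueError("unknown linkage: " + linkage)


# Two hypothetical clusters, each holding two samples.
cluster_a = [[0.0, 0.0], [0.0, 1.0]]
cluster_b = [[3.0, 0.0], [3.0, 5.0]]

single = linkage_distance(cluster_a, cluster_b, "single")
complete = linkage_distance(cluster_a, cluster_b, "complete")
average = linkage_distance(cluster_a, cluster_b, "average")
print(single, average, complete)
```

The same two clusters are a different distance apart under each rule, so the choice of linkage can change which clusters are joined first and therefore the shape of the resulting tree.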

There are three kinds of Distance measures:

The possible cluster linkages are:

At the bottom, you can select which values to cluster (see Selecting transformed and normalized values for analysis).

Click Finish to start the tool.

Note: To be run on a server, the tool has to be included in a workflow, and the results will be displayed in a stand-alone new heat map rather than added to the input experiment table.


