Clustering of features and samples

Hierarchical clustering groups features by the similarity of their expression profiles across the set of samples, and groups samples by the similarity of their expression patterns across the features.

Each clustering has a tree structure that is generated by:

  1. Letting each feature or sample be a cluster.
  2. Calculating pairwise distances between all clusters.
  3. Joining the two closest clusters into one new cluster.
  4. Repeating steps 2 and 3 until only one cluster is left (which contains all the features or samples).

The tree is drawn so that the distances between clusters are reflected by the lengths of the branches in the tree.
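
The four steps above can be sketched in a few lines of Python. This is only an illustration of the general procedure, not the tool's implementation: the function names `euclidean` and `agglomerate`, the choice of Euclidean distance, the choice of single linkage (closest pair of members), and the example values are all assumptions made for the sketch.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerate(profiles, distance=euclidean):
    """Cluster profiles bottom-up; return the merges as (left, right, distance)."""
    # Step 1: every feature (or sample) starts as its own cluster,
    # represented here as a tuple of row indices.
    clusters = [(i,) for i in range(len(profiles))]
    merges = []
    while len(clusters) > 1:  # Step 4: repeat until one cluster is left
        # Step 2: distance between every pair of clusters. This sketch
        # assumes single linkage (distance of the closest pair of members).
        best = None
        for left, right in combinations(clusters, 2):
            d = min(distance(profiles[i], profiles[j])
                    for i in left for j in right)
            if best is None or d < best[0]:
                best = (d, left, right)
        d, left, right = best
        # Step 3: join the two closest clusters into one new cluster.
        clusters = [c for c in clusters if c not in (left, right)]
        clusters.append(left + right)
        merges.append((left, right, d))
    return merges


# Hypothetical expression values: four features measured in two samples.
profiles = [[1.0, 2.0], [1.0, 2.5], [8.0, 9.0], [8.0, 10.0]]
for left, right, d in agglomerate(profiles):
    print(left, right, round(d, 3))
```

Each recorded merge corresponds to one join in the tree, and the distance stored with it is the quantity that the branch lengths reflect. Recomputing all pairwise distances in every round keeps the sketch short but is slow for large inputs.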

To create a heat map:

        Toolbox | RNA-Seq Analysis | Create Heat Map for RNA-Seq

Select at least two expression tracks and click Next.

This will display the wizard shown in figure 28.33. The hierarchical clustering algorithm requires that you specify a distance measure and a cluster linkage. The distance measure specifies how the distance between two features or samples is calculated. The cluster linkage specifies how the distance between two clusters, each consisting of a number of features or samples, is calculated.

Figure 28.33: Parameters for Create Heat Map.
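
To make the difference between the two settings concrete, the sketch below keeps them as separate, interchangeable pieces: a distance function compares two individual profiles, and a linkage rule turns those pairwise values into one distance between two clusters. The measures and linkages used here (Manhattan distance, 1 - Pearson correlation, single, complete and average linkage) are common textbook choices picked for illustration, and the function names are hypothetical. They should not be read as the exact list of options offered by the wizard.

```python
import math
from statistics import mean


def manhattan(a, b):
    """Sum of absolute differences between two profiles."""
    return sum(abs(x - y) for x, y in zip(a, b))


def one_minus_pearson(a, b):
    """1 - Pearson correlation: 0 for perfectly correlated profiles,
    2 for perfectly anti-correlated ones. Undefined (division by zero)
    if either profile is constant."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return 1 - cov / (sa * sb)


def cluster_distance(c1, c2, distance, linkage):
    """Distance between two clusters (lists of profiles) under a linkage rule."""
    pairwise = [distance(a, b) for a in c1 for b in c2]
    if linkage == "single":      # closest pair of members
        return min(pairwise)
    if linkage == "complete":    # most distant pair of members
        return max(pairwise)
    if linkage == "average":     # mean over all pairs of members
        return mean(pairwise)
    raise ValueError(f"unknown linkage: {linkage}")


# Two hypothetical clusters of two profiles each.
c1 = [[0.0, 0.0], [0.0, 1.0]]
c2 = [[3.0, 0.0], [5.0, 1.0]]
for linkage in ("single", "complete", "average"):
    print(linkage, cluster_distance(c1, c2, manhattan, linkage))
```

The same pair of clusters gets a different distance under each linkage, which is why the choice of linkage can change which clusters are joined first and therefore the shape of the tree.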

There are three kinds of Distance measures:

The possible cluster linkages are:

After selecting the distance measure and cluster linkage, click Next to set up the feature filtering options as shown in figure 28.34.

Figure 28.34: Feature filtering for Create Heat Map.

Genomes usually contain too many features to allow for a meaningful visualization of all genes or transcripts. Clustering hundreds of thousands of features is also very time-consuming. It is therefore recommended to reduce the number of features before clustering and visualization.
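
As an illustration of what such a reduction can look like, the sketch below keeps only the features whose expression varies most across the samples. Ranking by variance and the function name `top_variable_features` are assumptions made for this example, as are the gene names and values. The sketch is not a description of the tool's own filter settings.

```python
from statistics import pvariance


def top_variable_features(expression, n):
    """Return the names of the n features with the highest variance.

    expression maps a feature name to its expression values,
    one value per sample.
    """
    ranked = sorted(expression,
                    key=lambda name: pvariance(expression[name]),
                    reverse=True)
    return ranked[:n]


# Hypothetical expression values for three genes in four samples.
expression = {
    "geneA": [5.0, 5.0, 5.0, 5.0],   # constant: carries no clustering signal
    "geneB": [1.0, 9.0, 1.0, 9.0],   # strongly variable
    "geneC": [4.0, 6.0, 4.0, 6.0],   # mildly variable
}
print(top_variable_features(expression, 2))
```

Features that barely change across samples contribute little to the clustering, so dropping them first reduces both the running time and the clutter in the heat map.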

There are several different Filter settings to filter genes or transcripts: