Clustering of features and samples

Hierarchical clustering groups taxa by the similarity of their taxonomic profiles over the set of samples, and groups samples by the similarity of their taxonomic composition over the set of features (taxa).

Each clustering has a tree structure that is generated as follows:

  1. Each taxon or sample starts as its own cluster.
  2. Pairwise distances between all clusters are calculated.
  3. The two closest clusters are joined into one new cluster.
  4. Steps 2 and 3 are repeated until only one cluster is left (which contains all the taxa or samples).

In the resulting tree, the lengths of the branches reflect the distances between clusters.
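
The tree-building procedure above can be sketched in a few lines of Python. This is only an illustration of the general agglomerative scheme, not the tool's own implementation: the function names (`euclidean`, `agglomerate`), the choice of Euclidean distance, and the use of single linkage (the distance between two clusters is the smallest distance between any two of their members) are assumptions made for the example.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length abundance profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerate(profiles):
    """Cluster named profiles bottom-up.

    profiles: dict mapping a name to a list of abundances.
    Returns the merges in order as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of member names.
    """
    # Step 1: every taxon or sample starts as its own cluster.
    clusters = [(name,) for name in profiles]
    merges = []

    def cluster_distance(c1, c2):
        # Single linkage (an assumption for this sketch).
        return min(euclidean(profiles[a], profiles[b]) for a in c1 for b in c2)

    while len(clusters) > 1:
        # Step 2: pairwise distances between all current clusters.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: cluster_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        d = cluster_distance(clusters[i], clusters[j])
        # Step 3: join the two closest clusters into one new cluster.
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        # Step 4: the loop repeats until a single cluster remains.
    return merges


# Hypothetical abundance profiles for three samples.
profiles = {"S1": [10, 0, 5], "S2": [11, 1, 5], "S3": [0, 20, 3]}
merges = agglomerate(profiles)
for left, right, distance in merges:
    print(left, right, round(distance, 3))
```

The distance recorded for each merge is what a tree drawing would use as the height at which the two branches join, so S1 and S2 (the most similar profiles here) are joined first by short branches, and S3 is attached last at a much larger distance.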

To create a heat map:

        Toolbox | Microbial Genomics Module | Metagenomics | Abundance Analysis | Create Heat Map for Abundance Table

Select an abundance table with two or more samples as input (e.g., a multi-sample OTU or merged abundance table) and click Next.

Specify a distance measure and a cluster linkage (figure 6.13). The distance measure specifies how the distance between two taxa or samples is calculated. The cluster linkage specifies how the distance between two clusters, each consisting of a number of taxa or samples, is calculated. Learn more about how distances and clusters are calculated at https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Clustering_features_samples.html.

Image heatmapotu2
Figure 6.13: Select an abundance table.
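
The difference between a distance measure and a cluster linkage can be illustrated with a small Python sketch. The measures (Euclidean, Manhattan) and linkages (single, complete, average) shown here are common textbook choices used as assumptions for the example; the exact options offered in the dialog, and how the tool computes them, are described in the manual page linked above. The function names are hypothetical.

```python
import math
from statistics import mean


def euclidean(a, b):
    """Distance measure: straight-line distance between two profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Distance measure: sum of absolute differences between two profiles."""
    return sum(abs(x - y) for x, y in zip(a, b))


def linkage_distance(cluster_a, cluster_b, distance, linkage="average"):
    """Cluster linkage: turn member-to-member distances into one cluster distance."""
    pairwise = [distance(a, b) for a in cluster_a for b in cluster_b]
    if linkage == "single":    # closest pair of members
        return min(pairwise)
    if linkage == "complete":  # most distant pair of members
        return max(pairwise)
    if linkage == "average":   # mean over all pairs of members
        return mean(pairwise)
    raise ValueError(f"unknown linkage: {linkage}")


# Two hypothetical clusters, each holding two profiles.
cluster_a = [[0, 0], [0, 2]]
cluster_b = [[3, 0], [6, 0]]

for linkage in ("single", "complete", "average"):
    print(linkage, linkage_distance(cluster_a, cluster_b, manhattan, linkage))
```

With the same distance measure, the three linkages give different distances between the same two clusters, which is why the choice of linkage can change the shape of the resulting tree.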

After having selected the distance measure, set up the feature filtering options (figure 6.14).

Image heatmapotu1
Figure 6.14: Set filtering options.

Genomes usually contain too many features to allow for a meaningful visualization. Clustering hundreds of thousands of features is also very time-consuming. We therefore recommend reducing the number of features before clustering, using the available filter options: