Based on an annotated reference genome, the CLC Genomics Workbench supports RNA-Seq Analysis by mapping next-generation sequencing reads and distributing and counting the reads across genes and transcripts. Subsequently, the results can be used for expression analysis. The tools from the RNA-Seq folder automatically account for differences due to sequencing depth, removing the need to normalize input data. The statistical analysis and visualization tools of the RNA-Seq folder make extensive use of the metadata system.

TMM Normalization

Because sequencing depth can differ between samples, a per-sample library size normalization must be performed before samples can be compared. The tools in the RNA-Seq folder apply this normalization automatically.
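To illustrate why library size matters, the following minimal Python sketch scales raw counts to counts per million (CPM) using each sample's total count as its library size. It is a conceptual illustration only, not the Workbench's implementation; the function name `counts_per_million` and the two example samples are hypothetical.

```python
def counts_per_million(counts):
    """Scale one sample's raw gene counts by its library size (total count)."""
    library_size = sum(counts.values())
    if library_size == 0:
        raise ValueError("sample has no counted reads")
    return {gene: 1e6 * n / library_size for gene, n in counts.items()}


# Two hypothetical samples with identical composition but a tenfold
# difference in sequencing depth.
shallow = {"geneA": 10, "geneB": 30, "geneC": 60}
deep = {"geneA": 100, "geneB": 300, "geneC": 600}

cpm_shallow = counts_per_million(shallow)
cpm_deep = counts_per_million(deep)

# After scaling, the two samples report the same value for every gene.
for gene in shallow:
    print(gene, cpm_shallow[gene], cpm_deep[gene])
```

The raw counts differ by a factor of ten, but the scaled values agree, which is the property that makes samples comparable.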

For the RNA-Seq tools that compare samples (PCA for RNA-Seq, Create Heat Map for RNA-Seq, Differential Expression for RNA-Seq and Create Expression Browser), library size normalization is automatically performed using the TMM (trimmed mean of M values) method [Robinson and Oshlack, 2010]. Library sizes are then used as part of the per-sample normalization. TMM normalization is the normalization used in edgeR [Robinson et al., 2010].

TMM normalization adjusts library sizes based on the assumption that most genes are not differentially expressed. Therefore, it is important not to make subsets of the count data before doing statistical analysis or visualization, as this can lead to differences being normalized away.
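The idea behind TMM can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the Workbench's or edgeR's implementation: it trims only on the M values (log2 ratios of count proportions) and takes an unweighted mean, whereas the published method also trims on average abundance and uses precision weights. The function name `tmm_scaling_factor`, the `trim` default and the example counts are assumptions made for this sketch.

```python
import math


def tmm_scaling_factor(sample, reference, trim=0.3):
    """Simplified TMM: unweighted trimmed mean of log2 proportion ratios.

    `sample` and `reference` are equal-length lists of raw counts for the
    same genes. Genes with a zero count in either list are skipped.
    """
    n_sample = sum(sample)
    n_reference = sum(reference)
    m_values = sorted(
        math.log2((s / n_sample) / (r / n_reference))
        for s, r in zip(sample, reference)
        if s > 0 and r > 0
    )
    if not m_values:
        raise ValueError("no genes with non-zero counts in both samples")
    k = int(len(m_values) * trim)           # number trimmed from each end
    kept = m_values[k:len(m_values) - k]    # drop the most extreme ratios
    return 2 ** (sum(kept) / len(kept))


# Hypothetical data: nine genes are unchanged, one gene is strongly
# up-regulated in the sample and inflates its raw library size.
reference = [100] * 10
sample = [100] * 9 + [1100]

factor = tmm_scaling_factor(sample, reference)
effective_library_size = sum(sample) * factor

print(factor, effective_library_size)
```

In this example the raw library size of the sample is 2000, but the single up-regulated gene is trimmed away and the scaling factor is 0.5, giving an effective library size of 1000. Dividing by the effective library size makes the nine unchanged genes match the reference. If the data were first subset to a group of genes that mostly change in the same direction, the trimmed mean would absorb that shift, which is the situation the paragraph above warns against.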

For the expression visualization tools (Create Heat Map and PCA for RNA-Seq), additional filtering and normalization are performed.

Metadata for RNA-Seq

The statistical analysis and visualization tools of the RNA-Seq folder make extensive use of the metadata system. For example, metadata are required when defining the experimental design in the Differential Expression for RNA-Seq tool, and can be used to add extra layers of insight in the Create Heat Map and PCA for RNA-Seq tools.

To get the most out of these tools, we recommend that all input expression tracks have associated metadata, as shown in figure 28.1. For information about how to set up and use metadata, please see Metadata.

Figure 28.1: An example of expression tracks with associated metadata.