Differential Expression for RNA-Seq

Differential Expression for RNA-Seq performs a statistical differential expression test for a set of Expression Tracks. It uses multi-factorial statistics based on a negative binomial GLM. The tool supports paired designs and can control for batch effects. The statistical analysis is described in more detail in The statistical model.

To run the Differential Expression for RNA-Seq tool, you need Expression Tracks and a CLC Metadata Table that provides, at minimum, information about the conditions relevant for the statistical testing. The Expression Tracks provided as input must already have associations to this CLC Metadata Table (see the Metadata chapter).

The RNA-Seq and Differential Gene Expression Analysis template workflow includes Differential Expression for RNA-Seq and illustrates an approach where metadata can be provided in an Excel, CSV or TSV format file, avoiding the need to create a CLC Metadata Table before starting the analysis. See RNA-Seq and Differential Gene Expression Analysis for details.

Running the Differential Expression for RNA-Seq tool

To launch Differential Expression for RNA-Seq, go to:

        Toolbox | RNA-Seq and Small RNA Analysis | Differential Expression | Differential Expression for RNA-Seq

Select a number of Expression Tracks and click Next (figure 33.74).

Figure 33.74: Select a number of Expression Tracks.

For Transcript Expression Tracks (TE), the values used as input are "Total transcript reads". For Gene Expression Tracks (GE), the values used depend on whether a eukaryotic or prokaryotic organism is analyzed, i.e., whether the option "Genome annotated with Genes and transcripts" or "Genome annotated with Genes only" was used. For eukaryotes the values are "Total Exon Reads", whereas for prokaryotes the values are "Total Gene Reads".

The order of comparisons can be controlled by changing the order of the inputs.

Normalization options are provided in the "Configure normalization method" step of the wizard (figure 33.75).

Figure 33.75: Normalization methods.

First, choose the application that was used to generate the expression tracks: Whole transcriptome RNA-Seq, Targeted RNA-Seq, or Small RNA. For Targeted RNA-Seq and Small RNA, you can choose between two normalization methods: TMM and Housekeeping genes, while Whole transcriptome RNA-Seq will be normalized by default using the TMM method. For more detail on the methods see TMM Normalization.

TMM Normalization (Trimmed Mean of M values) calculates effective library sizes, which are then used as part of the per-sample normalization. TMM normalization adjusts library sizes based on the assumption that most genes are not differentially expressed.
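The idea behind TMM can be illustrated with a short Python sketch. It is a simplified version of the published TMM approach, not the tool's implementation: the function name `tmm_factor`, the trimming fractions, and the example counts are assumptions made for illustration. For each gene it computes the log ratio (M) and average log abundance (A) of the sample against a reference sample, trims the extremes, and takes a precision-weighted mean of the remaining M values.

```python
import math

def tmm_factor(sample, reference, trim_m=0.3, trim_a=0.05):
    """Simplified TMM scaling factor of `sample` relative to `reference`.

    Both arguments are lists of raw counts for the same genes.
    The trimming fractions are illustrative defaults (assumptions).
    """
    n_s, n_r = sum(sample), sum(reference)
    stats = []
    for y_s, y_r in zip(sample, reference):
        if y_s == 0 or y_r == 0:
            continue  # log ratios are undefined for zero counts
        p_s, p_r = y_s / n_s, y_r / n_r
        m = math.log2(p_s / p_r)          # M value: log fold change
        a = 0.5 * math.log2(p_s * p_r)    # A value: average log abundance
        variance = (n_s - y_s) / (n_s * y_s) + (n_r - y_r) / (n_r * y_r)
        stats.append((m, a, 1.0 / variance))

    n = len(stats)
    by_m = sorted(range(n), key=lambda i: stats[i][0])
    by_a = sorted(range(n), key=lambda i: stats[i][1])
    cut_m, cut_a = int(n * trim_m), int(n * trim_a)
    # Keep genes that survive trimming on both M and A.
    keep = set(by_m[cut_m:n - cut_m]) & set(by_a[cut_a:n - cut_a])
    if not keep:
        raise ValueError("no genes left after trimming")

    total_weight = sum(stats[i][2] for i in keep)
    mean_m = sum(stats[i][0] * stats[i][2] for i in keep) / total_weight
    return 2 ** mean_m

# Hypothetical counts: the last gene is strongly up-regulated in `sample`.
reference = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
sample = reference[:-1] + [1000]

factor = tmm_factor(sample, reference)
effective_size = sum(sample) * factor
print(round(factor, 4), round(effective_size, 1))
```

In this toy example the single highly expressed gene inflates the raw library size of the sample, but it is trimmed away, so the effective library size comes out close to that of the reference and the nine unchanged genes are treated as unchanged.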

Normalization with Housekeeping genes can be done when a set of housekeeping genes is available: in the "Custom housekeeping genes" field, type the names of the genes, separated by spaces. Finally, choose between these two options:

When working with Targeted RNA Panels, we recommend that normalization is done using the Housekeeping genes method rather than TMM. Predefined lists of housekeeping genes are available for samples generated using Human and Mouse QIAseq panels (hover the mouse over the dialog to see the genes included in each set). If you are working with a custom panel, you can also provide the corresponding set of housekeeping genes in the "Custom housekeeping genes" field as described above.
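As a rough illustration of what housekeeping-gene normalization does, the following Python sketch derives a per-sample scaling factor from the geometric mean of the housekeeping gene counts. This is one common approach and is an assumption here; how the tool itself computes the factors is not described in this section. The function name, sample names, and counts are hypothetical, and the sketch assumes all housekeeping counts are greater than zero.

```python
import math

def housekeeping_size_factors(counts, housekeeping):
    """Return a scaling factor per sample from housekeeping gene counts.

    counts: {sample_name: {gene_name: raw_count}}
    housekeeping: list of gene names assumed to be stably expressed.
    """
    geo_means = {}
    for sample, gene_counts in counts.items():
        logs = [math.log(gene_counts[g]) for g in housekeeping]
        geo_means[sample] = math.exp(sum(logs) / len(logs))
    # Center the factors so that their geometric mean is 1.
    overall = math.exp(
        sum(math.log(v) for v in geo_means.values()) / len(geo_means)
    )
    return {s: v / overall for s, v in geo_means.items()}

# Hypothetical counts for two samples; s2 was sequenced twice as deeply.
counts = {
    "s1": {"ACTB": 100, "GAPDH": 400, "TARGET": 50},
    "s2": {"ACTB": 200, "GAPDH": 800, "TARGET": 300},
}
factors = housekeeping_size_factors(counts, ["ACTB", "GAPDH"])
normalized = {s: counts[s]["TARGET"] / factors[s] for s in counts}
print(factors, normalized)
```

The raw TARGET counts differ six-fold between the two samples, but after dividing by the housekeeping-based factors the difference is three-fold, because half of the raw difference is explained by sequencing depth.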

In the "Experimental design and comparison" wizard step, you are asked to provide information about the samples, test conditions, and the type of testing to carry out (figure 33.76).

Figure 33.76: Setting up the experimental design and comparisons.

In the Experimental design panel, the following information must be provided:

In the Comparisons panel, the type of test(s) to be run is specified. This affects the number and type of statistical comparison outputs generated (see Output of the Differential Expression for RNA-Seq tool for more details).

Depending on the type of comparison chosen, a Wald test or a Likelihood Ratio test will be used. For example, assume that we test a factor called 'Tissue' with three groups: skin, liver, brain.

Note: Fold changes are calculated from the GLM, which corrects for differences in library size between the samples and the effects of confounding factors. It is therefore not possible to derive these fold changes from the original counts by simple algebraic calculations.
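A small Python example shows why a fold change taken directly from raw counts can be misleading. It only illustrates the library size component; the GLM additionally accounts for confounding factors, which this sketch does not attempt. All counts and library sizes are hypothetical.

```python
import math

# Hypothetical data for one gene: (raw count, library size) per sample.
control = [(100, 1_000_000), (110, 1_100_000)]
treated = [(300, 3_000_000), (330, 3_300_000)]

def mean(values):
    return sum(values) / len(values)

def naive_log2_fc(group_a, group_b):
    """log2 ratio of mean raw counts, ignoring library size."""
    return math.log2(mean([c for c, _ in group_b]) /
                     mean([c for c, _ in group_a]))

def scaled_log2_fc(group_a, group_b):
    """log2 ratio of mean counts per million reads."""
    cpm_a = [c / size * 1e6 for c, size in group_a]
    cpm_b = [c / size * 1e6 for c, size in group_b]
    return math.log2(mean(cpm_b) / mean(cpm_a))

naive = naive_log2_fc(control, treated)
scaled = scaled_log2_fc(control, treated)
print(round(naive, 3), round(scaled, 3))
```

Here the raw counts suggest a three-fold increase, yet the treated samples were simply sequenced three times as deeply. Once library size is taken into account, the gene shows no change.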

In the "Configure filtering and outliers" wizard step, you choose whether to downweight outlier expressions, and whether to filter on average expression prior to FDR correction.

Downweighting outliers is appropriate when a standard differential expression analysis is enriched for genes that are highly expressed in just one sample. These genes do not fit the null hypothesis of no change in expression across samples. Downweighting comes at a cost to precision and so is not recommended generally. For more details, see Downweighting outliers.

Filtering maximizes the number of results that are significant at a target FDR threshold, but at the cost of potentially removing significant results with low average expression. For more details, see Filtering on average expression.
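The effect of filtering before FDR correction can be sketched in Python with a Benjamini-Hochberg adjustment. The gene names, p-values, and the fixed expression cutoff of 10 are assumptions chosen for illustration; the tool selects its own threshold, as described in Filtering on average expression.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

def count_significant(genes, fdr=0.05):
    adjusted = benjamini_hochberg([p for _, _, p in genes])
    return sum(1 for q in adjusted if q <= fdr)

# Hypothetical results: (gene, average expression, raw p-value).
genes = [
    ("g1", 500, 0.001), ("g2", 300, 0.012), ("g3", 250, 0.02),
    ("g4", 2, 0.40), ("g5", 1, 0.55), ("g6", 3, 0.70),
    ("g7", 1, 0.90), ("g8", 2, 0.95),
]

unfiltered = count_significant(genes)
# Illustrative fixed cutoff on average expression (an assumption).
filtered = count_significant([g for g in genes if g[1] >= 10])
print(unfiltered, filtered)
```

With all eight genes, only two pass an FDR threshold of 0.05. After the five low-expression genes are removed, fewer tests are corrected for and all three remaining genes pass. The cost is that a low-expression gene with a genuinely small p-value would also have been removed.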

The outputs from Differential Expression for RNA-Seq are described in Output of the Differential Expression tools.