The Identify and Annotate Differentially Expressed Genes and Pathways workflow compares gene expression between groups of samples and performs a gene ontology (GO) enrichment analysis on the differentially expressed genes to identify affected pathways. The workflow takes as input Gene Expression (GE) or Transcript Expression (TE) tracks that were generated using the RNA-Seq analysis tool. The samples must be associated with a metadata table.
The workflow can be found in the Toolbox at:
Toolbox | Template Workflows | Biomedical Workflows | Whole Transcriptome Sequencing | Identify and Annotate Differentially Expressed Genes and Pathways
- If you are connected to a server, you will first be asked where you would like to run the analysis.
- Next, you will be asked to select the samples to analyze (figure 22.32). You can select several GE tracks or TE tracks generated by the RNA-Seq analysis tool, but not a combination of both.
- Then select the reference data set that should be used to annotate variants (figure 22.33).
- In the Differential Expression for RNA-Seq dialog, you can set up the experimental design associated with the data (figure 22.34):
  - Choose the metadata table that is associated with the GE or TE tracks selected earlier.
  - Choose the factor to test for differential expression. It must be one of the categories in the metadata table.
  - Optionally, specify confounding factors, i.e., factors that are not of primary interest but may affect gene expression.
  - Use the Comparisons panel to determine the number and type of statistical comparison tracks output by the workflow (see http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Output_Differential_Expression_tools.html for more details).
- In the last wizard step, you can review the selected settings by clicking on the button labeled Preview All Parameters. The Preview All Parameters wizard is read-only: to change a setting, use the Previous button to return to the relevant wizard step and edit the parameter there.
- Choose to Save your results and click on the button labeled Finish.
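The constraints described in the steps above (one track type only, every sample covered by the metadata table, a test factor and optional confounding factors taken from the metadata categories) can be sketched as a small pre-flight check. This is a minimal illustration and not part of the workbench: the function names, the "GE"/"TE" labels, and the dictionary-based metadata layout are all hypothetical, and the pairwise enumeration assumes a comparison between every pair of groups, which is only one possible choice in the Comparisons panel.

```python
from itertools import combinations


def check_design(track_kinds, metadata, factor, confounders=()):
    """Validate a hypothetical sample selection and experimental design.

    track_kinds: dict mapping sample name -> "GE" or "TE" (assumed labels).
    metadata: dict mapping sample name -> dict of category -> value.
    Raises ValueError on the first problem found; returns None otherwise.
    """
    # All selected tracks must be of the same type (GE or TE).
    kinds = set(track_kinds.values())
    if len(kinds) != 1 or not kinds <= {"GE", "TE"}:
        raise ValueError("all samples must be GE tracks or all must be TE tracks")

    # Every sample needs a row in the metadata table.
    missing = [s for s in track_kinds if s not in metadata]
    if missing:
        raise ValueError(f"samples without metadata: {missing}")

    # The test factor and confounders must be metadata categories.
    for column in (factor, *confounders):
        for sample in track_kinds:
            if column not in metadata[sample]:
                raise ValueError(
                    f"{column!r} is not a metadata category for {sample}"
                )

    if factor in confounders:
        raise ValueError("the test factor cannot also be a confounding factor")


def pairwise_comparisons(metadata, factor):
    """List every pair of groups defined by the factor (assumed pairwise mode)."""
    groups = sorted({row[factor] for row in metadata.values()})
    return list(combinations(groups, 2))
```

With three groups in the chosen factor, `pairwise_comparisons` yields three pairs, which under this assumption would correspond to three statistical comparison tracks.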
The following outputs are generated:
- PCA for RNA-Seq plot: Projects a high-dimensional dataset (where the number of dimensions equals the number of genes or transcripts) onto two or three dimensions.
- Statistical Comparison: The information can be accessed in two different ways.
  - Open it as a track, hold Shift, and hover over a feature. A tooltip will appear with information about the gene name, the results of the statistical tests, and expression values.
  - Open the track in table format by clicking on the table icon in the lower left side of the View Area.
- Track List Differentially Expressed Genes and Pathways: A collection of tracks presented together. It shows the human reference sequence, annotation tracks for genes, coding regions (CDS), and mRNA, and the statistical comparison tracks (see figure 22.35).
- Heat Map for RNA-Seq: A two-dimensional heat map of expression values. Each column corresponds to one sample, and each row corresponds to a feature (a gene or a transcript). The samples and features are both hierarchically clustered.
- Venn Diagram: Compares the overlap of differentially expressed genes or transcripts in two or more statistical comparison tracks.
- Expression Browser: Lets you inspect gene and transcript expression level counts and statistics for many samples at the same time.
- GO Enrichment Analysis: A table showing the results of the GO enrichment analysis. The table includes GO terms, a description of the affected function/pathway, the number of genes in each function/pathway, the number of affected genes within the function/pathway, and p-values.
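To make the columns of the GO enrichment table more concrete, the sketch below shows how a p-value can be derived from the number of genes in a function/pathway and the number of affected genes within it. It assumes an upper-tail hypergeometric test, which is a common choice for enrichment analysis; whether the workflow computes its p-values in exactly this way is an assumption, and the function and parameter names are hypothetical.

```python
from math import comb


def enrichment_p_value(n_total, n_in_term, n_de, n_de_in_term):
    """Upper-tail hypergeometric probability P(X >= n_de_in_term).

    n_total: number of genes considered overall (the background).
    n_in_term: number of genes annotated with the GO term.
    n_de: number of differentially expressed genes.
    n_de_in_term: differentially expressed genes annotated with the term.
    """
    # The overlap can never exceed either the term size or the DE list size.
    upper = min(n_in_term, n_de)
    # Number of ways to draw n_de genes from the background.
    denom = comb(n_total, n_de)
    # Sum the probabilities of seeing k or more term genes among the DE genes.
    tail = sum(
        comb(n_in_term, k) * comb(n_total - n_in_term, n_de - k)
        for k in range(n_de_in_term, upper + 1)
    )
    return tail / denom
```

For example, with a background of 10 genes, 4 of them in the term, and 3 differentially expressed genes of which 2 are in the term, `enrichment_p_value(10, 4, 3, 2)` evaluates to 40/120, i.e. one third. Real analyses involve thousands of genes and many terms, so the resulting p-values are usually corrected for multiple testing.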
Please refer to the relevant sections of the RNA-Seq Analysis chapter of the manual (http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=RNA_Seq_Analysis.html) for additional information on the different outputs mentioned above.
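As one illustration of these outputs, the overlap summarized by the Venn Diagram can be reasoned about with plain set operations. The sketch below assumes that the names of the differentially expressed features have already been taken from each statistical comparison and filtered by thresholds of your choosing; the function name, comparison labels, and gene identifiers are hypothetical.

```python
def venn_regions(comparisons):
    """Group features by the exact set of comparisons they appear in.

    comparisons: dict mapping a comparison label -> set of feature names
    that were called differentially expressed in that comparison.
    Returns a dict mapping frozenset of labels -> set of feature names.
    """
    regions = {}
    all_features = set().union(*comparisons.values())
    for feature in all_features:
        # The comparisons in which this feature is differentially expressed.
        members = frozenset(
            label for label, features in comparisons.items() if feature in features
        )
        regions.setdefault(members, set()).add(feature)
    return regions


# Example with two hypothetical comparisons.
example = {
    "drugA vs control": {"g1", "g2", "g3"},
    "drugB vs control": {"g2", "g3", "g4"},
}
shared = venn_regions(example)[frozenset(example)]  # features found in both
```

Each key of the returned dictionary corresponds to one region of the diagram, so the size of each value is the count displayed in that region.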