The output of Differential Expression for Single Cell
Differential Expression for Single Cell produces one or more Statistical Comparison Tables.
Differentially expressed genes and clustering. Groups are often defined by clusters found with a clustering algorithm. Because clustering and differential expression analysis are then performed on the same data, the two analyses are not independent. Even for simulated data in which all cells are generated from the same distribution, random differences in the expression of some genes can drive the formation of clusters, and these same genes will then be found to be differentially expressed between the clusters, so their p-values overstate the evidence for a real difference. One remedy is to perform clustering on one half of the data and differential expression analysis on the other half. More commonly, however, results are simply interpreted with caution.
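The effect described above can be reproduced on simulated data. The sketch below is a minimal illustration, not the statistical test used by Differential Expression for Single Cell: it assumes only Python's standard library, uses plain two-cluster k-means as a stand-in for whatever clustering algorithm is used in practice, and approximates the p-value of Welch's t-test with a normal distribution, which is reasonable for groups of a few hundred cells. All function names are hypothetical.

```python
import math
import random
import statistics


def kmeans2(points, iters=20, seed=0):
    """Lloyd's k-means with k=2; returns a 0/1 cluster label for each point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, 2)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min((0, 1), key=lambda k: sum((x - c) ** 2 for x, c in zip(p, centers[k])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = [statistics.fmean(col) for col in zip(*members)]
    return labels


def min_p_value(cells, labels):
    """Smallest two-sided Welch p-value over all genes (normal approximation)."""
    best = 1.0
    for g in range(len(cells[0])):
        a = [c[g] for c, lab in zip(cells, labels) if lab == 0]
        b = [c[g] for c, lab in zip(cells, labels) if lab == 1]
        se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
        t = (statistics.fmean(a) - statistics.fmean(b)) / se
        # P(|Z| > |t|) for a standard normal Z equals erfc(|t| / sqrt(2)).
        best = min(best, math.erfc(abs(t) / math.sqrt(2)))
    return best


# Null data: every gene in every cell is drawn from the same N(0, 1) distribution.
rng = random.Random(1)
cells = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(400)]

p_clustered = min_p_value(cells, kmeans2(cells))                   # groups found on the same data
p_random = min_p_value(cells, [rng.randint(0, 1) for _ in cells])  # arbitrary groups
print(f"clusters: {p_clustered:.2e}, random split: {p_random:.2e}")
```

Even though no gene truly differs between groups, the smallest p-value for the k-means clusters is typically many orders of magnitude below the one for an arbitrary split of the same cells.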
The Statistical Comparison Table element
For each gene, the table has several columns whose interpretation depends on whether the tests performed are `All group pairs' or `Identify marker genes'. The difference in interpretation arises because the output of `Identify marker genes' is a summary of several pairwise comparisons of the kind produced by `All group pairs'.
For example, with three groups, `Platelet', `B cell', and `T cell', `All group pairs' will perform tests such as `Platelet vs B cell', whereas `Identify marker genes' will perform tests such as `Platelet vs rest'. `Platelet vs rest' is a summary of the pairwise comparisons `Platelet vs B cell' and `Platelet vs T cell'.
- Case (#), Case (%), Control (#), and Control (%). For each group in the statistical comparison, the number (#) and percentage (%) of cells expressing the gene are calculated. For `Platelet vs B cell', the case group is `Platelet' and the control group is `B cell'. For `Platelet vs rest', the case group is `Platelet' and the control groups are `B cell' and `T cell'. When there are multiple control groups, the minimum values of Control (#) and Control (%) across the control groups are reported. Note that these two values may come from different control groups.
- Max group mean. For each group in the statistical comparison, the mean expression value is calculated. For `Platelet vs B cell' the groups are `Platelet' and `B cell'. For `Platelet vs rest' the groups are `Platelet', `B cell', and `T cell'. The `Max group mean' is the maximum of these means.
- Log2 fold change. The fold change expressed on a log2 scale.
- Fold change. The (signed) fold change. Genes that are not expressed in any cells used in the comparison have undefined fold changes and are reported as NaN (not a number). For an output of `Identify marker genes', the fold change for a gene is the smallest magnitude fold change found in its component pairwise comparisons.
- P-value. The p-value of the statistical test. Genes that are not expressed in enough cells are reported as NaN (not a number). For an output of `Identify marker genes', the p-value for a gene is the least significant (that is, the largest) p-value among the pairwise comparisons.
- FDR p-value. The false discovery rate corrected p-value. This is calculated directly from the values in the P-value column.
- Bonferroni. The Bonferroni corrected p-value. This is calculated directly from the values in the P-value column.
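To make the summary rules concrete, the sketch below combines the two pairwise comparisons behind `Platelet vs rest' for a single gene, and then applies multiple-testing corrections to a list of gene-level p-values. It is a minimal Python illustration, not the tool's implementation: the `Pairwise` record, the function names, and all numbers are hypothetical; the handling of NaN values (propagated in the summary, excluded from the number of tests in the corrections) is an assumption; and the FDR column is assumed to follow the Benjamini-Hochberg procedure, which the text above does not name.

```python
import math
from dataclasses import dataclass


@dataclass
class Pairwise:
    """Hypothetical record: one gene in one pairwise comparison (case vs one control)."""
    control: str
    control_n: int        # cells in this control group expressing the gene
    control_pct: float    # percentage of cells in this control group expressing the gene
    fold_change: float    # signed fold change of the case relative to this control
    p_value: float


def summarize_marker(pairs, group_means):
    """Combine the pairwise results for one gene into a single 'case vs rest' row."""
    fold_changes = [p.fold_change for p in pairs]
    p_values = [p.p_value for p in pairs]
    return {
        # Minimum over control groups; the two minima may come from different groups.
        "Control (#)": min(p.control_n for p in pairs),
        "Control (%)": min(p.control_pct for p in pairs),
        # Maximum over all groups in the comparison, including the case group.
        "Max group mean": max(group_means.values()),
        # Smallest-magnitude fold change; propagating NaN is an assumption here.
        "Fold change": (math.nan if any(map(math.isnan, fold_changes))
                        else min(fold_changes, key=abs)),
        # Least significant p-value, i.e. the largest one.
        "P-value": math.nan if any(map(math.isnan, p_values)) else max(p_values),
    }


def bonferroni(p_values):
    """Multiply each p-value by the number of non-NaN tests, capped at 1."""
    m = sum(not math.isnan(p) for p in p_values)
    return [p if math.isnan(p) else min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values; NaN entries are skipped and stay NaN."""
    order = sorted((i for i, p in enumerate(p_values) if not math.isnan(p)),
                   key=lambda i: p_values[i])
    m = len(order)
    adjusted = [math.nan] * len(p_values)
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values are non-decreasing in rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# 'Platelet vs rest' for one gene, built from 'Platelet vs B cell' and 'Platelet vs T cell'.
pairs = [
    Pairwise("B cell", control_n=30, control_pct=30.0, fold_change=4.0, p_value=1e-8),
    Pairwise("T cell", control_n=40, control_pct=20.0, fold_change=2.5, p_value=1e-5),
]
group_means = {"Platelet": 2.1, "B cell": 0.4, "T cell": 0.3}
row = summarize_marker(pairs, group_means)
# Control (#) = 30 comes from B cell, while Control (%) = 20.0 comes from T cell.

gene_p_values = [1e-5, 0.02, 0.04, math.nan, 0.5]
print(bonferroni(gene_p_values))
print(benjamini_hochberg(gene_p_values))
```

In this example the reported fold change (2.5) and p-value (1e-5) both come from `Platelet vs T cell', the weaker of the two comparisons, which is what makes the summary conservative.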
Downstream analyses using Statistical Comparison Tables
- Visualize the relationship between the p-values and the log2 fold changes using the volcano plot view, see https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Volcano_plots.html.
- Identify over-represented GO terms using the Gene Set Test tool, see https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Gene_Set_Test.html.
- Compare differentially expressed genes from multiple Statistical Comparison Tables using the Create Venn Diagram for RNA-Seq tool, see https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Create_Venn_Diagram_RNA_Seq.html.
- Compare the p-values and fold changes of all genes from multiple Statistical Comparison Tables using the table view of the Venn diagram produced by the Create Venn Diagram for RNA-Seq tool, see https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Create_Venn_Diagram_RNA_Seq.html.
- Investigate pathways associated with differentially expressed genes by uploading Statistical Comparison Tables to an existing Ingenuity Pathway Analysis account using the Pathway Analysis tool from the Ingenuity Pathway Analysis plugin, see https://digitalinsights.qiagen.com/plugins/ingenuity-pathway-analysis/.
Note: The settings in Gene Set Test and Pathway Analysis for filtering features on `Max group mean' need to be adjusted, because their default values are based on the TPM measure of expression, which is rarely appropriate for single cell data.
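The volcano plot view from the first item in the list above places each gene according to its log2 fold change and its p-value, conventionally with the p-value shown as -log10(p-value) so that more significant genes sit higher. The Python sketch below computes such coordinates from a few hypothetical rows; it is an illustration under that assumption, not the Workbench's plotting code, and the row values, the function name, and the 1e-300 floor are all hypothetical.

```python
import math

# Hypothetical rows from a Statistical Comparison Table: (gene, log2 fold change, p-value).
rows = [
    ("GeneA", 3.2, 1e-12),
    ("GeneB", -1.1, 0.003),
    ("GeneC", 0.2, 0.6),
    ("GeneD", math.nan, math.nan),  # gene not expressed in the compared cells
]


def volcano_points(rows):
    """Map each gene to (log2 fold change, -log10 p-value), skipping NaN rows."""
    points = {}
    for gene, log2_fc, p in rows:
        if math.isnan(log2_fc) or math.isnan(p):
            continue  # undefined values cannot be placed on the plot
        # Hypothetical floor for p-values that underflow to 0, where log10 is undefined.
        points[gene] = (log2_fc, -math.log10(max(p, 1e-300)))
    return points


for gene, (x, y) in volcano_points(rows).items():
    print(f"{gene}: x={x:+.1f}, y={y:.2f}")
```

Genes in the upper left and upper right of the plot combine a large change in expression with a small p-value; genes reported as NaN cannot be placed and are left out.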