Interpreting the output of Differential Expression for Single Cell
Differential Expression for Single Cell produces one or more Statistical Comparison Tables.
For each gene, the table has several columns whose interpretation depends on whether the tests performed are `All group pairs' or `Identify marker genes'. The difference in interpretation arises because the output of `Identify marker genes' is a summary of several pairwise comparisons of the kind produced by `All group pairs'.
For example, with three groups, `Platelet', `B cell', and `T cell', `All group pairs' will perform tests such as `Platelet vs B cell', whereas `Identify marker genes' will perform tests such as `Platelet vs rest'. `Platelet vs rest' will be a summary of the pairwise comparisons `Platelet vs B cell' and `Platelet vs T cell'.
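The relationship between the two test modes can be sketched in a few lines of Python. This is an illustration only, not the tool's implementation: the function names are hypothetical, and the order in which groups appear within each pairwise comparison is an assumption.

```python
from itertools import combinations

groups = ["Platelet", "B cell", "T cell"]


def all_group_pairs(groups):
    """List every pairwise comparison, as in `All group pairs'."""
    return [f"{a} vs {b}" for a, b in combinations(groups, 2)]


def marker_gene_components(groups):
    """Map each `X vs rest' test to the pairwise comparisons it summarizes."""
    return {
        f"{g} vs rest": [f"{g} vs {other}" for other in groups if other != g]
        for g in groups
    }


pairs = all_group_pairs(groups)
components = marker_gene_components(groups)
print(pairs)
print(components["Platelet vs rest"])
```

With three groups there are three pairwise comparisons and three `vs rest' tests, each of which summarizes two pairwise comparisons.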
- Max group means: For each group in the statistical comparison, the average expression value is calculated. For `Platelet vs B cell' the groups are `Platelet' and `B cell'. For `Platelet vs rest' the groups are `Platelet', `B cell', and `T cell'. Max group means is the maximum of these average values.
- log2 fold change: The logarithmic (base 2) fold change.
- Fold change: The (signed) fold change. Genes that are not expressed in any cell have undefined fold changes and are reported as NaN (not a number). For an output of `Identify marker genes', the fold change for a gene is the smallest-magnitude fold change found in its component pairwise comparisons.
- P-value: Standard p-value. Genes that are not expressed in any sample have undefined p-values and are reported as NaN (not a number). For an output of `Identify marker genes', the p-value for a gene is the least significant p-value among the pairwise comparisons.
- FDR p-value: The false discovery rate corrected p-value. This is calculated directly from the values in the P-value column.
- Bonferroni: The Bonferroni corrected p-value. This is calculated directly from the values in the P-value column.
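The column definitions above can be illustrated with a minimal Python sketch. It is not the tool's implementation. The function names and toy values are invented for illustration, inputs are assumed to contain no NaN values, and the Benjamini-Hochberg procedure is used for the FDR correction as an assumption, since the text does not name the method.

```python
def max_group_mean(group_means):
    """Return the largest of the per-group average expression values."""
    return max(group_means.values())


def summarize_marker_gene(pairwise):
    """Summarize one gene's pairwise results into a `group vs rest' row.

    `pairwise` is a list of (fold_change, p_value) tuples. The summary keeps
    the smallest-magnitude fold change and the least significant
    (largest) p-value.
    """
    fold_changes = [fc for fc, _ in pairwise]
    p_values = [p for _, p in pairwise]
    return min(fold_changes, key=abs), max(p_values)


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    n = len(p_values)
    return [min(1.0, p * n) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (assumed FDR method)."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Toy values, invented for illustration only.
means = {"Platelet": 4.2, "B cell": 0.3, "T cell": 0.1}
platelet_vs_rest = summarize_marker_gene([(8.0, 1e-6), (2.5, 0.04)])
p_column = [0.01, 0.04, 0.03, 0.5]

print(max_group_mean(means))
print(platelet_vs_rest)
print(bonferroni(p_column))
print(benjamini_hochberg(p_column))
```

Taking the smallest-magnitude fold change and the largest p-value makes the summary conservative: a gene only looks like a strong marker if it differs from every other group.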
Differentially expressed genes (DEGs) and clustering
Groups are often defined based on clusters found using a clustering algorithm. Because clustering and differential expression analysis are performed on the same data, they are not independent. This means that, even for simulated data generated from the same distribution, random differences in expression between genes may drive the formation of clusters, and these same genes will then be found to be DEGs between the clusters. One remedy for this is to perform clustering on half the data and differential expression on the other half. However, it is more common to simply be cautious about over-interpreting results.
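The splitting step of this remedy might look like the following sketch. It only divides the cells into two disjoint halves. The function name, the cell identifiers and the seed are hypothetical, and the text does not specify how cluster labels would be carried over to the held-out half, so that step is left out.

```python
import random


def split_cells(cell_ids, seed=0):
    """Randomly split cell identifiers into two disjoint halves."""
    shuffled = list(cell_ids)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical cell identifiers.
cells = [f"cell_{i}" for i in range(10)]
clustering_half, testing_half = split_cells(cells)
print(len(clustering_half), len(testing_half))
```

Clustering would then be run on the first half, and differential expression on the second half once its cells have been assigned to groups.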
Statistical Comparison Tables can be used in several tools from Toolbox | RNA-Seq and Small RNA Analysis. The most useful of these in a single-cell context are:
- Gene Set Test, for identifying over-represented GO terms https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Gene_Set_Test.html
- Create Venn Diagram for RNA-Seq, for comparing the overlap of differentially expressed genes in two or more Statistical Comparison Tables https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Create_Venn_Diagram_RNA_Seq.html
- Create Expression Browser, for viewing multiple Statistical Comparison Tables and their GO terms in a single table https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Create_Expression_Browser.html
It is also possible to automatically upload Statistical Comparison Tables to an existing Ingenuity Pathway Analysis account using the Ingenuity Pathway Analysis plugin https://digitalinsights.qiagen.com/plugins/ingenuity-pathway-analysis/
Note that many of these tools have options to filter features by Max group means, with a default filter value that is based on the RPKM measure of expression. This default will often need lowering for single-cell data, where RPKM is rarely appropriate. For example, when data has been normalized by Normalize Single Cell Data, `Max group means' uses Pearson residuals.