QIAGEN Bioinformatics Manuals

Interpreting the output of Differential Expression for Single Cell

Differential Expression for Single Cell produces one or more Statistical Comparison Tables.

For each gene, the table has several columns whose interpretation depends on whether the tests performed are `All group pairs' or `Identify marker genes'. The difference in interpretation arises because the output of `Identify marker genes' is a summary of several pairwise comparisons of the kind produced by `All group pairs'.

For example, with three groups, `Platelet', `B cell', and `T cell', `All group pairs' will perform tests such as `Platelet vs B cell', whereas `Identify marker genes' will perform tests such as `Platelet vs rest'. `Platelet vs rest' will be a summary of the pairwise comparisons `Platelet vs B cell' and `Platelet vs T cell'.

Case (#), Case (%), Control (#), and Control (%). For each group in the statistical comparison, the number (#) and percentage (%) of cells expressing the gene are calculated. For `Platelet vs B cell', the case group is `Platelet' and the control group is `B cell'. For `Platelet vs rest', the case group is `Platelet', and the control groups are `B cell' and `T cell'. When there are multiple control groups, the minimum observed values for Control (#) and Control (%) are reported. Note that these two values might originate from two different control groups.
Max group mean. For each group in the statistical comparison, the average expression value is calculated. For `Platelet vs B cell' the groups are `Platelet' and `B cell'. For `Platelet vs rest' the groups are `Platelet', `B cell' and `T cell'. The `Max group mean' is the maximum of these average values.
Log2 fold change. The fold change expressed on a log₂ scale.
Fold change. The (signed) fold change. Genes that are not expressed in any cells used in the comparison have undefined fold changes and are reported as NaN (not a number). For an output of `Identify marker genes', the fold change for a gene is the smallest magnitude fold change found in its component pairwise comparisons.
P-value. Standard p-value. Genes that are not expressed in enough cells are reported as NaN (not a number). For an output of `Identify marker genes', the p-value for a gene is the least significant (that is, the largest) p-value among the pairwise comparisons.
FDR p-value. The false discovery rate corrected p-value. This is calculated directly from the values in the P-value column.
Bonferroni. The Bonferroni corrected p-value. This is calculated directly from the values in the P-value column.
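The way `Identify marker genes' summarizes its component pairwise comparisons can be sketched in a few lines of Python. This is only an illustration of the rules described above, not the tool's implementation: the function names, dictionary keys, and example values are hypothetical, NaN values are not handled, and the Bonferroni correction assumes that the number of tests equals the number of values in the P-value column.

```python
def summarize_marker(pairwise):
    """Collapse the pairwise results (case vs each control group) for one gene.

    `pairwise` is a list of dicts with hypothetical keys:
    'fold_change', 'p_value', 'control_n', 'control_pct'.
    NaN values are not handled in this sketch.
    """
    return {
        # Smallest-magnitude (signed) fold change among the comparisons.
        "fold_change": min((r["fold_change"] for r in pairwise), key=abs),
        # Least significant, i.e. largest, p-value.
        "p_value": max(r["p_value"] for r in pairwise),
        # Minima are taken independently, so Control (#) and Control (%)
        # may come from two different control groups.
        "control_n": min(r["control_n"] for r in pairwise),
        "control_pct": min(r["control_pct"] for r in pairwise),
    }


def bonferroni(p_values):
    """Bonferroni correction, assuming one test per entry in `p_values`."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


# Made-up results for one gene in 'Platelet vs rest'.
platelet_vs_b = {"fold_change": 8.0, "p_value": 1e-6,
                 "control_n": 12, "control_pct": 4.0}
platelet_vs_t = {"fold_change": 3.5, "p_value": 0.002,
                 "control_n": 9, "control_pct": 6.0}

summary = summarize_marker([platelet_vs_b, platelet_vs_t])
print(summary)

# The corrections are computed from the summarized P-value column,
# here for three hypothetical genes.
marker_p_values = [summary["p_value"], 0.04, 0.5]
print(bonferroni(marker_p_values))
```

In this made-up example, the reported Control (#) comes from `T cell' while the reported Control (%) comes from `B cell', which is the situation the note on Control (#) and Control (%) refers to.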

Differentially expressed genes (DEGs) and clustering. Groups are often defined based on clusters found using a clustering algorithm. Because clustering and differential expression analysis are performed on the same data, they are not independent. This means that, even for simulated data generated from the same distribution, random differences in expression between genes may drive the formation of clusters, and these same genes will then be found to be DEGs between the clusters. One remedy for this is to perform clustering on half the data and differential expression on the other half. However, it is more common to simply be cautious about over-interpreting results.
A similar warning can be made for groups defined based on cell types predicted by Predict Cell Types. The tool works by learning the expression pattern of different genes in different cell types. Therefore, it is likely that many DEGs between cell types assigned by Predict Cell Types have been implicitly learned by the tool, and may not be specific to the dataset being analyzed.
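The clustering caveat can be illustrated with a small simulation. The sketch below is an assumption-laden toy example rather than anything the tools do: a single gene is simulated for cells drawn from one normal distribution, and a split at the median stands in for a clustering algorithm. The difference in mean expression between the resulting "clusters" is then compared with the difference obtained when group labels are assigned without looking at the data.

```python
import random
import statistics


def mean_difference(values, labels):
    """Mean of the cells labeled 1 minus the mean of the cells labeled 0."""
    group_1 = [v for v, g in zip(values, labels) if g == 1]
    group_0 = [v for v, g in zip(values, labels) if g == 0]
    return statistics.mean(group_1) - statistics.mean(group_0)


rng = random.Random(0)
n_cells = 1000

# One gene, all cells drawn from the same distribution:
# there are no true groups in this data.
expression = [rng.gauss(0.0, 1.0) for _ in range(n_cells)]

# Stand-in for clustering (an assumption of this sketch):
# split the cells at the median of the gene's expression.
cutoff = statistics.median(expression)
cluster_labels = [1 if x > cutoff else 0 for x in expression]

# Control: equally sized groups assigned independently of expression.
random_labels = [1] * (n_cells // 2) + [0] * (n_cells // 2)
rng.shuffle(random_labels)

diff_clustered = mean_difference(expression, cluster_labels)
diff_random = mean_difference(expression, random_labels)

print(f"difference between data-driven clusters: {diff_clustered:.2f}")
print(f"difference between random groups:        {diff_random:.2f}")
```

The data-driven split produces a large difference in means (in expectation about 1.6 standard deviations for this setup), even though no real groups exist, whereas the random grouping gives a difference close to zero. A test applied to the data-driven groups would therefore report the gene as differentially expressed purely as a consequence of how the groups were formed.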

The Statistical Comparison Table also offers a volcano plot view, showing the relationship between the p-values and the log₂ fold changes. See http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Volcano_plots.html for details.

Statistical Comparison Tables can be used in several tools from Toolbox | RNA-Seq and Small RNA Analysis. The most useful of these in a single-cell context are:

Gene Set Test, for identifying over-represented GO terms: http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Gene_Set_Test.html
Create Venn Diagram for RNA-Seq, for comparing the overlap of differentially expressed genes in two or more Statistical Comparison Tables: http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Create_Venn_Diagram_RNA_Seq.html
Create Expression Browser, for viewing multiple Statistical Comparison Tables and their GO terms in a single table: http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Create_Expression_Browser.html

It is also possible to automatically upload Statistical Comparison Tables to an existing Ingenuity Pathway Analysis account using the Ingenuity Pathway Analysis plugin https://digitalinsights.qiagen.com/plugins/ingenuity-pathway-analysis/

Note that many of these tools have options to filter features by Max group mean, with a default filter threshold that is based on the TPM measure of expression. This default will often need adjusting for single-cell data, where TPM is rarely appropriate.
