Interpreting the output of Differential Expression for Single Cell

Differential Expression for Single Cell produces one or more Statistical Comparison Tables (Image sc_stat_comp_16_n_p).

Differentially expressed genes and clustering. Groups are often defined by clusters found using a clustering algorithm. Because clustering and differential expression analysis are then performed on the same data, the two analyses are not independent. Even in simulated data where all cells are generated from the same distribution, random variation in the expression of some genes can drive the formation of clusters, and those same genes will then be found to be differentially expressed between the clusters. One remedy is to perform clustering on one half of the data and differential expression analysis on the other half. In practice, however, it is more common simply to be cautious about over-interpreting the results.
A similar warning applies to groups defined by the cell types predicted by Predict Cell Types. That tool works by learning the expression patterns of different genes in different cell types, so many of the genes found to be differentially expressed between the cell types it assigns are likely to be genes the tool has implicitly learned to rely on. These genes may therefore not be specific to the dataset being analyzed.
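The remedy of clustering on one half of the data and testing on the other can be illustrated with a small simulation. This is a minimal sketch, not how Differential Expression for Single Cell works: it assumes a cells × genes matrix of raw counts that are approximately Poisson, and it splits the data by binomial count splitting (each count is divided between two matrices), a swapped-in technique chosen because, under that Poisson assumption, the two halves are independent. The helper names `count_split`, `two_means`, `welch_t` and `demo` are hypothetical, and a basic k-means and a Welch t-statistic stand in for the Workbench's clustering and differential expression tests.

```python
import numpy as np


def count_split(counts, rng, fraction=0.5):
    """Split a cells x genes count matrix into two matrices that sum to it.

    Each count is thinned binomially; if the counts are Poisson, the two
    resulting matrices are independent of each other (assumption).
    """
    train = rng.binomial(counts, fraction)
    test = counts - train
    return train, test


def two_means(x, rng, n_iter=25):
    """Minimal 2-cluster k-means on the rows (cells) of x; returns labels 0/1."""
    centers = x[rng.choice(len(x), size=2, replace=False)].astype(float)
    for _ in range(n_iter):
        # squared Euclidean distance from every cell to each center
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for k in range(2):
            if np.any(labels == k):
                centers[k] = x[labels == k].mean(axis=0)
    return labels


def welch_t(x, labels):
    """Welch t-statistic per gene (column) between cluster 0 and cluster 1."""
    a, b = x[labels == 0], x[labels == 1]
    se2 = a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b)
    return (a.mean(axis=0) - b.mean(axis=0)) / np.sqrt(se2)


def demo(seed=0, n_cells=400, n_genes=10):
    """Null data: every cell is drawn from the same Poisson distribution."""
    rng = np.random.default_rng(seed)
    counts = rng.poisson(5.0, size=(n_cells, n_genes))
    train, test = count_split(counts, rng)
    labels = two_means(np.log1p(train), rng)
    # testing on the half used for clustering (circular) vs the held-out half
    t_same = welch_t(np.log1p(train), labels)
    t_held_out = welch_t(np.log1p(test), labels)
    return np.abs(t_same).mean(), np.abs(t_held_out).mean()
```

Because no gene truly differs between cells in this simulation, statistics computed on the half used for clustering tend to be inflated, while those computed on the held-out half behave like null statistics. Splitting counts rather than cells is deliberate here: assigning held-out cells to clusters based on their own expression would reintroduce the circularity the split is meant to remove.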

The Statistical Comparison Table element

For each gene, the table has several columns whose interpretation depends on whether the tests performed were 'All group pairs' or 'Identify marker genes'. The interpretation differs because the output of 'Identify marker genes' is a summary of several pairwise comparisons of the kind produced by 'All group pairs'.

For example, with three groups ('Platelet', 'B cell', and 'T cell'), 'All group pairs' performs tests such as 'Platelet vs B cell', whereas 'Identify marker genes' performs tests such as 'Platelet vs rest'. 'Platelet vs rest' is a summary of the pairwise comparisons 'Platelet vs B cell' and 'Platelet vs T cell'.
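Which comparisons each option involves can be enumerated in a few lines of Python. This sketch only lists the comparisons for the three-group example; the function names are hypothetical, and it does not reproduce how the Workbench combines the pairwise statistics into the 'vs rest' summary.

```python
from itertools import combinations


def all_group_pairs(groups):
    """Comparisons made by 'All group pairs': every unordered pair of groups."""
    return [f"{a} vs {b}" for a, b in combinations(groups, 2)]


def identify_marker_genes(groups):
    """Comparisons made by 'Identify marker genes': each group vs the rest,
    mapped to the pairwise comparisons that each one summarizes."""
    return {
        f"{g} vs rest": [f"{g} vs {other}" for other in groups if other != g]
        for g in groups
    }


groups = ["Platelet", "B cell", "T cell"]
print(all_group_pairs(groups))
# ['Platelet vs B cell', 'Platelet vs T cell', 'B cell vs T cell']
print(identify_marker_genes(groups)["Platelet vs rest"])
# ['Platelet vs B cell', 'Platelet vs T cell']
```

With k groups, 'All group pairs' produces k(k-1)/2 comparisons, whereas 'Identify marker genes' produces k summaries, each drawing on k-1 pairwise comparisons.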

Downstream analyses using Statistical Comparison Tables

Note: The settings for filtering features on the 'Max group mean' in Gene Set Test (Image identify_differentially_expressed_genes_16_n_p) and Pathway Analysis (Image new_ingenuity_pathway_analysis_blue_16_n_p) need to be adjusted, because their default values are based on the TPM measure of expression, which is rarely appropriate for single cell data.