Interpreting the output of Train Cell Type Classifier

The tool outputs a Cell Type Classifier element. The table view of the element (figure 7.3) summarizes, for each cell type, the number of training cells used from each sample and the most important features used by the classifier.

Figure 7.3: The table view of a Cell Type Classifier trained on HCL data (http://bis.zju.edu.cn/HCL) containing 106 different samples. For cell types present in more than 50 samples, one cell is chosen from each sample. The sample columns (here, starting with "HCL-") contain the number of training cells used from the respective sample. The "Top features" columns list the most important features used by the classifier to distinguish each cell type from the rest.

When a classifier is extended with new data, the sample columns of this table view can be used to investigate the impact of the strategy for choosing training cells (see Train Cell Type Classifier).

The classifier assigns a weight to each feature according to how informative it is for distinguishing each cell type from the rest. The "Top features supporting this cell type" and "Top features supporting another cell type" columns list up to 10 features with the largest weights. High expression of the features in "Top features supporting this cell type" is a good indication that a cell is of that cell type, while high expression of the features in "Top features supporting another cell type" is a good indication that it is not. Note that the classifier uses more information than is summarized in these two columns: the prediction is based on the combined expression of all features together with their assigned weights.
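
As a rough illustration of how weights and expression can combine, the sketch below scores a cell as a weighted sum of its feature expression for each cell type and predicts the type with the highest score. The linear form, the decision rule, the feature and cell type names, the weight values and the `score` and `predict` helpers are all assumptions made for illustration; the actual model used by the classifier is not described here.

```python
# Hypothetical weights per cell type. Positive weights play the role of
# "features supporting this cell type", negative weights the role of
# "features supporting another cell type". All names and values are made up.
WEIGHTS = {
    "type_1": {"GENE1": 1.5, "GENE2": 0.5, "GENE3": -1.0},
    "type_2": {"GENE1": -1.0, "GENE2": 0.5, "GENE3": 2.0},
}


def score(expression, weights):
    """Weighted sum of expression over all features that have a weight."""
    return sum(w * expression.get(feature, 0.0) for feature, w in weights.items())


def predict(expression, all_weights=WEIGHTS):
    """Return the cell type with the highest score (assumed decision rule)."""
    return max(all_weights, key=lambda cell_type: score(expression, all_weights[cell_type]))


# A made-up cell with high expression of the features that support type_1.
cell = {"GENE1": 4.0, "GENE2": 2.0, "GENE3": 0.5}
print(predict(cell))
```

In this toy setup GENE2 has the same weight for both cell types, so it does not help to separate them on its own even though it contributes to both scores.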

If the top features have IDs from either Ensembl or Entrez, the feature names are clickable and open the corresponding Ensembl or Entrez webpage.

Top features and markers. The top features identified by the classifier are different from the markers identified by the Differential Expression for Single Cell tool (see Differential Expression for Single Cell). A marker for a cell type is expressed differently in that cell type than in all other cell types, and this is calculated independently for each feature. The classifier top features are jointly useful for recognizing a specific cell type, but are not necessarily very informative on their own.
Consider cell types A-D with the average expression for features X-Z given below. The cell types might then have the listed top features and markers:

                                             A       B       C       D
X                                            4       4       0       8
Y                                            2       4       2       6
Z                                            0       0       2       4
Top features supporting this cell type       X       X, Y    Y, Z    X, Y, Z
Top features supporting another cell type    Y, Z    Z       X       -
Markers                                      -       Y       X, Z    X, Y, Z
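
The Markers row of the table above can be reproduced with a toy rule: a feature counts as a marker for a cell type when its average expression in that cell type differs from the average in every other cell type, checked one feature at a time. This is only an illustration on the averages in the table; the `markers` helper is hypothetical and is not the statistical test performed by the Differential Expression for Single Cell tool.

```python
# Average expression from the table above: feature -> cell type -> value.
EXPRESSION = {
    "X": {"A": 4, "B": 4, "C": 0, "D": 8},
    "Y": {"A": 2, "B": 4, "C": 2, "D": 6},
    "Z": {"A": 0, "B": 0, "C": 2, "D": 4},
}


def markers(cell_type, expression=EXPRESSION):
    """Features whose value in cell_type differs from every other cell type."""
    found = []
    for feature, by_type in expression.items():
        # Each feature is checked on its own, independently of the others.
        others = [value for other, value in by_type.items() if other != cell_type]
        if all(value != by_type[cell_type] for value in others):
            found.append(feature)
    return found


for cell_type in "ABCD":
    print(cell_type, ", ".join(markers(cell_type)) or "-")
```

Because each feature is evaluated independently here, the result can differ from the classifier top features, which are weighted jointly.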