Train Cell Type Classifier
The Train Cell Type Classifier tool trains a Cell Type Classifier that can be used by the Predict Cell Types tool (see Predict Cell Types).
The tool learns to distinguish cell types from the expression patterns of cells that have already been assigned a cell type.
It can be found in the Toolbox here:
Cell Annotation | Train Cell Type Classifier
The tool takes an Expression Matrix as input. The following options can be adjusted (figure 7.2):
Figure 7.2: The options in the dialog of the Train Cell Type Classifier tool. A Cell Type Classifier for human data downloaded from the Reference Data Manager has been selected.
- Cell type clusters. A Cell Clusters object containing clusters for the input matrix.
- Cell type category. The category from the Cell Clusters object which contains the clusters representing cell types. The tool cannot tell which clusters are true cell types, so this category should contain only clusters that truly represent cell types. Cells are not required to belong to a cluster, and cells with unknown cell types should be left unannotated rather than being grouped in an "unknown" cluster.
- Cell type classifier. A Cell Type Classifier downloaded from the Reference Data Manager (see The Reference Data Manager) or produced by this tool. This is optional and allows existing classifiers to be extended with new data. To keep the running time and the size of the resulting Cell Type Classifier low, the tool uses up to approximately 50 training cells per cell type. These are chosen randomly so as to include cells from every sample present in the data. If the data contains more than 50 samples, one cell will be chosen randomly from each sample. When a Cell Type Classifier is provided, the cells to be used during training can be preferentially chosen from the classifier or the incoming data as follows:
  - Treat all cells equally. The tool will use cells from both the classifier and the incoming data in as uniform a manner as possible. This is the default option and it ensures that all samples present in both the classifier and the incoming data are represented in the training cells.
  - Use incoming cells first. The tool will preferentially use cells from the incoming data. If there are fewer than 50 cells for any given cell type, further cells will be chosen from the classifier.
  - Use existing cells first. The tool will preferentially use cells from the classifier. If there are fewer than 50 cells for any given cell type, further cells will be chosen from the incoming data.
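The three selection options can be illustrated with a small Python sketch for a single cell type. This is not the tool's actual implementation, which is not described here. The function names, the mode strings and the round-robin strategy are assumptions made for illustration; only the budget of roughly 50 cells, the representation of every sample, and the order of preference come from the description above.

```python
import random

TARGET = 50  # approximate per-cell-type budget stated in the text


def pick_across_samples(cells_by_sample, budget, rng, cover_every_sample=False):
    """Randomly pick up to `budget` cells, cycling over samples (hypothetical helper)."""
    # Shuffle each sample's cells so that pop() returns a random cell.
    pools = {s: rng.sample(c, len(c)) for s, c in cells_by_sample.items() if c}
    if cover_every_sample:
        # With more samples than the budget, take one cell from each sample.
        budget = max(budget, len(pools))
    picked = []
    while pools and len(picked) < budget:
        for sample in rng.sample(list(pools), len(pools)):
            if len(picked) >= budget:
                break
            picked.append(pools[sample].pop())
            if not pools[sample]:
                del pools[sample]
    return picked


def select_training_cells(incoming, existing, mode="equal", target=TARGET, seed=0):
    """Choose training cells for one cell type.

    `incoming` and `existing` map sample names to lists of cell identifiers.
    The mode names are made up for this sketch.
    """
    rng = random.Random(seed)
    if mode == "equal":
        # Treat samples from both sources as one pool of samples.
        merged = {("incoming", s): c for s, c in incoming.items()}
        merged.update({("existing", s): c for s, c in existing.items()})
        return pick_across_samples(merged, target, rng, cover_every_sample=True)
    if mode == "incoming_first":
        first, second = incoming, existing
    elif mode == "existing_first":
        first, second = existing, incoming
    else:
        raise ValueError(f"unknown mode: {mode}")
    picked = pick_across_samples(first, target, rng, cover_every_sample=True)
    # Top up from the other source only if the preferred one fell short.
    picked += pick_across_samples(second, target - len(picked), rng)
    return picked
```

With 40 incoming cells and 100 cells in the classifier, for example, the "incoming first" mode in this sketch would use all 40 incoming cells and top up with 10 cells from the classifier, whereas the "existing first" mode would take all 50 cells from the classifier.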
Subsections
- Interpreting the output of Train Cell Type Classifier
- Features used for training and prediction
- SVMs for cell type classification