SVMs for cell type classification
The algorithm for cell type classification uses Support Vector Machines (SVMs). Independent benchmarking [Abdelaal et al., 2019] has shown that SVM classifiers can outperform most other types of classifiers for cell type prediction. The implementation uses liblinear (https://github.com/bwaldvogel/liblinear-java) with a series of additions:
- Feature expression is log transformed and scaled using the maximum observed expression, such that values are placed in the interval [0, 1].
- Weights are used to balance the size of the training classes.
- Platt scaling is applied to the decision values to obtain probabilities [Platt et al., 1999]. Note that the probabilities are not normalized: for a given cell, the probabilities of all cell types do not necessarily sum to one.
- For each cell, the most likely cell type is assigned from the Platt probabilities. Cells are labeled as unknown (see Predict Cell Types) when the most likely cell type has a Platt probability below 0.5.
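The last two steps can be illustrated with a minimal sketch. It assumes one SVM decision value per cell type (a one-vs-rest setup) and a pair of already fitted Platt parameters per cell type; neither is confirmed by the description above. The function names, the parameter values, and the cell type names are hypothetical and do not come from the implementation.

```python
import math


def platt_probability(decision_value, a, b):
    """Platt's sigmoid: P = 1 / (1 + exp(a * f + b)).

    a and b are assumed to have been fitted beforehand on training data;
    a is typically negative, so larger decision values give higher P.
    """
    z = a * decision_value + b
    # Numerically stable evaluation of 1 / (1 + exp(z)).
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def assign_cell_type(decision_values, platt_params, threshold=0.5):
    """Return (label, probabilities) for one cell.

    decision_values: dict mapping cell type -> SVM decision value
    platt_params:    dict mapping cell type -> (a, b), hypothetical values
    The probabilities are computed independently per cell type and are
    therefore not normalized to sum to one.
    """
    probs = {
        cell_type: platt_probability(f, *platt_params[cell_type])
        for cell_type, f in decision_values.items()
    }
    best = max(probs, key=probs.get)
    label = best if probs[best] >= threshold else "unknown"
    return label, probs


# Illustrative parameters only, not values used by the implementation.
params = {"T cell": (-1.0, 0.0), "B cell": (-1.0, 0.0)}
print(assign_cell_type({"T cell": 2.0, "B cell": 1.0}, params))
print(assign_cell_type({"T cell": -1.0, "B cell": -0.5}, params))
```

In the first call both probabilities exceed 0.5 (roughly 0.88 and 0.73), so their sum is greater than one and the cell is labeled with the higher-scoring type. In the second call no cell type reaches 0.5, so the cell is labeled unknown.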
Note that the algorithm uses the raw, not normalized, expression values.
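As a rough illustration of how raw counts could be prepared for training, the sketch below log transforms and scales the values and computes class weights. Several details are assumptions, since the text does not specify them: the transform is taken to be log(1 + x), the maximum is taken per feature over the training cells, and the class weights follow the common inverse-frequency convention n_cells / (n_classes * class_size). The function names are hypothetical.

```python
import math
from collections import Counter


def log_scale(raw_counts):
    """Log transform raw counts and scale each feature into [0, 1].

    raw_counts: list of rows (one per cell) of non-negative raw counts.
    Assumptions: log(1 + x) as the transform, per-feature maximum
    over the given cells as the scaling factor.
    """
    logged = [[math.log1p(x) for x in row] for row in raw_counts]
    n_features = len(logged[0])
    col_max = [max(row[j] for row in logged) for j in range(n_features)]
    # Features that are zero in every cell stay at 0.0.
    return [
        [v / m if m > 0 else 0.0 for v, m in zip(row, col_max)]
        for row in logged
    ]


def balanced_class_weights(labels):
    """Inverse-frequency weights so that small classes count as much as large ones."""
    counts = Counter(labels)
    n_cells, n_classes = len(labels), len(counts)
    return {c: n_cells / (n_classes * size) for c, size in counts.items()}


# Two features measured in two cells, and four training labels.
print(log_scale([[0, 9], [3, 99]]))
print(balanced_class_weights(["T cell", "T cell", "T cell", "B cell"]))
```

With these weights, the single B cell in the example receives a weight of 2.0 and each T cell a weight of about 0.67, so both classes contribute equally to the training objective in total.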