SVMs for cell type classification
The algorithm for cell type classification uses Support Vector Machines (SVMs). Independent benchmarking [Abdelaal et al., 2019] has shown that SVM classifiers can outperform most other types of classifiers for cell type prediction. The implementation uses liblinear (https://github.com/bwaldvogel/liblinear-java) with a series of additions:
- Feature expression is log-transformed and scaled by the maximum observed expression, so that values lie in the interval [0, 1].
- Weights are used to balance the size of the training classes.
- Platt scaling is applied to the decision values to obtain probabilities [Platt et al., 1999]. Note that the probabilities are not normalized: the probabilities of all cell types for one cell do not necessarily sum to one.
- For each cell, the most likely cell type consistent with the chosen tissue(s) is assigned from the Platt probabilities. Cells are labeled as unknown (see Predict Cell Types) when the most likely cell type has a Platt probability below 0.5.
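The last two steps of the list can be illustrated with a minimal Python sketch. It is not the actual implementation (which builds on liblinear-java). The function names, the per-cell-type sigmoid parameters `a` and `b`, and the dictionary layout are assumptions made for illustration. Only the sigmoid form of Platt scaling, the restriction to tissue-consistent cell types, and the 0.5 cutoff come from the description above.

```python
import math


def platt_probability(decision_value, a, b):
    """Platt's sigmoid: P = 1 / (1 + exp(a * f + b)), evaluated stably."""
    z = a * decision_value + b
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def assign_cell_type(decision_values, platt_params, allowed_types, threshold=0.5):
    """Pick the most probable cell type among those consistent with the tissue(s).

    decision_values: hypothetical dict {cell type: SVM decision value} for one cell
    platt_params:    hypothetical dict {cell type: (a, b)} fitted per cell type
    allowed_types:   cell types consistent with the chosen tissue(s)
    """
    probs = {}
    for cell_type in allowed_types:
        a, b = platt_params[cell_type]
        probs[cell_type] = platt_probability(decision_values[cell_type], a, b)
    if not probs:
        return "unknown", probs
    best = max(probs, key=probs.get)
    # Cells whose best candidate falls below the cutoff are labeled unknown.
    if probs[best] < threshold:
        return "unknown", probs
    return best, probs


# Illustrative values only; "Hepatocyte" is excluded by the tissue filter.
decision_values = {"T cell": 1.5, "B cell": 0.2, "Hepatocyte": 3.0}
platt_params = {ct: (-2.0, 0.0) for ct in decision_values}
label, probs = assign_cell_type(decision_values, platt_params, ["T cell", "B cell"])
```

Because each cell type has its own sigmoid, the values in `probs` are independent of one another, which is why they need not sum to one.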
Note that the algorithm takes the raw (non-normalized) expression values as input.
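Starting from raw values, the log transformation, scaling, and class balancing described in the list above might look like the following Python sketch. Several details are assumptions, since the text does not specify them: the use of `log(1 + x)`, taking the maximum from the training data, clipping values above that maximum to 1, and the "balanced" weight formula `n_samples / (n_classes * class_size)`. The function names are hypothetical.

```python
import math
from collections import Counter


def log_scale(raw_values, max_observed):
    """Map raw expression values into [0, 1] via log(1 + x) / log(1 + max).

    max_observed is assumed to be the largest raw value seen during training.
    """
    denom = math.log1p(max_observed)
    if denom == 0.0:
        # A maximum of zero carries no signal, so every value maps to 0.
        return [0.0 for _ in raw_values]
    # Values above the training maximum are clipped to 1 (an assumption).
    return [min(math.log1p(v) / denom, 1.0) for v in raw_values]


def balanced_class_weights(labels):
    """Weight each class inversely to its size so small classes are not swamped."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


scaled = log_scale([0, 9, 99], max_observed=99)
weights = balanced_class_weights(["T cell", "T cell", "T cell", "B cell"])
```

With these weights, a class with one training cell gets a three times larger weight than a class with three, so each class contributes equally in total to the training objective.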