Cell calling
Barcodes with a low number of reads are always removed as ambient droplets. If Calculate maximum number of reads for droplets to be ambient is selected (see Empty droplets filter), an automatically estimated threshold is used for detecting such barcodes. The threshold is either 100 or the value identified with the Otsu method [Otsu, 1979] from the histogram of the number of reads for droplets that have at most 500 reads, whichever is larger. When the threshold is calculated automatically, the following conditions must be met:
- The minimum number of reads across all droplets is at most 100.
- At least 10% of all droplets have at most 500 reads.
- At least 100 barcodes are identified as ambient droplets.
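The threshold estimation described above can be sketched as follows. This is a minimal illustration and not the implementation used by the tool: the function names, the number of histogram bins, the use of "at most" when counting ambient droplets, and returning None when a condition fails are all assumptions made for the example.

```python
import numpy as np

def otsu_threshold(values, bins=50):
    """Otsu's method: pick the histogram split that maximizes between-class variance."""
    counts, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(counts)                    # droplets at or below each split
    w1 = w0[-1] - w0                          # droplets above each split
    s0 = np.cumsum(counts * centers)
    m0 = s0 / np.maximum(w0, 1)               # mean of the lower class
    m1 = (s0[-1] - s0) / np.maximum(w1, 1)    # mean of the upper class
    between = w0 * w1 * (m0 - m1) ** 2
    # The split lies at the right edge of the best bin.
    return float(edges[np.argmax(between) + 1])

def estimate_ambient_threshold(reads_per_droplet, floor=100, cap=500):
    """Return the ambient threshold, or None if a condition is not met (assumed behavior)."""
    reads = np.asarray(reads_per_droplet)
    low = reads[reads <= cap]
    if reads.min() > floor:                   # minimum number of reads must be at most 100
        return None
    if low.size < 0.1 * reads.size:           # at least 10% of droplets have at most 500 reads
        return None
    threshold = max(floor, otsu_threshold(low))
    if np.count_nonzero(reads <= threshold) < 100:   # at least 100 ambient droplets
        return None
    return threshold
```

For example, calling estimate_ambient_threshold on a list of per-droplet read counts returns a value of at least 100 when all three conditions hold.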
Barcodes with a high number of reads are always retained as cell-containing droplets. If Calculate minimum number of reads for droplets to be cells is selected (see Empty droplets filter), the automatically estimated knee is used for detecting such barcodes. The knee is identified from the smoothed log-log rank data (figure 7.9) after the ambient droplets have been removed. An adaptation of the [Satopaa et al., 2011] algorithm implemented in https://github.com/mariolpantunes/ml is used.
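The idea behind knee detection on the log-log rank curve can be illustrated with a simplified variant of the [Satopaa et al., 2011] approach: normalize the curve and pick the point that lies farthest above the straight line joining its end points. The sketch below is not the adaptation used by the tool; the function name, the moving-average smoothing, and the window size are assumptions made for the example.

```python
import numpy as np

def estimate_knee(reads_per_droplet, ambient_threshold, window=5):
    """Return the number of reads at the knee of the log-log rank curve (simplified sketch)."""
    reads = np.asarray(reads_per_droplet, dtype=float)
    # Remove ambient droplets and sort the rest by decreasing number of reads.
    reads = np.sort(reads[reads > ambient_threshold])[::-1]
    x = np.log10(np.arange(1, reads.size + 1))   # log rank
    y = np.log10(reads)                          # log number of reads
    # Smooth with a moving average (window must be odd); edges are padded.
    padded = np.pad(y, window // 2, mode="edge")
    y_smooth = np.convolve(padded, np.ones(window) / window, mode="valid")
    # Normalize both axes to [0, 1].
    xn = (x - x[0]) / (x[-1] - x[0])
    yn = (y_smooth - y_smooth.min()) / (y_smooth.max() - y_smooth.min())
    # For a decreasing curve, the knee is the point farthest above
    # the line from (0, 1) to (1, 0).
    knee_index = int(np.argmax(yn - (1 - xn)))
    return reads[knee_index]
```

Droplets with at least as many reads as the returned value would then be retained as cell-containing droplets without further testing.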
The algorithm for testing if barcodes with an intermediate number of reads are cells is based on EmptyDrops [Lun et al., 2019]:
- The ambient RNA profile is estimated from the ambient droplets. The expression counts from these droplets are summed, and a proportion vector for the ambient profile is obtained using the Good-Turing algorithm [Gale and Sampson, 1995].
- Barcodes with an intermediate number of reads are tested for significant deviations from the ambient profile. For each barcode, the probability of obtaining its expression profile from the ambient profile is calculated. A p-value is obtained by comparing this probability with the probabilities of simulated ambient barcodes containing the same total number of reads.
- FDR correction is applied to the p-values for barcodes that are not part of the ambient profile.
- Barcodes with FDR-corrected p-values below the provided value in FDR threshold (see Empty droplets filter) are retained as non-empty droplets.
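The testing steps above can be sketched as follows. This is a rough illustration under several assumptions and not the tool's implementation: a pseudocount of 1 stands in for the Good-Turing estimate, a plain multinomial model is used for the probabilities, Benjamini-Hochberg is assumed as the FDR correction, and all function names, the number of simulations, and the default threshold are hypothetical.

```python
import math
import numpy as np

_lgamma = np.vectorize(math.lgamma, otypes=[float])

def multinomial_logprob(counts, proportions):
    """Log-probability of count vectors (last axis = genes) under a multinomial."""
    counts = np.asarray(counts)
    totals = counts.sum(axis=-1)
    return (_lgamma(totals + 1) - _lgamma(counts + 1).sum(axis=-1)
            + (counts * np.log(proportions)).sum(axis=-1))

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (assumed FDR procedure)."""
    p = np.asarray(pvalues, dtype=float)
    order = np.argsort(p)
    scaled = p[order] * p.size / np.arange(1, p.size + 1)
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(p.size)
    out[order] = np.minimum(adjusted, 1.0)
    return out

def call_cells(ambient_counts, candidate_counts, fdr_threshold=0.01, n_sim=1000, seed=0):
    """Return a boolean array: True where a candidate barcode is called non-empty."""
    rng = np.random.default_rng(seed)
    # Ambient profile: summed counts turned into proportions.
    # A pseudocount replaces Good-Turing here (simplification).
    summed = np.asarray(ambient_counts).sum(axis=0) + 1
    proportions = summed / summed.sum()
    pvalues = []
    for counts in np.asarray(candidate_counts):
        observed = multinomial_logprob(counts, proportions)
        # Simulate ambient barcodes with the same total number of reads.
        sims = rng.multinomial(counts.sum(), proportions, size=n_sim)
        simulated = multinomial_logprob(sims, proportions)
        # Monte Carlo p-value: share of simulations at least as unlikely as the barcode.
        pvalues.append((1 + np.count_nonzero(simulated <= observed)) / (n_sim + 1))
    return benjamini_hochberg(pvalues) < fdr_threshold
```

Both inputs are matrices of counts with one row per barcode and one column per gene. Note that the smallest attainable p-value is 1 / (n_sim + 1), so the number of simulations limits how small an FDR threshold can be resolved.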