The cell calling algorithm

Barcodes with a low number of reads are always removed as ambient droplets. If Calculate maximum number of reads for droplets to be ambient is selected (see Empty droplets filter), an automatically estimated threshold is used for detecting such barcodes. The threshold is the larger of two values: $100$, and the value identified using the Otsu method [Otsu, 1979] from the histogram of the number of reads for those droplets that have at most $500$ reads. When the threshold is calculated automatically, the following need to be met:

If any of the above checks are not met, the threshold is set such that no barcode is classified as an ambient droplet, and hence cell calling is not performed.
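The threshold rule described above can be illustrated with a minimal sketch. It assumes NumPy, a plain histogram-based Otsu split, and an arbitrary choice of 64 bins. The function names `otsu_threshold` and `ambient_threshold` are hypothetical and are not part of the software. The validity checks mentioned above are not reproduced here.

```python
import numpy as np


def otsu_threshold(values, bins=64):
    """Return the histogram cut that maximises between-class variance (Otsu)."""
    counts, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    weights = counts / counts.sum()

    w0 = np.cumsum(weights)                  # weight of the lower class
    w1 = 1.0 - w0                            # weight of the upper class
    cum_mean = np.cumsum(weights * centers)  # cumulative (unnormalised) mean
    total_mean = cum_mean[-1]

    # Between-class variance for every possible cut; skip empty classes.
    valid = (w0 > 1e-12) & (w1 > 1e-12)
    between = np.zeros_like(w0)
    between[valid] = (
        (total_mean * w0[valid] - cum_mean[valid]) ** 2 / (w0[valid] * w1[valid])
    )

    k = int(np.argmax(between))
    return float(edges[k + 1])               # upper edge of the last "low" bin


def ambient_threshold(reads_per_barcode, floor=100, cap=500):
    """Larger of `floor` and the Otsu cut among barcodes with at most `cap` reads."""
    reads = np.asarray(reads_per_barcode)
    low = reads[reads <= cap]
    if low.size == 0:
        # Assumption of this sketch: fall back to the floor when nothing is below the cap.
        return float(floor)
    return max(float(floor), otsu_threshold(low))
```

For example, `ambient_threshold(reads)` on a vector of per-barcode read counts returns `100.0` whenever the Otsu cut falls below 100, and the Otsu cut otherwise.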

Barcodes with a high number of reads are always retained as cell-containing droplets. If Calculate minimum number of reads for droplets to be cells is selected (see Empty droplets filter), an automatically estimated knee is used for detecting such barcodes. The knee is identified from the smoothed log-log rank data (figure 5.9) after the ambient droplets have been removed. An adaptation of the algorithm of [Satopaa et al., 2011], as implemented in https://github.com/mariolpantunes/ml, is used.
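To make the idea of a knee in the log-log rank curve concrete, the following is a simplified sketch in the spirit of [Satopaa et al., 2011]. It is not the adaptation used by the software and does not call the library linked above. It assumes NumPy, a moving-average smoother with an arbitrary window of 5, and input from which ambient barcodes have already been removed. The function name `knee_read_count` is hypothetical.

```python
import numpy as np


def knee_read_count(reads_per_barcode, window=5):
    """Estimate the read count at the knee of the log-log rank curve."""
    if window % 2 == 0:
        raise ValueError("window must be odd")

    # Rank barcodes by decreasing read count; drop zeros so the log is defined.
    reads = np.sort(np.asarray(reads_per_barcode, dtype=float))[::-1]
    reads = reads[reads > 0]
    if reads.size < 2:
        raise ValueError("need at least two barcodes with reads")

    x = np.log10(np.arange(1, reads.size + 1))   # log rank
    y = np.log10(reads)                          # log read count

    # Smooth y with a centred moving average (edges padded by repetition).
    padded = np.pad(y, window // 2, mode="edge")
    y_smooth = np.convolve(padded, np.ones(window) / window, mode="valid")

    y_range = y_smooth.max() - y_smooth.min()
    if y_range == 0:
        return float(reads[0])                   # flat curve: no knee to find

    # Normalise both axes to the unit square.
    x_n = (x - x.min()) / (x.max() - x.min())
    y_n = (y_smooth - y_smooth.min()) / y_range

    # For a decreasing curve, the knee is the point furthest above the
    # diagonal running from (0, 1) to (1, 0).
    distance = y_n + x_n - 1.0
    return float(reads[int(np.argmax(distance))])
```

Barcodes with at least `knee_read_count(reads)` reads would then be the ones retained as cells in this simplified picture.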

The algorithm for testing whether barcodes with an intermediate number of reads are cells is based on EmptyDrops [Lun et al., 2019]: