Automatic thresholds
Low quality barcodes can be identified using the distributions of the metrics listed below. See Count-based and extra-chromosomal filters for more details.
- Total number of reads
- Total number of expressed features
- Percentage of reads mapped to spike-in control regions
- Percentage of reads mapped to features indicative of low quality
When determining an automatic threshold for a metric distribution, the aim is to identify a point within the distribution that separates the low quality barcodes from the remaining barcodes. The following approach is used:
- Compute the median absolute deviation (MAD) threshold: three times the MAD above or below the median value.
This approach is suitable when the data exhibits a normal distribution.
For this, the entire distribution of the metric is used. The total number of reads and the total number of expressed features are logarithmically transformed so that their distributions are closer to normal.
- Check the validity of the MAD threshold by ensuring that:
- It does not result in the removal of all barcodes.
- It is greater than a predefined target threshold: 100 for the total number of reads and expressed features and 1% for the percentage of reads mapped to spike-in control regions and features indicative of low quality.
- Use the MAD thresholds if they are valid for all four metrics.
- Otherwise, use the thresholds calculated using the Otsu method [Otsu, 1979].
This approach is suitable when the data exhibits a bimodal distribution. Under the assumption that one of the modes originates from the low quality barcodes, the threshold optimally separates the low quality barcodes.
For this, only barcodes with a moderate number of reads are considered. Barcodes with a total number of reads below the ambient threshold or above the knee are excluded. See Cell calling for the calculation of the ambient threshold and knee.
When calculating the threshold for the two percentage metrics (reads mapped to spike-in control regions and reads mapped to features indicative of low quality), the barcodes with a moderate number of reads are restricted further: only those with moderate percentages (between 5% and 50%) are considered.
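
The steps above can be illustrated with a minimal Python sketch. It is not the implementation used by the software. The function names (mad_threshold, otsu_threshold, is_valid, choose_thresholds), the metric keys, and the ambient and knee arguments are hypothetical. The sketch also makes assumptions that the text does not state: the MAD is used unscaled, the lower side is used for the count metrics and the upper side for the percentage metrics, zero counts are dropped before the logarithm, and the Otsu method is applied to untransformed values using a 256-bin histogram.

```python
import numpy as np

COUNT_METRICS = ("reads", "features")                # hypothetical metric keys
PCT_METRICS = ("spike_in_pct", "low_quality_pct")    # hypothetical metric keys
TARGETS = {"reads": 100, "features": 100,
           "spike_in_pct": 1.0, "low_quality_pct": 1.0}


def mad_threshold(values, n_mads=3.0, side="lower", log_transform=False):
    """Median minus (or plus) n_mads times the MAD, optionally on a log10 scale."""
    x = np.asarray(values, dtype=float)
    if log_transform:
        x = np.log10(x[x > 0])  # assumption: zero counts are dropped
    med = np.median(x)
    mad = np.median(np.abs(x - med))  # assumption: unscaled MAD
    thr = med - n_mads * mad if side == "lower" else med + n_mads * mad
    return 10 ** thr if log_transform else thr


def otsu_threshold(values, bins=256):
    """Histogram cut that maximises the between-class variance (Otsu, 1979)."""
    x = np.asarray(values, dtype=float)
    if x.size == 0:
        return None  # nothing to separate
    counts, edges = np.histogram(x, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(counts)            # barcodes at or below each cut
    w1 = w0[-1] - w0                  # barcodes above each cut
    s0 = np.cumsum(counts * centers)  # running sum used for the class means
    valid = (w0 > 0) & (w1 > 0)
    if not valid.any():
        return None  # all values fall in a single bin
    mu0 = s0[valid] / w0[valid]
    mu1 = (s0[-1] - s0[valid]) / w1[valid]
    between = w0[valid] * w1[valid] * (mu0 - mu1) ** 2
    return edges[1:][valid][np.argmax(between)]


def side_of(name):
    # Assumption: low counts and high percentages indicate low quality.
    return "lower" if name in COUNT_METRICS else "upper"


def is_valid(threshold, values, target, side):
    """The threshold must keep at least one barcode and exceed the target."""
    x = np.asarray(values, dtype=float)
    kept = x >= threshold if side == "lower" else x <= threshold
    return bool(kept.any()) and bool(threshold > target)


def choose_thresholds(metrics, ambient, knee):
    """metrics maps each metric key to one value per barcode."""
    mad = {name: mad_threshold(vals, side=side_of(name),
                               log_transform=name in COUNT_METRICS)
           for name, vals in metrics.items()}
    if all(is_valid(mad[n], metrics[n], TARGETS[n], side_of(n)) for n in metrics):
        return "mad", mad

    # Fallback: Otsu on barcodes with a moderate number of reads.
    reads = np.asarray(metrics["reads"], dtype=float)
    moderate = (reads >= ambient) & (reads <= knee)
    otsu = {}
    for name, vals in metrics.items():
        v = np.asarray(vals, dtype=float)[moderate]
        if name in PCT_METRICS:
            v = v[(v >= 5) & (v <= 50)]  # moderate percentages only
        otsu[name] = otsu_threshold(v)
    return "otsu", otsu
```

In this sketch, choose_thresholds returns the method that was selected together with one threshold per metric. The ambient threshold and knee are taken as inputs, because their calculation is described under Cell calling.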