Empty droplets filter
In droplet-based data, barcodes can correspond to droplets containing one cell, several cells, or no cells at all. In the first dialog of QC for Single Cell, the Empty droplets filter can be enabled and customized to remove droplets that are detected as containing no cells. This filter should be skipped for single-cell protocols that are not droplet-based.
Note that each droplet is assigned one barcode, so the terms "droplet" and "barcode" can be used interchangeably for droplet-based protocols.
Non-zero counts in empty droplets originate from ambient (i.e., extracellular) RNA, which can be captured and sequenced during the protocol. Sequenced empty droplets contain significantly fewer reads than cell-containing droplets, and this can be seen as a sharp transition in the rank plot, shown in figure 5.9.
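As a rough illustration of the rank plot, the sketch below sorts droplets by total reads and reports the rank after which the read count drops most steeply on a log scale. The function names and toy read counts are hypothetical, and the "largest drop" rule is only a crude stand-in for locating the transition; it is not the knee inference used by the tool.

```python
import numpy as np

def rank_curve(total_reads):
    """Return ranks (1 = most reads) and read totals sorted in decreasing order."""
    totals = np.sort(np.asarray(total_reads))[::-1]
    ranks = np.arange(1, totals.size + 1)
    return ranks, totals

def rank_before_largest_drop(ranks, totals):
    """Rank of the last droplet before the steepest fall in log10(reads + 1).

    Illustrative heuristic only (an assumption, not the tool's knee inference).
    """
    log_totals = np.log10(totals + 1.0)   # +1 keeps zero-read droplets finite
    steps = np.diff(log_totals)           # negative where the curve falls
    return int(ranks[np.argmin(steps)])

# Toy data: four cell-like droplets followed by five nearly empty ones.
ranks, totals = rank_curve([60, 8500, 20, 9000, 40, 7000, 10, 8000, 30])
transition_rank = rank_before_largest_drop(ranks, totals)
```

Plotting `totals` against `ranks` on log-log axes would give a curve of the kind shown in the rank plot.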
Droplets can be classified into three categories (see figure 5.9):
- Droplets with a high number of reads, which are considered to contain cells.
- Droplets with a low number of reads, which are considered empty.
- The remaining droplets, with an intermediate number of reads, which could be either cells with low RNA content or empty droplets. This cannot be determined purely from the number of reads.
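The three categories can be sketched as a simple thresholding step. The function name, the threshold values, and the choice to treat the upper boundary as inclusive are assumptions made for illustration.

```python
import numpy as np

def classify_droplets(total_reads, max_empty, min_cell):
    """Label each droplet 'empty', 'cell', or 'ambiguous' from its total reads.

    max_empty and min_cell are hypothetical thresholds; in practice the upper
    one would typically come from the knee of the rank plot.
    """
    total_reads = np.asarray(total_reads)
    labels = np.full(total_reads.shape, "ambiguous", dtype=object)
    labels[total_reads <= max_empty] = "empty"   # low number of reads
    labels[total_reads >= min_cell] = "cell"     # high number of reads (inclusive: assumption)
    return labels

labels = classify_droplets([5, 80, 350, 2400, 12000], max_empty=100, min_cell=2000)
```

Only the droplets labeled `ambiguous` need further testing; the other two groups are decided by read count alone.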
Droplets with a high number of reads are automatically retained as cell-containing droplets. The threshold for this is usually obtained from the automatically inferred knee from the rank plot (see figure 5.9 and The cell calling algorithm for details).
An ambient RNA profile is modeled from the droplets determined to be empty based on the low number of reads. To detect cells with low RNA content, the remaining droplets with an intermediate number of reads can be tested against this ambient RNA profile and are assigned simulation-based FDR-corrected p-values, from which non-empty droplets are identified.
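The testing step can be sketched as follows, under stated assumptions: the ambient profile is a pseudocount-smoothed pooled proportion per feature, droplets are scored with a plain multinomial likelihood, and the FDR correction is Benjamini-Hochberg. The text above does not specify these choices, so the model used by the tool may differ (see The cell calling algorithm). All function names are hypothetical.

```python
import math
import numpy as np

_lgamma = np.vectorize(math.lgamma, otypes=[float])

def ambient_profile(counts, totals, max_empty):
    """Feature proportions pooled over droplets with at most max_empty reads."""
    pooled = counts[totals <= max_empty].sum(axis=0).astype(float)
    pooled += 1.0  # pseudocount so no feature has probability zero (assumption)
    return pooled / pooled.sum()

def multinomial_loglik(x, profile):
    """Multinomial log-likelihood of count vector(s) x; works on 1-D or 2-D input."""
    x = np.asarray(x, dtype=float)
    return (_lgamma(x.sum(axis=-1) + 1.0)
            - _lgamma(x + 1.0).sum(axis=-1)
            + (x * np.log(profile)).sum(axis=-1))

def monte_carlo_pvalues(counts, profile, n_sim, rng):
    """P-value per droplet: how often simulated ambient droplets with the same
    total are at least as unlikely under the profile as the observed one."""
    pvals = np.empty(counts.shape[0])
    for i, x in enumerate(counts):
        sims = rng.multinomial(int(x.sum()), profile, size=n_sim)
        observed = multinomial_loglik(x, profile)
        simulated = multinomial_loglik(sims, profile)
        pvals[i] = (1 + np.sum(simulated <= observed)) / (n_sim + 1)
    return pvals

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    pvals = np.asarray(pvals, dtype=float)
    m = pvals.size
    order = np.argsort(pvals)
    scaled = pvals[order] * m / np.arange(1, m + 1)
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]  # enforce monotonicity
    out = np.empty(m)
    out[order] = np.clip(adjusted, 0.0, 1.0)
    return out
```

With this estimator, the smallest attainable p-value is 1 / (n_sim + 1), so the number of simulations limits how small a p-value can be reported.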
Figure 5.5: The default settings in the Empty droplets filter dialog.
The following options can be adjusted in the Empty droplets filter dialog (figure 5.5):
- Identify and remove empty droplets. Enables filtering of the empty droplets. This should be disabled for single-cell protocols that are not droplet-based.
- Maximum number of reads for empty droplets. Droplets whose total number of reads is at most this value are considered empty droplets that contain only ambient RNA and are used for modeling the ambient RNA profile.
- Specify minimum number of reads for droplets to be cells. Droplets with a total number of reads above the knee are retained as cell-containing droplets. Enabling this option allows specifying a manual threshold to be used instead of the knee.
- Minimum. The minimum number of reads to be used instead of the knee.
- Identify cells from the remaining droplets. Enables the simulation-based detection of cells with low RNA content.
- FDR threshold. Droplets with FDR-corrected p-values larger than this are removed as empty droplets.
- Number of simulations. The number of simulations performed for estimating p-values.
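How these options interact can be sketched as below. The class and field names are hypothetical stand-ins for the dialog options, no default values are implied, and treating the minimum as inclusive is an assumption.

```python
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class EmptyDropletOptions:
    remove_empty: bool              # "Identify and remove empty droplets"
    max_reads_empty: int            # "Maximum number of reads for empty droplets"
    manual_minimum: Optional[int]   # "Minimum"; None means the knee is used
    identify_remaining: bool        # "Identify cells from the remaining droplets"
    fdr_threshold: float            # "FDR threshold"

def retained_droplets(totals, fdr, knee, opts):
    """Boolean mask of droplets kept as non-empty under the given options."""
    totals = np.asarray(totals)
    fdr = np.asarray(fdr, dtype=float)
    if not opts.remove_empty:
        return np.ones(totals.shape, dtype=bool)   # filter disabled: keep all
    minimum = knee if opts.manual_minimum is None else opts.manual_minimum
    keep = totals >= minimum                       # retained by read count alone
    if opts.identify_remaining:
        intermediate = (totals > opts.max_reads_empty) & ~keep
        # Droplets with FDR-corrected p-values larger than the threshold are removed.
        keep = keep | (intermediate & (fdr <= opts.fdr_threshold))
    return keep

opts = EmptyDropletOptions(True, 100, None, True, 0.01)
mask = retained_droplets([50, 300, 400, 5000], [1.0, 0.001, 0.2, 0.0], knee=2000, opts=opts)
```

In this toy example, the second droplet is rescued by the simulation-based test, the third is removed because its FDR-corrected p-value exceeds the threshold, and the fourth is kept on read count alone.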
After applying the Empty droplets filter, only droplets that are identified as non-empty are retained for the remaining filters (see Count-based and extra-chromosomal filters). Note that this filter does not assess the quality of the retained cells. Because the Empty droplets filter already removes droplets with a low number of reads and, by association, a low number of expressed features, enabling the Count-based filters is not strictly necessary. The Extra-chromosomal filters provide the most additional benefit in this situation.
To remove droplets containing more than one cell, use the Doublets filter (see Doublets filter).