Barcode correction
Sequencing errors and low-quality bases can cause a single cell barcode to appear in the data as several distinct barcodes that contain:
- Ambiguous nucleotides.
- Incorrectly sequenced nucleotides, e.g. an 'A' in the barcode that was sequenced as a 'C'.
Such errors can be corrected in each barcode component independently using the options in the Barcode correction dialog:
- No correction. Do not correct the barcodes.
- Use sample. Correct barcodes to other barcodes found in the sample.
A barcode 'A' is corrected to a barcode 'B' that has no ambiguous nucleotides if the two differ only at positions where 'A' has either:
- A non-ambiguous nucleotide. Only one such difference is allowed, and 'B' must have at least 4 times as many UMIs/reads as 'A'.
- Ambiguous nucleotides.
- Use whitelist. Correct barcodes not on the whitelist to other barcodes found in the sample and on the whitelist.
A barcode 'A' is corrected to a barcode 'B' if the two differ only at positions where 'A' has either:
- A non-ambiguous nucleotide. Only one such difference is allowed.
- Ambiguous nucleotides.
The whitelist file must contain one cell barcode per line, arranged alphabetically.
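The format requirement can be checked before the file is supplied. The following is a minimal sketch, not part of the software: the function name check_whitelist_lines is hypothetical, blank lines are assumed to be ignored, and "alphabetically" is assumed to match Python's default string ordering, which holds for uppercase nucleotide letters.

```python
def check_whitelist_lines(lines):
    """Return a list of problems found in the lines of a whitelist file."""
    problems = []
    barcodes = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines are ignored
        if len(fields) > 1:
            problems.append(f"line {number}: more than one barcode")
        barcodes.append(fields[0])
    # Assumption: alphabetical order equals Python's default string order.
    if barcodes != sorted(barcodes):
        problems.append("barcodes are not in alphabetical order")
    return problems


if __name__ == "__main__":
    print(check_whitelist_lines(["AAAC\n", "AACG\n", "ACGT\n"]))
    print(check_whitelist_lines(["ACGT\n", "AAAC\n"]))
```

To check a file on disk, pass an open file object, for example `check_whitelist_lines(open(path))`, where `path` is the location of the whitelist file.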
If one barcode can be corrected to several barcodes, the one with the highest number of UMIs/reads is used.
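The rules above can be illustrated with a short sketch. This is one reading of the rules, not the software's implementation. It assumes that ambiguous nucleotides are written as 'N', that all barcodes have the same length, that differences at ambiguous positions are unrestricted, and that the 4 times requirement applies only when there is a difference at a non-ambiguous position. The names may_correct and correct are hypothetical, and barcodes without a valid target are simply left unchanged.

```python
AMBIGUOUS = "N"  # assumption: ambiguous nucleotides are written as 'N'
MIN_RATIO = 4    # 'B' needs at least 4 times the UMIs/reads of 'A' (Use sample)


def may_correct(a, b, count_a, count_b, use_ratio):
    """Return True if barcode a may be corrected to barcode b."""
    if a == b or len(a) != len(b) or AMBIGUOUS in b:
        return False
    # Differences at positions where a has a non-ambiguous nucleotide.
    mismatches = sum(1 for x, y in zip(a, b) if x != AMBIGUOUS and x != y)
    if mismatches > 1:
        return False
    if mismatches == 1 and use_ratio:
        return count_b >= MIN_RATIO * count_a
    return True


def correct(counts, whitelist=None):
    """Map each barcode to its correction target.

    counts maps each barcode found in the sample to its UMI/read count.
    whitelist=None mimics "Use sample"; a set mimics "Use whitelist".
    """
    corrected = {}
    for a, count_a in counts.items():
        if whitelist is not None and a in whitelist:
            corrected[a] = a  # barcodes on the whitelist are not corrected
            continue
        candidates = [
            b for b, count_b in counts.items()
            if (whitelist is None or b in whitelist)
            and may_correct(a, b, count_a, count_b, use_ratio=whitelist is None)
        ]
        # Several candidates: use the one with the most UMIs/reads.
        # Assumption: a barcode without candidates is left unchanged.
        corrected[a] = max(candidates, key=counts.get) if candidates else a
    return corrected


if __name__ == "__main__":
    counts = {"ACGT": 100, "ACGA": 10, "ACNT": 3, "TTTT": 50, "ACGC": 40}
    print(correct(counts))
    print(correct(counts, whitelist={"ACGT", "TTTT"}))
```

With these example counts, 'ACGA' and 'ACNT' are both corrected to 'ACGT' in either mode. 'ACGC' is kept in sample mode because 'ACGT' has fewer than 4 times its count, but it is corrected to 'ACGT' in whitelist mode, where no count ratio is required. The sketch does not follow chains of corrections and does not define behavior when two candidates have equal counts.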
Several of the predefined read structures under Library preparation have inbuilt whitelists. For these, barcode correction is enforced.