Deduplication
Deduplication can be used to collapse reads that are identical or nearly identical copies of one another and therefore likely originate from the same original DNA fragment.
Reads are deduplicated through the following steps:
- Read pairs that are mapped in the same intervals are clustered.
- Within each cluster, read pairs that are identical or almost identical are considered duplicates.
- For each group of duplicate reads, a consensus sequence is calculated.
- Each group of duplicate read pairs is replaced with its consensus sequence.
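
The steps above can be illustrated with a minimal Python sketch. It is not the tool's actual implementation: the function names, the `max_mismatches` threshold, the use of Hamming distance to decide what counts as "almost identical", and the majority-vote consensus are all assumptions made for illustration. For simplicity, each read pair is represented as a single tuple of mapped interval and sequence.

```python
from collections import Counter, defaultdict


def hamming(a, b):
    """Count mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def consensus(seqs):
    """Majority base at each position; ties go to the base seen first."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def deduplicate(read_pairs, max_mismatches=1):
    """Collapse duplicates in a list of (chrom, start, end, sequence) tuples.

    Hypothetical simplification: a read pair is one tuple, and two pairs are
    duplicates if they share an interval and differ from the group's first
    member at no more than `max_mismatches` positions.
    """
    # Step 1: cluster read pairs mapped to the same interval.
    clusters = defaultdict(list)
    for chrom, start, end, seq in read_pairs:
        clusters[(chrom, start, end)].append(seq)

    deduplicated = []
    for interval, seqs in clusters.items():
        # Step 2: group identical or almost identical sequences in the cluster.
        groups = []
        for seq in seqs:
            for group in groups:
                same_length = len(group[0]) == len(seq)
                if same_length and hamming(group[0], seq) <= max_mismatches:
                    group.append(seq)
                    break
            else:
                groups.append([seq])

        # Steps 3 and 4: compute a consensus per group and emit it in place
        # of the group's members, along with the number of reads collapsed.
        for group in groups:
            deduplicated.append((*interval, consensus(group), len(group)))
    return deduplicated


example = [
    ("chr1", 100, 250, "ACGTACGT"),
    ("chr1", 100, 250, "ACGTACGT"),
    ("chr1", 100, 250, "ACGTACGA"),  # one mismatch: treated as a duplicate
    ("chr1", 100, 250, "TTTTCCCC"),  # same interval, different fragment
    ("chr2", 5, 90, "GGGGAAAA"),
]
for record in deduplicate(example):
    print(record)
```

In this sketch the three similar `chr1` reads collapse into one consensus record, while the dissimilar read in the same interval and the `chr2` read are kept as they are. A real implementation would typically also take base qualities and strand into account, which this sketch omits.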