Deduplication

Deduplication can be used to collapse reads that are identical or almost identical copies of each other and therefore likely originate from the same DNA fragment.
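This section does not say how "almost identical" is measured. As a minimal sketch, assume two reads of equal length count as near-duplicates when their Hamming distance (number of differing positions) is at most a small threshold. The names `hamming`, `is_near_duplicate` and the `max_mismatches` parameter are hypothetical and used only for illustration:

```python
def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def is_near_duplicate(a: str, b: str, max_mismatches: int = 1) -> bool:
    """Assumed rule: same length and at most max_mismatches differing bases."""
    return len(a) == len(b) and hamming(a, b) <= max_mismatches


print(is_near_duplicate("ACGTACGT", "ACGTACGT"))  # True (identical)
print(is_near_duplicate("ACGTACGT", "ACGAACGT"))  # True (one mismatch)
print(is_near_duplicate("ACGTACGT", "TTGAACGT"))  # False (three mismatches)
```

Real tools may use other criteria (for example, allowing indels or weighting by base quality); the threshold here is only an assumption.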

Reads are deduplicated through the following steps:

  1. Read pairs that are mapped to the same interval are clustered together.
  2. Read pairs within the same cluster that are identical or almost identical are considered duplicates.
  3. For each group of duplicate read pairs, a consensus sequence is calculated.
  4. The duplicate read pairs are replaced with the consensus sequence.
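The four steps above can be sketched in Python as follows. This is an illustrative sketch, not the product's actual implementation, and several details are assumptions not stated in this section: "same interval" is taken to mean identical start and end coordinates, duplicates are grouped greedily against the first pair of each group using a mismatch threshold, and the consensus is a simple per-position majority vote. The `ReadPair` type and all function names are hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadPair:
    # Hypothetical representation: mapped interval plus both mate sequences.
    start: int
    end: int
    mate1: str
    mate2: str


def near_duplicate(p: ReadPair, q: ReadPair, max_mismatches: int) -> bool:
    # Assumed rule: mates must have equal lengths, and the total number of
    # mismatching bases across both mates must not exceed the threshold.
    if len(p.mate1) != len(q.mate1) or len(p.mate2) != len(q.mate2):
        return False
    diff = sum(x != y for x, y in zip(p.mate1 + p.mate2, q.mate1 + q.mate2))
    return diff <= max_mismatches


def consensus(seqs: list[str]) -> str:
    # Majority base at each position; ties go to the base seen first.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*seqs))


def deduplicate(pairs: list[ReadPair], max_mismatches: int = 1) -> list[ReadPair]:
    # Step 1: cluster read pairs mapped to the same interval.
    clusters = defaultdict(list)
    for pair in pairs:
        clusters[(pair.start, pair.end)].append(pair)

    result = []
    for (start, end), members in clusters.items():
        # Step 2: greedily group near-identical pairs, each group seeded
        # by the first pair assigned to it.
        groups: list[list[ReadPair]] = []
        for pair in members:
            for group in groups:
                if near_duplicate(group[0], pair, max_mismatches):
                    group.append(pair)
                    break
            else:
                groups.append([pair])
        # Steps 3 and 4: replace each group with one consensus read pair.
        for group in groups:
            result.append(ReadPair(start, end,
                                   consensus([p.mate1 for p in group]),
                                   consensus([p.mate2 for p in group])))
    return result


pairs = [
    ReadPair(100, 250, "ACGT", "TTGA"),
    ReadPair(100, 250, "ACGT", "TTGA"),  # exact duplicate
    ReadPair(100, 250, "ACGA", "TTGA"),  # one mismatch: near-duplicate
    ReadPair(300, 450, "GGCC", "AATT"),  # different interval: own cluster
]
for pair in deduplicate(pairs):
    print(pair)
# ReadPair(start=100, end=250, mate1='ACGT', mate2='TTGA')
# ReadPair(start=300, end=450, mate1='GGCC', mate2='AATT')
```

Greedy grouping keeps the sketch short, but its result can depend on input order; a production tool might instead compare every pair or use a graph-based clustering.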