Structural Variant Caller algorithm
The tool is based on Sniffles2 v2.2. Results are therefore expected to be similar but not identical to those of Sniffles2 v2.2. For details on the algorithm, please refer to the Sniffles2 preprint https://www.biorxiv.org/content/10.1101/2022.04.04.487055v2.full.
The principal differences from Sniffles2 v2.2 are outlined below:
- Insertions contained within the primary alignment of a read are remapped to determine whether they are duplications. This reduces the number of duplications that are erroneously reported as both an insertion and a duplication.
- Breakends that are close to an insertion of known length are not called when the longest read supporting the breakend is shorter than the shortest estimate of the insertion length. In this situation, the inserted sequence is likely homologous to a region on another chromosome, and the reads supporting the breakend are likely those that did not extend through the insertion.
- The full supplementary alignments for a read are always used, rather than the summary of those alignments that minimap2 provides in the SA SAM tag. Supplementary alignments describe how the parts of a read that are not aligned in the primary alignment align to the reference. This change is expected to give more precise structural variant locations and lengths in some cases.
- The consensus sequence of insertions is determined from all the reads supporting the insertion.
- Breakends are paired, and are only reported if both locations supported by a read passing through the breakend are called. This makes breakends easier to interpret, at the cost that rearrangements where only one breakend meets the quality control cutoffs are not reported.
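The duplication check on insertions can be illustrated with a deliberately naive sketch. Instead of remapping the inserted sequence with an aligner, it only asks whether an exact copy of that sequence occurs in the reference within a window around the insertion site. The function name and the `flank` window size are hypothetical, and exact matching is a simplification: real reads carry sequencing errors, so an actual implementation would need an error-tolerant alignment.

```python
def is_likely_duplication(inserted_seq: str, ref: str, ins_pos: int, flank: int = 1000) -> bool:
    """Naive stand-in for remapping: look for an exact copy of the inserted
    sequence in the reference within `flank` bases of the insertion site.
    `flank` is a hypothetical parameter, not taken from the tool."""
    if not inserted_seq:
        return False  # an empty insertion cannot be a duplication
    start = max(0, ins_pos - flank)
    end = min(len(ref), ins_pos + flank)
    window = ref[start:end].upper()
    return inserted_seq.upper() in window


# Usage: an inserted "CCCC" next to an existing "CCCC" in the reference.
print(is_likely_duplication("CCCC", "AAAACCCCGGGGTTTT", 8))
```

An insertion flagged this way would be treated as a duplication instead of being reported twice.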
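The rule for suppressing breakends near an insertion of known length can be sketched as a simple filter. This is a minimal illustration, not the tool's implementation: the record types, the `should_call_breakend` name and the `max_distance` threshold that defines "close" are all hypothetical, since the text above does not specify how proximity is measured.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Insertion:
    chrom: str
    pos: int
    min_length_estimate: int  # shortest estimate of the inserted sequence length


@dataclass
class Breakend:
    chrom: str
    pos: int
    supporting_read_lengths: list[int]


def should_call_breakend(bnd: Breakend, insertions: list[Insertion], max_distance: int = 500) -> bool:
    """Return False if the breakend lies near an insertion that even the
    longest supporting read is too short to span. `max_distance` is a
    hypothetical cutoff for "close"."""
    longest_read = max(bnd.supporting_read_lengths, default=0)
    for ins in insertions:
        is_close = ins.chrom == bnd.chrom and abs(ins.pos - bnd.pos) <= max_distance
        if is_close and longest_read < ins.min_length_estimate:
            # The supporting reads probably stopped inside the insertion.
            return False
    return True


bnd = Breakend("chr1", 10_000, [3000, 4500])
print(should_call_breakend(bnd, [Insertion("chr1", 10_200, 6000)]))
```

Comparing against the shortest length estimate is conservative: a breakend is only dropped when no supporting read could have spanned even the smallest plausible insertion.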
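Breakend pairing can be pictured as a filter over candidate calls in which each breakend knows the ID of its mate at the other end of the junction. The sketch below assumes a hypothetical `BreakendCall` record with a `mate_id` link and a single `passed_qc` flag standing in for all quality control cutoffs. A breakend is kept only when both it and its mate pass, so a rearrangement with one failing side disappears entirely.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BreakendCall:
    bnd_id: str
    chrom: str
    pos: int
    mate_id: str     # hypothetical link to the breakend at the other end of the junction
    passed_qc: bool  # hypothetical stand-in for all quality control cutoffs


def paired_breakends_to_report(calls: list[BreakendCall]) -> list[BreakendCall]:
    """Keep a breakend only if it and its mate are both called."""
    by_id = {c.bnd_id: c for c in calls}
    report = []
    for call in calls:
        mate = by_id.get(call.mate_id)
        if call.passed_qc and mate is not None and mate.passed_qc:
            report.append(call)
    return report


calls = [
    BreakendCall("bnd1", "chr1", 100, "bnd2", True),
    BreakendCall("bnd2", "chr5", 900, "bnd1", True),
    BreakendCall("bnd3", "chr2", 50, "bnd4", True),   # mate fails QC
    BreakendCall("bnd4", "chr7", 70, "bnd3", False),
    BreakendCall("bnd5", "chr3", 10, "bnd6", True),   # mate missing
]
print([c.bnd_id for c in paired_breakends_to_report(calls)])
```

Here only `bnd1` and `bnd2` survive; `bnd3` is dropped even though it passes on its own, which is the trade-off described above.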