Structural Variant Caller for Long Reads output

The tool has the following output options, as shown in figure 31.52:

Figure 31.2: The output options for Structural Variant Caller for Long Reads.

All outputs other than the Report can be exported together to a single VCF file (see Export in VCF format). The VCF-exportable outputs contain the following annotations:

Additional annotations present on more than one output are:

Indels variant track

The indels track uses many of the standard variant annotations (see Variant tracks).

Long indels variant track

This track contains insertions and deletions larger than 100,000 bp. It is often enriched for false positive calls. This is because deletions and duplications are called when a read maps to two disjoint locations. If these two locations are at either end of a chromosome, then a near-whole chromosome deletion (or duplication) will be called. In many cases, it is more likely that the read maps to two places because an insertion is present that shares homology with one of the locations.

It is sometimes possible to detect false positives. For example, if the sample is germline, and other structural variants are called within a long homozygous deletion, then it is likely that the long homozygous deletion is a false positive.
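The check described above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the tool: the `Variant` record, its field names, and the `"Deletion"` and `"Homozygous"` labels are assumptions about how the tracks might look after export to a table, and should be adapted to the actual data.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical, simplified variant record (not the tool's own data model)."""
    chrom: str
    start: int
    end: int
    kind: str            # assumed labels, e.g. "Deletion", "Insertion"
    zygosity: str = ""   # assumed labels, e.g. "Homozygous", "Heterozygous"


def suspicious_long_deletions(long_indels, other_variants):
    """Return (deletion, nested_variants) pairs for long homozygous deletions
    that fully contain at least one other structural variant call."""
    suspicious = []
    for deletion in long_indels:
        if deletion.kind != "Deletion" or deletion.zygosity != "Homozygous":
            continue
        nested = [
            v for v in other_variants
            if v.chrom == deletion.chrom
            and deletion.start <= v.start
            and v.end <= deletion.end
        ]
        if nested:
            suspicious.append((deletion, nested))
    return suspicious
```

A deletion returned by this function is only a candidate false positive for a germline sample; it still needs manual inspection.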

Inversions track

This track is often enriched for false positive calls. This is because an inversion may be called when a read maps to two disjoint locations on the same chromosome and in different orientations. If these two locations are at either end of a chromosome, then a near-whole chromosome inversion will be called. In many cases, it is more likely that the read maps to two places because an insertion is present that shares homology with one of the locations.

When coverage is high, it is often possible to detect false positives by requiring that there is support for both sides of the inversion. Reads supporting the 5' side of the inversion on the reference are counted as "forward" reads, and reads supporting the 3' side of the inversion on the reference are counted as "reverse" reads. Each variant reports these in "Forward read count" and "Reverse read count" annotations respectively.
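As a minimal sketch of this filter, the following Python function separates inversions with support on both sides from those supported on one side only. It assumes the inversions track has been exported to a list of dictionaries keyed by annotation name; the `min_reads` threshold is a hypothetical parameter, not a setting of the tool.

```python
def has_two_sided_support(inversion, min_reads=1):
    """True if both the forward and reverse read counts reach min_reads."""
    forward = inversion.get("Forward read count", 0)
    reverse = inversion.get("Reverse read count", 0)
    return forward >= min_reads and reverse >= min_reads


def split_by_support(inversions, min_reads=1):
    """Split inversions into (two_sided, one_sided) lists."""
    two_sided, one_sided = [], []
    for inversion in inversions:
        if has_two_sided_support(inversion, min_reads):
            two_sided.append(inversion)
        else:
            one_sided.append(inversion)
    return two_sided, one_sided
```

A suitable value for `min_reads` depends on coverage; with low coverage, a genuine inversion may have support on one side only, so one-sided calls are candidates for review rather than certain false positives.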

Another class of false positives consists of inversions that start or end at the same location as an insertion. This is sometimes a signature of an inverted repeat.
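One way to flag such inversions is to compare inversion boundaries against insertion positions. The sketch below assumes inversions are available as `(chromosome, start, end)` tuples and insertions as `(chromosome, position)` tuples; both layouts and the `tolerance` parameter are illustrative assumptions.

```python
def inversions_sharing_insertion_boundary(inversions, insertions, tolerance=0):
    """Return inversions whose start or end lies within `tolerance` bp
    of an insertion on the same chromosome."""
    flagged = []
    for chrom, inv_start, inv_end in inversions:
        for ins_chrom, ins_pos in insertions:
            if ins_chrom != chrom:
                continue
            near_start = abs(ins_pos - inv_start) <= tolerance
            near_end = abs(ins_pos - inv_end) <= tolerance
            if near_start or near_end:
                flagged.append((chrom, inv_start, inv_end))
                break  # one matching insertion is enough to flag this inversion
    return flagged
```

With `tolerance=0`, only exact coordinate matches are flagged; a small positive value allows for slight differences in breakpoint placement.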

Breakend track

The breakend track can be used to look for translocations and other complex rearrangements that involve more than one chromosome. The definition of a breakend used here closely follows the one in the VCF specification. Please refer to Section 5.4 "Specifying complex rearrangements with breakends" of https://samtools.github.io/hts-specs/VCFv4.4.pdf. Specifically, we support the cases shown in figures 1, 4, 5, and 7 of that section.

Annotations that are only present on the breakends output are:

A simple reciprocal translocation involves 4 breakends: an acceptor and donor on each of the two chromosomes involved in the translocation. The 4 breakends will have different combinations of Name and Type, and two different Fusion numbers.

The easiest way to find translocations is to:

Note that the Region for a breakend is sometimes on the plus strand (e.g. 123456^123457) and sometimes on the minus strand (e.g. complement(123456^123457)). The reported strand has no biological significance; it is used by the VCF exporter.
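When processing exported breakend Regions in a script, it can help to normalize both notations to the same coordinates. The following is a minimal sketch that assumes Region strings follow exactly the two forms shown above; the function name is hypothetical and the strand is returned only for completeness.

```python
def parse_breakend_region(region):
    """Parse 'A^B' or 'complement(A^B)' into (left, right, strand).

    The strand is '+' or '-'; the two positions are the same either way.
    """
    text = region.strip()
    strand = "+"
    if text.startswith("complement(") and text.endswith(")"):
        strand = "-"
        text = text[len("complement("):-1]
    left, separator, right = text.partition("^")
    if not separator:
        raise ValueError(f"not a between-position region: {region!r}")
    return int(left), int(right), strand
```

Comparing only the first two elements of the result treats plus-strand and minus-strand Regions at the same position as equal.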