Structural Variant Caller for Long Reads settings
To run the Structural Variant Caller for Long Reads tool:
Tools | Resequencing Analysis | Variant Detection | Structural Variant Caller for Long Reads
The tool accepts a single read mapping as input.
There are two calling modes, Germline and Somatic, which can be used in two applications, Whole genome sequencing and Targeted sequencing, as shown in figure 31.51.
Figure 31.1: The calling mode and application options for Structural Variant Caller for Long Reads.
The calling modes are:
- Germline. Recommended for diploid samples.
- Somatic. Recommended for calling on samples where variants may appear at low frequencies. When this is enabled:
  - Filtering of variants based on coverage and read mapping orientation may differ from germline mode depending on the application - see below for details.
  - Strict filtering of deletions and duplications that are longer than 50,000bp is disabled.
  - Inversions and duplications are only reported if they are at least 500bp long.
  - Variants with frequencies between 5% and 30% are filtered away if they are supported by alignments with many mismatches and indels compared to the reference.
  - Variants with frequencies below 5% are not reported.
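The fixed thresholds listed for Somatic mode can be summarized in a small Python sketch. This is an illustration of the rules as stated above, not the tool's implementation: the `Variant` class, the `noisy_support` flag (standing in for "many mismatches and indels", which the text does not quantify), and the inclusive handling of the 5% and 30% boundaries are all assumptions.

```python
from dataclasses import dataclass

MIN_REPORTED_FREQUENCY = 0.05  # variants below 5% are not reported
NOISY_FREQUENCY_UPPER = 0.30   # upper end of the 5%-30% band (inclusive by assumption)
MIN_INV_DUP_LENGTH = 500       # inversions and duplications must be at least 500 bp


@dataclass
class Variant:
    """Hypothetical record type used only for this illustration."""
    kind: str            # e.g. "deletion", "insertion", "inversion", "duplication"
    length: int          # variant length in bp
    frequency: float     # fraction of reads supporting the variant, 0.0-1.0
    noisy_support: bool  # assumed flag: supporting alignments have many mismatches/indels


def passes_somatic_rules(v: Variant) -> bool:
    """Return True if the variant survives the fixed somatic-mode rules."""
    if v.frequency < MIN_REPORTED_FREQUENCY:
        return False
    if v.kind in ("inversion", "duplication") and v.length < MIN_INV_DUP_LENGTH:
        return False
    # Low-frequency variants backed by noisy alignments are filtered away.
    if v.frequency <= NOISY_FREQUENCY_UPPER and v.noisy_support:
        return False
    return True
```

For example, `passes_somatic_rules(Variant("inversion", 499, 0.5, False))` is rejected on length alone, while the same variant at 500bp passes.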
The Targeted sequencing application requires two additional options to be set:
- Restrict calling to target regions. Select a feature track to limit calling to regions where reads are expected to map with high coverage. Variants are only reported if they start or end within a target region, or at most 15bp away from a target region. For breakend variants, only one breakend in a pair needs to be within the target region for both to be reported. This makes the option suitable for detecting unknown fusion partners using the FUDGE technique of [Stangl et al., 2020].
  Several variant filters depend on the coverage at positions upstream and downstream of the variant. If such a position lies outside a target region, no coverage is expected there, and the filter is not applied.
  In rare cases, the coverage of the variant itself cannot be determined, because the locations used to calculate it lie outside the target region. Such variants are reported as heterozygous with coverage '0'.
- Minimum supporting reads. Variants are filtered away if they have fewer countable reads than this. The 'countable' reads are those that are used by the variant detection tool when calling the variant and may be fewer in number than seen in the read mapping.
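The two Targeted sequencing options can be illustrated with a minimal Python sketch. It assumes target regions are given as `(chromosome, start, end)` tuples with inclusive coordinates, and that the 15bp margin also applies to breakends; both are assumptions made for illustration, and all function names are hypothetical rather than part of the tool.

```python
TARGET_MARGIN_BP = 15  # a position may lie up to 15 bp outside a target region


def near_target(chrom, pos, targets, margin=TARGET_MARGIN_BP):
    """True if pos is inside, or within `margin` bp of, any target on chrom.

    targets: list of (chrom, start, end) tuples, inclusive coordinates (assumed).
    """
    return any(
        chrom == t_chrom and t_start - margin <= pos <= t_end + margin
        for t_chrom, t_start, t_end in targets
    )


def report_variant(chrom, start, end, targets):
    """A variant is reported if its start or its end is near a target region."""
    return near_target(chrom, start, targets) or near_target(chrom, end, targets)


def report_breakend_pair(breakend_a, breakend_b, targets):
    """Each breakend is a (chrom, pos) tuple; one hit is enough for both.

    Applying the 15 bp margin to breakends is an assumption of this sketch.
    """
    return near_target(*breakend_a, targets) or near_target(*breakend_b, targets)


def passes_minimum_support(countable_reads, minimum_supporting_reads):
    """Variants with fewer countable reads than the minimum are filtered away."""
    return countable_reads >= minimum_supporting_reads
```

With a single target `("chr1", 1000, 2000)`, a breakend pair with one end at `("chr1", 1500)` and the other on a different chromosome is reported in full, which is the property that makes unknown fusion partners detectable.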
Targeted sequencing differs from Whole genome sequencing in how variants are filtered based on coverage. This is because coverage may vary along the length of a target, and different targets may have very different average coverages:
- Germline - Whole genome sequencing. For each variant, the minimum number of supporting reads required for it to be reported is calculated based on its type (deletion, insertion etc.), the local coverage near the variant, and the average chromosome coverage.
- Somatic - Whole genome sequencing. Variants are filtered away if their coverage changes by more than 10% between any two consecutive positions in the following sequence: upstream of the variant, the start position, the center of the variant, the end position, and downstream of the variant.
- Germline - Targeted sequencing and Somatic - Targeted sequencing. Variants are filtered away if they have fewer supporting reads than specified by the 'Minimum supporting reads' parameter.
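The Somatic - Whole genome sequencing coverage rule can be sketched as a check over five coverage values. The text does not say how a "change of more than 10%" is measured; the sketch below assumes it is the absolute difference relative to the earlier of the two positions, and the function names are hypothetical.

```python
def relative_change(before, after):
    """Absolute change in coverage relative to the earlier position (assumed definition)."""
    if before == 0:
        return 0.0 if after == 0 else float("inf")
    return abs(after - before) / before


def filtered_by_coverage_change(upstream, start, center, end, downstream,
                                max_change=0.10):
    """True if coverage changes by more than max_change between any two
    consecutive positions along the variant."""
    points = [upstream, start, center, end, downstream]
    return any(
        relative_change(a, b) > max_change
        for a, b in zip(points, points[1:])
    )
```

Under this assumed definition, coverages of 100, 105, 100, 98, 100 pass the check, while a dip to 80 at the center of the variant causes it to be filtered away.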
Targeted sequencing also differs from Whole genome sequencing in how variants are filtered based on read mapping orientation, because targeted protocols may generate reads in a particular orientation for each target:
- Germline - Whole genome sequencing. Breakend variants are filtered away if evidence for them is observed on reads mapping in only one orientation.
- Somatic - Whole genome sequencing. All variants other than long insertions are filtered away if evidence for them is observed on reads mapping in only one orientation.
- Germline - Targeted sequencing and Somatic - Targeted sequencing. Variants are never filtered away based on read mapping orientation.
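The orientation rules above amount to a small decision function, sketched here in Python. The string labels for mode, application, and variant type, and the split into forward and reverse supporting read counts, are assumptions for illustration. The text does not define the length threshold for a "long insertion", so it is treated here as a pre-assigned label.

```python
def filtered_by_orientation(mode, application, kind,
                            forward_support, reverse_support):
    """Return True if the variant is filtered away based on read orientation.

    mode: "germline" or "somatic"
    application: "wgs" or "targeted"
    kind: variant type label, e.g. "breakend", "deletion", "long_insertion"
    forward_support / reverse_support: supporting reads per mapping orientation
    """
    if application == "targeted":
        return False  # never filtered on orientation in targeted sequencing

    # Evidence seen in exactly one orientation.
    single_orientation = (forward_support > 0) != (reverse_support > 0)
    if not single_orientation:
        return False

    if mode == "germline":
        return kind == "breakend"
    if mode == "somatic":
        return kind != "long_insertion"
    raise ValueError(f"unknown calling mode: {mode!r}")
```

For instance, a deletion supported only by forward reads is kept in Germline - Whole genome sequencing but filtered away in Somatic - Whole genome sequencing.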